ARM (Advanced RISC Machines) Processors
By now, only a very small segment of the mobile community can have missed hearing about Google's Android OS. It is giving many proprietary vendors a run for their money and their owners many a sleepless night. But do you know what Android depends on for its good performance, apart from its robust code? What makes it run so smoothly and quickly without costing you a fortune? What drives Android? The answer is ARMs: Advanced RISC Machines, previously known as Acorn RISC Machines.
History of ARM Processors
ARM machines have lived up to their developers' expectations right from the very first ARM machine. It all began in the early 1980s, when Acorn Computers Ltd., spurred by the success of its BBC Micro, wanted to move on from simple 8-bit processors like the MOS Technology 6502 that powered the BBC Micro to something more powerful, something that could stand up to the IBM PC launched in 1981. Processors available on the market, such as the Motorola 68000, were not powerful enough to handle graphics and GUIs. That left the company with only one option: make its own processor.
Steve Furber and Sophie Wilson of Acorn set out to make their own processor. They were inspired by the 32-bit RISC processor designed by students at Berkeley, and by the Western Design Center in Phoenix, which was then practically a one-man design house. Sophie developed the instruction set and simulated it in BBC BASIC. This convinced many in the company that the project was not a half-hearted shot in the dark. With the support and permission of the then CEO Hermann Hauser, the ARM project formally took off in 1983, with VLSI Technology as the silicon partner. The goal was a processor with latencies as low as those of the 6502. The first ARM core, dubbed ARM1, was delivered by VLSI Technology in 1985. Used in conjunction with the BBC Micro, this processor helped in the development of the next generation, the ARM2. 1987 saw the release of the ARM2-based Acorn Archimedes.
In 1990, Acorn spun off a new company, Advanced RISC Machines Ltd., dedicated solely to ARM core development. In 1992, Acorn won the Queen's Award for Technology for the ARM. Apple and ARM collaborated to develop the ARM6 cores on which the Apple Newton PDAs were based. Later, Intel also obtained ARM technology, in the form of DEC's StrongARM line, as part of a lawsuit settlement. Intel modified it further and developed its own high-performance line, XScale, since sold to Marvell. ARM itself primarily designs cores and licenses them, while its licensees make microcontrollers and processors, the most popular being those based on the ARM7TDMI. Some prominent ARM licensees are Alcatel-Lucent, Apple, Atmel, Cirrus Logic, Freescale, DEC, Intel, LG, Marvell, Microsoft, Nvidia, Qualcomm, Samsung, Sharp, STMicroelectronics, Symbios Logic, Texas Instruments, VLSI Technology, Yamaha, ZiiLABS, etc.
Figure: ARM940T core structure
ARM machines have a 32-bit Reduced Instruction Set Computer (RISC) load-store architecture. Their relative simplicity makes them a lucrative choice for manufacturers. They suit low-power products such as mobile, embedded and microcontroller applications, as well as small microprocessors. Data-processing instructions cannot operate on memory directly. Values must first be loaded into registers, operated on there, and then stored back. The instruction set offers conditional execution and many other kinds of operations. Its primary focus is on reducing the number of cycles per instruction, and most instructions execute in a single cycle.
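To make the load-store idea concrete, here is a minimal Python sketch of a toy machine, not real ARM encodings or semantics. The instruction tuples, the `run` function and the use of absolute addresses are assumptions for illustration; real ARM LDR/STR form the address from a base register. The point is that ADD only ever sees registers and immediates, so updating a value in memory takes a load, an add and a store.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit values


def run(program, memory):
    """Execute (op, *operands) tuples on a toy load-store machine.

    Hypothetical model: addresses are absolute integers, whereas real ARM
    LDR/STR form the address from a base register plus an offset.
    """
    regs = [0] * 13  # r0-r12 only; SP, LR and PC are not modelled
    for op, *args in program:
        if op == "LDR":    # LDR rd, [address]: memory -> register
            rd, address = args
            regs[rd] = memory[address]
        elif op == "STR":  # STR rd, [address]: register -> memory
            rd, address = args
            memory[address] = regs[rd]
        elif op == "ADD":  # ADD rd, rn, #imm: registers and immediates only
            rd, rn, imm = args
            regs[rd] = (regs[rn] + imm) & MASK32
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


# Incrementing a counter held in memory needs three instructions.
memory = {0x1000: 41}
run([("LDR", 0, 0x1000), ("ADD", 0, 0, 1), ("STR", 0, 0x1000)], memory)
print(memory[0x1000])  # 42
```

A CISC design such as x86 can perform the same update with a single instruction that takes a memory operand. ARM gives that up in exchange for simpler, mostly single-cycle instructions.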
Nearly all instructions in the 32-bit ARM instruction set are conditional. An instruction that should always execute carries the condition AL (always), which is the default when no condition is written. There are 14 conditions available besides AL. The instruction set gained new features with each generation. The transistor count also grew substantially, from 30,000 in the ARM2 to about 26 million in the Cortex-A9. The Thumb instruction set was added to provide 16-bit instructions on the otherwise 32-bit ARM machines. Thumb improved code density, with Thumb code about 65% of the size of the equivalent ARM code, but at the cost of a small drop in performance. This drop was partly offset by Thumb-2, a major extension of the Thumb ISA.
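The following Python sketch shows how conditional execution works. It computes the N, Z, C and V flags that a CMP would set on 32-bit values, then evaluates the 14 conditions plus AL against those flags. The function and table names are illustrative, not an ARM API. The flag rules follow the standard ARM definitions for subtraction.

```python
MASK32 = 0xFFFFFFFF

# Condition -> predicate on the (N, Z, C, V) flags.
# HS and LO are alternative names for CS and CC.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                  # equal
    "NE": lambda n, z, c, v: not z,              # not equal
    "CS": lambda n, z, c, v: c,                  # unsigned higher or same
    "CC": lambda n, z, c, v: not c,              # unsigned lower
    "MI": lambda n, z, c, v: n,                  # negative
    "PL": lambda n, z, c, v: not n,              # positive or zero
    "VS": lambda n, z, c, v: v,                  # overflow
    "VC": lambda n, z, c, v: not v,              # no overflow
    "HI": lambda n, z, c, v: c and not z,        # unsigned higher
    "LS": lambda n, z, c, v: not c or z,         # unsigned lower or same
    "GE": lambda n, z, c, v: n == v,             # signed greater or equal
    "LT": lambda n, z, c, v: n != v,             # signed less than
    "GT": lambda n, z, c, v: not z and n == v,   # signed greater than
    "LE": lambda n, z, c, v: z or n != v,        # signed less or equal
    "AL": lambda n, z, c, v: True,               # always (the default)
}


def cmp_flags(a, b):
    """Flags set by CMP a, b, which computes a - b on 32-bit values."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = bool(result >> 31)                        # sign bit of the result
    z = result == 0
    c = a >= b                                    # carry set = no unsigned borrow
    v = bool(((a ^ b) & (a ^ result)) >> 31)      # signed overflow
    return n, z, c, v


def condition_passed(cond, flags):
    """True if an instruction with this condition would execute."""
    return bool(CONDITIONS[cond](*flags))


# 0xFFFFFFFF is -1 when signed, but the largest value when unsigned.
flags = cmp_flags(-1, 1)
print(condition_passed("LT", flags), condition_passed("HI", flags))  # True True
```

In assembly, the condition is written as a suffix on the mnemonic, for example ADDNE or BGT.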
In Thumb-2, the compiler automatically selects a mixture of 16-bit and 32-bit instructions. Note that only the instruction encoding shrinks to 16 bits; the core continues to operate on 32-bit registers and data. Thumb is basically another instruction set running on the same hardware. Another instruction set, for executing Java bytecode on ARM cores, was developed soon after and named Jazelle. ARMv7 also introduced the Thumb Execution Environment (ThumbEE), a Thumb variant aimed at code that is compiled dynamically, moments before or during execution. These instruction sets correspond to execution states of an ARM core: ARM, Thumb, Jazelle and, on ARMv7, ThumbEE. To tell the assembler which instruction set to generate, directives such as ARM, THUMB and THUMBX (for ThumbEE) are used. The evolution of ARM architectures is shown in the accompanying figure.
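Because Thumb-2 mixes 16-bit and 32-bit instructions, a decoder has to work out each instruction's length from its first halfword. In the ARMv7 architecture, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 begins a 32-bit instruction. Any other halfword is a complete 16-bit instruction. The minimal Python sketch below applies that rule. The function names are illustrative.

```python
# First-halfword prefixes (bits [15:11]) that start a 32-bit Thumb-2 instruction.
PREFIXES_32BIT = (0b11101, 0b11110, 0b11111)


def is_32bit_thumb(halfword):
    """True if this halfword is the first half of a 32-bit instruction."""
    return ((halfword >> 11) & 0x1F) in PREFIXES_32BIT


def split_thumb_stream(halfwords):
    """Group a stream of 16-bit halfwords into 16-bit and 32-bit instructions."""
    instructions = []
    i = 0
    while i < len(halfwords):
        if is_32bit_thumb(halfwords[i]):
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32-bit instruction")
            instructions.append((halfwords[i], halfwords[i + 1]))
            i += 2
        else:
            instructions.append((halfwords[i],))
            i += 1
    return instructions


# MOVS r0, #1 (16-bit), a BL (32-bit), BX lr (16-bit)
stream = [0x2001, 0xF000, 0xF800, 0x4770]
print([2 * len(insn) for insn in split_thumb_stream(stream)])  # [2, 4, 2] bytes
```

The same rule lets a disassembler walk Thumb-2 code without knowing in advance which instruction widths the compiler chose.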
The names of ARM cores reflect the features they include. For example, in ARM7TDMI, 'T' stands for Thumb. 'D' and 'I' together denote the on-chip debugging facilities (debug support and EmbeddedICE logic). 'M' signifies an enhanced multiplier that supports 64-bit results. In names of the form ARMx7z, such as the ARM1176JZ-S, the '7' indicates an AXI bus, physically mapped caches and an MMU; this core implements architecture version 6Z (ARMv6Z). Other ARM devices are named following the same convention.
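The naming convention lends itself to a small decoder. The Python sketch below is a hypothetical helper, not an ARM tool. Its letter table covers the letters described above plus a few other common ones: E for enhanced DSP instructions, J for Jazelle, F for vector floating point and S for a synthesizable core. It makes no attempt to decode the digits.

```python
import re

# Partial table of suffix letters. T, D, M and I are described in the text;
# E, J, F and S are other letters commonly seen in ARM core names.
SUFFIX_MEANINGS = {
    "T": "Thumb instruction set",
    "D": "debug support",
    "M": "enhanced multiplier with 64-bit results",
    "I": "EmbeddedICE debug logic",
    "E": "enhanced DSP instructions",
    "J": "Jazelle Java bytecode execution",
    "F": "vector floating point",
    "S": "synthesizable core",
}

# "ARM", a core number, optional feature letters, optional "-S"-style tail.
CORE_NAME = re.compile(r"^ARM(\d+)([A-Z]*)(?:-([A-Z]+))?$")


def decode_core_name(name):
    """Split a classic ARM core name into its number and feature letters."""
    match = CORE_NAME.match(name.upper())
    if match is None:
        raise ValueError(f"not a classic ARM core name: {name!r}")
    number, letters, tail = match.group(1), match.group(2), match.group(3) or ""
    features = [
        SUFFIX_MEANINGS.get(letter, f"unknown letter {letter!r}")
        for letter in letters + tail
    ]
    return number, features


print(decode_core_name("ARM926EJ-S"))
```

Letters outside the table, such as the Z in ARM1176JZ-S, are reported as unknown rather than guessed.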
ARM architectures use pipelines of various depths to improve the flow of instructions through the processor. Pipelining lets several instructions be in different stages of execution at the same time, instead of being processed one after another. For example, the ARM7TDMI uses a 3-stage pipeline, the ARM9TDMI 5 stages and the ARM10TDMI 6 stages. Deeper pipelines do less work per stage, which allows higher clock rates. Cores up to the ARM7 followed a von Neumann architecture, in which instructions and data share a single memory bus. The ARM9 and its successors moved to a Harvard architecture, with separate instruction and data buses, so an instruction fetch and a data access can take place at the same time. ARM cores also provide a robust debugging environment, such as the EmbeddedICE logic. This logic connects to the outside world through a standard IEEE 1149.1 JTAG Test Access Port, which helps shorten the development cycle.
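A quick way to see why pipelining helps is to count cycles on an idealised pipeline. This Python sketch assumes every stage takes one cycle and that there are no stalls from branches or loads, which real cores do not achieve. The `cycles` function is illustrative, and the stage counts are the ones quoted above.

```python
def cycles(instructions, stages, pipelined=True):
    """Cycles to finish `instructions` on an idealised pipeline.

    Assumes every stage takes one cycle and there are no stalls
    (no branch or load hazards), which real pipelines do not achieve.
    """
    if instructions <= 0:
        return 0
    if pipelined:
        # The first instruction needs `stages` cycles to get through;
        # after that, one instruction completes every cycle.
        return stages + instructions - 1
    # Without pipelining, each instruction runs start to finish alone.
    return stages * instructions


for name, depth in [("ARM7TDMI", 3), ("ARM9TDMI", 5), ("ARM10TDMI", 6)]:
    n = 1000
    speedup = cycles(n, depth, pipelined=False) / cycles(n, depth)
    print(f"{name}: {depth} stages, ideal speedup {speedup:.2f}x")
```

The ideal speedup approaches the number of stages. In practice, the gain shows up mainly as a higher clock rate, because each stage does less work.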
In general, classic ARM cores have 37 registers arranged in partially overlapping banks. Each processor mode has its own bank, which provides rapid context switching for exceptions and other special operations. The various modes in an ARM can be summarized in the figure below.
Each register is 32 bits in size. The registers are roughly divided into:
30 General Purpose Registers: Only 15 GPRs are visible at any one time, depending on the mode of operation. They are numbered R0-R12, plus the Stack Pointer (SP, R13) and the Link Register (LR, R14). The stack pointer is used as such by compilers for languages like C/C++, and using it as an ordinary GPR is deprecated. The link register stores return addresses for subroutine calls and exceptions, depending on the mode of operation.
Program Counter (R15): Holds the address of the instruction being fetched. Branch instructions load it with the destination address, and it can also be written directly, for example to return from a subroutine.
Application Program Status Register (APSR): Holds the condition flags (N, Z, C and V) produced by the ALU, which conditional instructions test to decide whether to execute.
Current Program Status Register (CPSR): Holds the APSR flags along with the current processor mode, interrupt mask bits, execution state bits, etc.
Saved Program Status Register (SPSR): When an exception is taken, the CPSR is copied into the SPSR of the exception mode entered, so that it can be restored on return. Each exception mode has its own SPSR.
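The register banking described above can be modelled in a few lines. This Python sketch assumes the classic (pre-Cortex-M) arrangement. User and System modes share one set of registers, FIQ banks R8-R14, and IRQ, Supervisor, Abort and Undefined modes bank only R13 and R14. Counting the physical registers reproduces the total of 37: 30 GPRs, the PC, the CPSR and five SPSRs. The exception entry and return functions are deliberately simplified. They ignore the exception vector, interrupt masking on entry and the return-address offsets that real exceptions use. All names are illustrative.

```python
# Registers that each mode banks (has its own private copy of).
BANKED = {
    "usr": [],
    "sys": [],
    "fiq": list(range(8, 15)),   # R8-R14
    "irq": [13, 14],
    "svc": [13, 14],
    "abt": [13, 14],
    "und": [13, 14],
}
EXCEPTION_MODES = [m for m in BANKED if m not in ("usr", "sys")]


def physical_register(mode, n):
    """Name of the physical register that Rn (0-14) refers to in `mode`."""
    if not 0 <= n <= 14:
        raise ValueError("only R0-R14 are banked by mode; R15 is the shared PC")
    return f"R{n}_{mode}" if n in BANKED[mode] else f"R{n}"


def count_registers():
    """Return (general-purpose registers, total registers) in this model."""
    gprs = {physical_register(m, n) for m in BANKED for n in range(15)}
    pc, cpsr = 1, 1
    spsrs = len(EXCEPTION_MODES)  # one SPSR per exception mode
    return len(gprs), len(gprs) + pc + cpsr + spsrs


def enter_exception(state, mode, return_address):
    """Simplified entry: save CPSR in SPSR_<mode>, set the banked LR, switch mode."""
    state["SPSR_" + mode] = dict(state["CPSR"])      # copy of the interrupted state
    state[physical_register(mode, 14)] = return_address
    state["CPSR"] = dict(state["CPSR"], mode=mode)


def return_from_exception(state):
    """Simplified return: restore CPSR from the current SPSR and resume at LR."""
    mode = state["CPSR"]["mode"]
    state["PC"] = state[physical_register(mode, 14)]
    state["CPSR"] = state["SPSR_" + mode]


print(count_registers())  # (30, 37)
```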
Classification of Instruction Set
The ARM and Thumb instruction sets can be broadly classified into the following functional groups.
1. Branching and Control Instructions: Instructions like subroutine calls, looping and changing the state between ARM and Thumb fall under this category of instructions.
2. Register Load and Store Instructions: These transfer the value of a single register to or from memory. The value may be a 32-bit word, a 16-bit halfword or an 8-bit byte. Halfwords and bytes can be zero- or sign-extended when loaded.
3. Multiple Register Load and Store Instructions: These transfer the contents of several registers to or from memory in a single instruction. They are used in block operations and stack operations.
4. Data Processing Instructions: Operations like addition, subtraction or bitwise logic on the contents of the registers are performed by this type of instructions.
5. Status Register access Instructions: These instructions primarily move the contents between the status registers and the GPRs.
6. Coprocessor Instructions: These provide a general framework to extend the ARM architectures.
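The Python sketch below illustrates the multiple-register transfers in group 3. It models a full-descending stack push and pop, as performed by STMDB SP!, {...} (PUSH) and LDMIA SP!, {...} (POP). Real LDM/STM always place the lowest-numbered register at the lowest address, whatever order the register list is written in, and the sketch reproduces that rule. The `push` and `pop` functions and the dictionary-based memory are illustrative assumptions.

```python
WORD = 4  # bytes per register


def push(memory, sp, regs, reg_list):
    """Model of STMDB SP!, {reg_list} (PUSH); returns the new SP."""
    order = sorted(reg_list)              # lowest register -> lowest address
    new_sp = sp - WORD * len(order)       # full-descending: SP moves down first
    for i, r in enumerate(order):
        memory[new_sp + WORD * i] = regs[r]
    return new_sp


def pop(memory, sp, regs, reg_list):
    """Model of LDMIA SP!, {reg_list} (POP); returns the new SP."""
    order = sorted(reg_list)
    for i, r in enumerate(order):
        regs[r] = memory[sp + WORD * i]
    return sp + WORD * len(order)         # SP moves back up afterwards


memory = {}
regs = {4: 0x11, 14: 0x8000}              # r4 and LR (r14)
sp = push(memory, 0x1000, regs, [14, 4])  # list order does not matter
print(hex(sp), hex(memory[sp]), hex(memory[sp + 4]))  # 0xff8 0x11 0x8000
```

A common idiom is PUSH {r4, lr} at function entry and POP {r4, pc} at exit, which restores r4 and returns in a single instruction.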
Anyone with prior knowledge of basic microprocessor architectures will recognize the strong resemblance of these categories, and of the instruction set itself, to those of other processors. However, a few features in ARMs cannot be controlled directly by the user and are left to the compiler. With the Cortex series, ARM processors are now divided into three profiles according to the type of application they handle:
1. Application profile: These are application processors, such as the Cortex-A8. They feature memory management support (MMU) for full operating systems and deliver high performance at low power.
2. Real-time profile: These are made for real-time systems. Cores such as the Cortex-R4 have a memory protection unit (MPU) and the low latencies required by real-time applications.
3. Microcontroller profile: These cores, such as the Cortex-M3, are meant for microcontrollers. Predictable behavior and a low gate count are the main priorities, and they are used in deeply embedded applications.
Early ARMs ran at low clock speeds, but the machines have since evolved into high-performance processors that still offer long battery life and low power consumption. A brief chart of a few architecture families is shown below.
The ARM9 typically ran at 130-220 MHz. This grew to 225-333 MHz in the ARM10, 412 MHz in the ARM11, 600 MHz in the Cortex-A8 and 1 GHz in the Cortex-A9. Each generational leap brought drastic performance improvements, much like a generational jump in Pentium machines.
Current Scenario & Future
ARM has found wide acceptance among mobile device manufacturers, with more than 98% of devices shipped containing at least one ARM core. At least 90% of embedded 32-bit processors are based on ARM. ARM cores find use in a multitude of applications. These range from consumer electronics like PDAs, mobile phones and handheld gaming consoles to networking equipment like routers. Customizable ARM-based microcontrollers from licensees, such as Atmel's AT91CAP9, are used in DSP applications and as alternatives to FPGAs. ARM processors offer some of the best MIPS per watt, MIPS per dollar and code density in the industry. Their die sizes are also among the smallest of contemporary RISC processors.
Robotics applications like the ARM Rubik's Speedcuber are gaining popularity. As smartphones penetrate the market further, ARM cores are becoming ever more popular. Giants like Google, HTC, Nokia, Adobe, Acer, Nvidia, Motorola, LG and many others are set to standardize on ARM processors for the Android operating system. Over 1.15 billion ARM chips have been placed in tablets and smartphones. Dual-core phones like the LG Optimus 2X have started using the latest Cortex cores to deliver more performance in less space. Microsoft has recently confirmed that the next version of Windows will support the ARM architecture. Sources claim that Apple is in the process of replacing Intel processors with ARM cores. With each milestone that ARM processors achieve, they are increasingly pitted against x86 platforms. ARMs have grown exceptionally in market share and popularity over a very short span of time. The ARM architecture therefore seems a very promising venture for the present and the future. ARMs have certainly given powerful arms to our advanced digital products, along with the promise of making them available to the public at affordable prices.