ARM unveiled the ARM11, the first core to implement the ARMv6 instruction set, at the recent Embedded Processor Forum. The ARM11 contains a number of features that should prove particularly useful for DSP applications. The most prominent of these features are the new dual-16-bit multiply-accumulate (MAC) instructions. DSP algorithms typically make heavy use of MAC operations, so these instructions will likely give the ARM11 a major performance boost over its single-MAC predecessors. The ARM11 also features new dual-16-bit add and subtract instructions that will likely prove useful for DSP algorithms like the FFT.
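To make the idea of a dual-16-bit MAC concrete, here is a minimal portable C sketch of what such an operation computes: two signed 16-bit products summed into a 32-bit accumulator in one step. The names dual_mac16, pack16 and dot_packed are hypothetical helpers chosen for illustration. The code models the arithmetic only, not the actual ARMv6 instruction encoding, and it ignores accumulator overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word (lo in bits 0-15). */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Model of a dual 16-bit MAC: multiply the low halves and the high halves
   of x and y, then add both products to acc. Assumes a two's-complement
   target for the uint16_t -> int16_t conversions; overflow of acc is not
   handled in this sketch. */
static int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(uint16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(uint16_t)(x >> 16);
    int16_t y_lo = (int16_t)(uint16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(uint16_t)(y >> 16);

    return acc + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
}

/* Dot product over packed data: each loop iteration consumes two samples
   from each input, so the loop runs half as many times as a single-MAC
   version would. */
static int32_t dot_packed(const uint32_t *x, const uint32_t *y, size_t n_words)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n_words; i++) {
        acc = dual_mac16(x[i], y[i], acc);
    }
    return acc;
}
```

On a core with a hardware dual MAC, the body of dual_mac16 would ideally map to a single instruction, which is where the potential speedup over a single-MAC core comes from.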
Although the new MAC operations are important, efficient MAC operations by themselves do not guarantee high DSP performance. For example, a comparison of BDTI Benchmark™ results shows that a 200 MHz ARM ARM9E is about 20% slower than a 160 MHz Texas Instruments TMS320C54xx on typical DSP tasks, even though both processors can perform one MAC per cycle. (See http://www.BDTI.com for these results.) The ARM9E is slower than the 'C54xx largely because the ARM9E lacks the zero-overhead loop structures found on the 'C54xx. Instead, the ARM9E implements loops with multi-cycle conditional branches. The ARM11 should fare much better in this respect: it contains branch prediction features that should greatly reduce loop overhead. This branch prediction scheme, which was introduced on the ARM10, reduces the branch latency to zero cycles for the most common cases.
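The effect of loop overhead on a tight MAC loop can be sketched with a simple cycle-count model. The function loop_cycles is hypothetical, and the example figures in the note below it (a 64-iteration loop, a one-cycle body, a three-cycle branch) are illustrative assumptions, not measured values for any of the processors named above.

```c
/* Toy cycle model: a loop of n iterations, each costing body_cycles of
   useful work plus branch_cycles of loop-control overhead. With
   branch_cycles == 0 this models zero-overhead looping or a perfectly
   predicted branch. Real pipelines are more complicated than this. */
unsigned long loop_cycles(unsigned long n,
                          unsigned body_cycles,
                          unsigned branch_cycles)
{
    return n * ((unsigned long)body_cycles + branch_cycles);
}
```

Under these assumed figures, loop_cycles(64, 1, 3) gives 256 cycles while loop_cycles(64, 1, 0) gives 64, so a one-MAC-per-cycle datapath could spend most of its time on branching rather than on arithmetic.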
Another important ARM11 feature carried over from the ARM10 is the parallel load/store unit. This feature allows computation to continue while data transfers complete; for example, the ARM11 can initiate a multi-cycle “load multiple registers” instruction and then continue computation in parallel with the load operations. The ARM11 also features a 64-bit data bus like that on the ARM10, rather than the 32-bit bus used on older ARM cores. Together, these improvements will bring the ARM11's data transfer capabilities much closer to those of a typical DSP. Because DSP applications are notoriously data-hungry, these capabilities should greatly improve DSP performance.
ARM cores have long competed with DSPs for applications like portable MP3 players that require only modest DSP performance. By giving the ARM11 serious DSP capabilities, ARM has made it clear that it has designs on a much broader range of DSP applications.
According to ARM, the ARM11 will be available for license in the fourth quarter of 2002.