PXA250-based PDA Performance Problems Probed

Submitted by BDTI on Sun, 12/15/2002 - 21:00

Intel’s PXA2xx processor family is making significant headway in the high-end PDA market. Last month, for example, Sony began shipping the first Palm OS-based PDA powered by a PXA2xx. Although the PXA2xx has gained wide acceptance in high-end PDAs, some reviewers (for example, at CNET.com) have complained that PXA2xx-based PDAs are not appreciably faster than PDAs based on its predecessor, StrongARM. Some reviewers have been particularly critical of the lack of improvement on multimedia applications.

BDTI recently analyzed the PXA2xx’s signal-processing performance using the industry-standard BDTI Benchmarks™. Although BDTI has not yet benchmarked StrongARM, it has benchmarked the closely related ARM9 core as well as the ARM7 and ARM9E cores. (Benchmark results for the PXA2xx, ARM7, ARM9 and ARM9E are available at http://www.BDTI.com/Resources/BenchmarkResults/BDTImark2000.) These analyses indicate that the PXA2xx itself is probably not to blame for any lackluster multimedia performance of PXA2xx-based PDAs. On the contrary, BDTI’s analyses suggest that a 400 MHz PXA250 is roughly three times faster on signal-processing tasks than the fastest StrongARM processor, the 206 MHz SA1110.

Clock speed is the most obvious contributor to this large gain in DSP performance. The PXA2xx achieves its higher clock rate partly by extending the five-stage StrongARM pipeline to seven stages. Although longer pipelines allow higher clock rates, they can also lead to an increase in branch-related stalls. However, the PXA2xx contains a branch prediction unit that mitigates this problem.
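As a rough back-of-the-envelope check, the clock rates quoted above can be separated from the overall speedup to see how much gain is left for other factors. The C sketch below is illustrative only: the function names are hypothetical, and it assumes that the overall speedup factors cleanly into a clock ratio and a per-cycle gain, which ignores effects such as memory latency and pipeline stalls.

```c
#include <assert.h>

/* Speedup attributable to clock rate alone (hypothetical helper). */
double clock_ratio(double new_mhz, double old_mhz)
{
    assert(old_mhz > 0.0);
    return new_mhz / old_mhz;
}

/*
 * Per-cycle gain that remains after dividing the overall speedup by the
 * clock ratio. Assumes the two factors simply multiply, which is a
 * simplification.
 */
double per_cycle_gain(double overall_speedup, double new_mhz, double old_mhz)
{
    return overall_speedup / clock_ratio(new_mhz, old_mhz);
}
```

With 400 MHz versus 206 MHz, and taking "roughly three times" as exactly 3.0 (an assumption), clock_ratio is about 1.94 and per_cycle_gain is about 1.5. Under these assumptions, clock speed accounts for a bit less than a factor of two, and the rest would have to come from work done per cycle.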

The PXA2xx also holds an advantage in architectural efficiency. The PXA2xx instruction set includes a number of DSP-oriented operations not supported by StrongARM. For example, the PXA2xx supports instructions that perform two 16-bit fixed-point multiply accumulate (MAC) operations per cycle. In contrast, StrongARM supports no more than one 16-bit MAC per cycle. The PXA2xx is also able to complete certain load/store operations in the background while it continues to execute other instructions. In contrast, StrongARM is obliged to wait for data transfers to complete before continuing execution. These and other architectural efficiencies give the PXA2xx a significant advantage: BDTI’s analyses suggest that even a 200 MHz PXA210 is significantly faster than a 206 MHz SA1110 on DSP tasks.
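To illustrate what one versus two 16-bit MACs per step means for a typical signal-processing kernel, here is a minimal portable C sketch of a dot product written both ways. The function names are hypothetical, and the code uses no PXA2xx-specific instructions. Whether a given compiler or hand-written assembly maps the paired form onto the processor's dual-MAC instructions is an assumption that would need to be verified on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference form: one 16-bit multiply-accumulate per loop iteration. */
int64_t dot16_single(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];   /* 16x16 -> 32-bit product */
    return acc;
}

/*
 * Paired form: two 16-bit multiply-accumulates per loop iteration.
 * This exposes the pairing that a dual-MAC instruction could exploit;
 * the result is identical to dot16_single.
 */
int64_t dot16_dual(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    size_t i = 0;
    assert(a != NULL && b != NULL);
    for (; i + 1 < n; i += 2) {
        acc += (int32_t)a[i]     * b[i];
        acc += (int32_t)a[i + 1] * b[i + 1];
    }
    if (i < n)                         /* leftover element when n is odd */
        acc += (int32_t)a[i] * b[i];
    return acc;
}
```

Both functions return the same value for any input. The paired form halves the number of loop iterations, which is the kind of restructuring that lets a processor with dual 16-bit MAC support do more work per cycle.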

So why aren’t users consistently experiencing better multimedia application performance with PXA2xx-based PDAs? The answer likely involves a number of factors including memory system bottlenecks and operating system inefficiencies. One key problem may be a shortage of PXA2xx-optimized multimedia applications. Software developed for the SA1110 won’t take advantage of some key performance-boosting features on the PXA2xx; therefore, PXA2xx-based PDAs will only realize their full performance potential when multimedia software has been optimized for the PXA2xx.
 
