The BDTI Video Encoder and Decoder Benchmarks™ are proprietary video compression/decompression algorithms loosely based on the H.264 standard. They are representative of the video processing workloads found in applications such as set-top boxes, multimedia-enabled cell phones, personal media players, surveillance cameras, and video conferencing systems.
These benchmarks are designed to model the computationally demanding aspects of video encoding and decoding while limiting complexity in order to reduce implementation and optimization effort. Limiting the complexity of the benchmark algorithm does not necessarily reduce its computational requirements: although the BDTI Video Encoder and Decoder Benchmarks are conceptually modeled after the H.264 standard, their computational requirements will likely be higher than those of H.264 Baseline Profile. Because video codecs deployed in embedded applications are typically carefully optimized, implementations of the BDTI Video Encoder and Decoder Benchmarks are also carefully optimized for each target processor. Typically this involves hand-crafting code (often using intrinsics or assembly language) for critical portions of the benchmarks.
To produce results that are relevant to real-world applications, BDTI has specified two “operating points” for measuring performance on the BDTI Video Encoder and Decoder Benchmarks:
- QVGA Operating Point. At this operating point the benchmarks process a video sequence at QVGA resolution (320x240) with a frame rate of 30 fps. This is appropriate for mobile applications such as cell phones that have small displays.
- D1 Operating Point. At this operating point the benchmarks process a video sequence at standard-definition television resolution (720x480, also known as “D1” resolution) with a frame rate of 30 fps. This is appropriate for applications such as personal media players (PMPs), digital surveillance equipment, and set-top boxes.
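To give a rough sense of how the two operating points compare in raw workload, the sketch below computes the pixel throughput implied by each resolution and frame rate. The `OperatingPoint` class and its field names are hypothetical, introduced only for illustration; pixel rate is a crude proxy and does not predict the benchmark's actual cycle counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """Hypothetical container for one benchmark operating point."""
    name: str
    width: int   # pixels
    height: int  # pixels
    fps: int     # frames per second

    @property
    def pixels_per_second(self) -> int:
        # Raw pixel throughput the codec must sustain at this operating point.
        return self.width * self.height * self.fps


# Resolutions and frame rates as stated in the operating point descriptions.
QVGA = OperatingPoint("QVGA", 320, 240, 30)
D1 = OperatingPoint("D1", 720, 480, 30)


def pixel_rate_ratio(a: OperatingPoint, b: OperatingPoint) -> float:
    """Return how many times more pixels per second `a` processes than `b`."""
    return a.pixels_per_second / b.pixels_per_second


if __name__ == "__main__":
    for op in (QVGA, D1):
        print(f"{op.name}: {op.pixels_per_second:,} pixels/s")
    print(f"D1 / QVGA pixel-rate ratio: {pixel_rate_ratio(D1, QVGA):.1f}")
```

By this measure the D1 operating point moves 4.5 times as many pixels per second as the QVGA operating point.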
The BDTI Video Encoder and Decoder Benchmarks assess the performance of the processing engine and the effects of memory and caches, DMA, coprocessors, and other on- and off-chip components. Performance is reported as the clock speed required by the worst-case four-frame moving average over BDTI's proprietary video clip. The results presented below are certified only for the specified processor architecture configurations, the specified external memory system, and the specified processor clock rates.
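One plausible reading of the "worst-case four-frame moving average" metric can be sketched as follows. This is an illustration under stated assumptions, not BDTI's measurement procedure: the function name `required_clock_mhz`, the per-frame cycle counts, and the conversion to a clock rate via the 30 fps frame rate are all assumptions made for the example.

```python
def required_clock_mhz(cycles_per_frame, fps=30, window=4):
    """Estimate the clock speed (MHz) needed to sustain real-time operation.

    Assumption: the requirement is set by the most expensive run of
    `window` consecutive frames, averaged per frame and scaled by `fps`.
    """
    if len(cycles_per_frame) < window:
        raise ValueError("need at least `window` frames")

    # Sliding-window sum over consecutive frames.
    window_sum = sum(cycles_per_frame[:window])
    worst_sum = window_sum
    for i in range(window, len(cycles_per_frame)):
        window_sum += cycles_per_frame[i] - cycles_per_frame[i - window]
        worst_sum = max(worst_sum, window_sum)

    worst_avg_cycles = worst_sum / window  # cycles per frame in the worst window
    return worst_avg_cycles * fps / 1e6    # cycles per second, in millions


if __name__ == "__main__":
    # Made-up per-frame cycle counts, for illustration only.
    frames = [2_000_000, 2_400_000, 2_200_000, 2_600_000, 3_000_000, 1_800_000]
    print(f"Required clock: {required_clock_mhz(frames):.1f} MHz")
```

Averaging over a short window rather than taking the single worst frame reflects the idea that a small amount of frame buffering can absorb isolated spikes, while a sustained run of expensive frames cannot be hidden.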
Learn more about how the BDTI Video Encoder/Decoder Benchmarks may be used to assess performance.
BDTI Certified Video Encoder/Decoder Benchmark Results
BDTI has released scores for the following processors on the BDTI Video Encoder and Decoder Benchmarks:
- ARM ARM1176JZ-S
- ARM Cortex-A8
- NXP PNX4103
The ARM1176JZ-S is a licensable general-purpose 32-bit RISC core developed by ARM Limited. It supports SIMD operations and includes DSP-oriented instructions for multimedia applications. Because it is a licensable core, ARM1176 performance is characterized in accordance with the BDTI core conditions, which mandate a 130 nm process technology.
The Cortex-A8 is a 32-bit licensable core developed by ARM Limited. It implements the ARMv7 instruction set. One major difference between the Cortex-A8 and previous ARM cores is the addition of the NEON instruction set extensions designed to accelerate multimedia tasks. Using these extensions, the Cortex-A8 can execute up to four 16-bit multiply-accumulate instructions per cycle (versus two for the ARM11). Please see our ARM Cortex-A8 page for details on the Cortex-A8 as well as the latest BDTI benchmark results for that core.
The PNX4103 is a system-on-chip that includes a TM3270 VLIW DSP, an ARM926 RISC processor, DMA engines, multimedia accelerators, and on-chip memory. (Note that neither the ARM RISC processor nor the multimedia acceleration units were used to obtain the results shown below.) It is manufactured in a 90 nm process technology and includes 16 MB of main memory on chip.
Figure 1: BDTI Video Encoder/Decoder Benchmark results, QVGA (320 x 240) Decode Operating Point. Please refer to the ARM Cortex-A8 detailed results page for Cortex-A8 results.
Figure 2: BDTI Video Encoder/Decoder Benchmark results, D1 (720 x 480) Decode Operating Point. Please refer to the ARM Cortex-A8 detailed results page for Cortex-A8 results.
Figure 3: BDTI Video Encoder/Decoder Benchmark results, QVGA (320 x 240) Encode Operating Point. Please refer to the ARM Cortex-A8 detailed results page for Cortex-A8 results.
Processor | QVGA Decode: % Utilization | QVGA Decode: Cycles / sec (millions) | D1 Decode: % Utilization | D1 Decode: Cycles / sec (millions) | QVGA Encode: % Utilization | QVGA Encode: Cycles / sec (millions)
---|---|---|---|---|---|---
ARM ARM1176 | 78 | 249 | N/A | N/A | N/A | N/A
NXP PNX4103 | 19 | 67 | 83 | 290 | 51 | 177
Table 1: Performance on BDTI Video Encoder/Decoder Benchmarks for specified operating points. Scores for licensable silicon IP (i.e., the ARM core) use worst-case clock speeds for the TSMC CL013G process and the ARM Artisan SAGE-X library. PNX4103 benchmark results are for the TM3270 core only; the implementation does not use the ARM core found on this chip.
Processor | Clock Speed (MHz) | L1 Instruction Cache (Kbytes) | L1 Data Cache (Kbytes) | L2 Cache (Kbytes) | On-Chip Main Memory (Mbytes) | External Memory Speed (MHz) | External Memory Bus Width (bits)
---|---|---|---|---|---|---|---
ARM ARM1176 | 320 | 16 | 16 | N/A | N/A | 106 | 32
NXP PNX4103 | 350 | 64 | 128 | N/A | 16 | Not required | Not required
Table 2: Processor architectural details.
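The "% Utilization" figures in Table 1 appear to be the required cycles per second divided by the clock speed listed in Table 2. The sketch below checks that reading against the published numbers. The relationship is inferred from the tables rather than stated by BDTI, and the names `CLOCK_MHZ`, `MCYCLES_PER_SEC`, and `utilization_percent` are hypothetical.

```python
# Clock speeds (MHz) from Table 2.
CLOCK_MHZ = {"ARM ARM1176": 320, "NXP PNX4103": 350}

# Required cycles per second (millions) from Table 1; N/A entries omitted.
MCYCLES_PER_SEC = {
    ("ARM ARM1176", "QVGA Decode"): 249,
    ("NXP PNX4103", "QVGA Decode"): 67,
    ("NXP PNX4103", "D1 Decode"): 290,
    ("NXP PNX4103", "QVGA Encode"): 177,
}


def utilization_percent(mcycles_per_sec, clock_mhz):
    """Share of the processor's cycles consumed, rounded to a whole percent.

    Assumption: utilization = required cycles per second / clock rate.
    """
    return round(100 * mcycles_per_sec / clock_mhz)


def utilization_table():
    """Recompute utilization for every (processor, operating point) pair."""
    return {
        key: utilization_percent(mcycles, CLOCK_MHZ[key[0]])
        for key, mcycles in MCYCLES_PER_SEC.items()
    }


if __name__ == "__main__":
    for (processor, point), pct in utilization_table().items():
        print(f"{processor:12s} {point:12s} {pct:3d}%")
```

Under this assumption the recomputed values match the published utilization column, which suggests the remaining headroom on each processor can be read directly as 100% minus the utilization figure.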
No reproduction or reuse of the above information is permitted without the express authorization of BDTI. For reproduction permission or to obtain benchmark results for your processing engine, please contact BDTI.