TI Unveils 'C64x+ Details

Submitted by BDTI on Wed, 05/25/2005 - 19:00

Last week TI introduced the TMS320C6455, the first general-purpose DSP to use TI's new 'C64x+ core. (The previously announced TCI6482 also uses the new core, but this part is available only to select customers. See the February 2005 edition of Inside DSP for details.) TI also revealed the details of the 'C64x+ architecture.

As the name suggests, the C64x+ is based on TI's well-established C64x DSP architecture. The C64x+ is object-code compatible with its predecessor and similar to it in most respects. For example, both architectures can execute up to eight instructions per cycle. And like current C64x-based chips, the new C6455 will operate at up to 1 GHz. However, the C64x+ includes some important upgrades that significantly improve both the throughput and the memory efficiency of the new architecture.

The most prominent upgrade is increased multiply-accumulate (MAC) throughput. The C64x+ can perform up to eight 16-bit MAC operations per cycle, compared to a maximum of four MAC operations per cycle on the C64x. The C64x+ is also able to complete up to two 32 x 32 MAC operations per cycle. In contrast, the C64x does not directly support 32 x 32 MAC operations. The C64x+ also offers expanded add and subtract capabilities, as well as new bit-manipulation instructions that accelerate security and communications algorithms.
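To make the MAC figures concrete, here is a minimal sketch in portable C of the kind of kernel they apply to: a 16-bit dot product, in which each loop iteration is one multiply-accumulate. The function names dot16 and min_mac_cycles are hypothetical and chosen for illustration; they are not part of any TI library. The cycle estimate is only an idealized lower bound derived from the peak rates quoted above (eight versus four 16-bit MACs per cycle); it ignores loads, loop setup, and stalls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors.
 * Each iteration performs one 16-bit multiply-accumulate (MAC).
 * A 64-bit accumulator is used here so the sum cannot overflow
 * for any realistic vector length. */
int64_t dot16(const int16_t *x, const int16_t *y, size_t n)
{
    assert(x != NULL && y != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)y[i];
    }
    return acc;
}

/* Idealized lower bound on kernel cycles for n_macs MAC operations
 * at a given peak rate (assumption: the peak rate is sustained,
 * which real code rarely achieves). Rounds up to whole cycles. */
size_t min_mac_cycles(size_t n_macs, size_t macs_per_cycle)
{
    assert(macs_per_cycle > 0);
    return (n_macs + macs_per_cycle - 1) / macs_per_cycle;
}
```

Under these assumptions, a 256-point dot product needs at least min_mac_cycles(256, 8) = 32 cycles at the C64x+ peak rate, versus min_mac_cycles(256, 4) = 64 cycles at the C64x peak rate.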

Interestingly, the C64x+ adds no video-specific instructions. This is striking because video applications are a key target for the new architecture. Although the C64x+ will have respectable video-processing capabilities, it could have benefited from additional video-oriented instructions.

Moving beyond new instructions, the C64x+ also takes a new approach to software-pipelined loops, which are used heavily in optimized C64x code to reduce the impact of the deep pipeline. The C64x+ adds a loop buffer that greatly reduces the need for loop setup and cleanup code. The obvious benefit of this change is that it reduces code size in loop-intensive signal-processing code. The loop buffer also allows the programmer to schedule instructions that execute only once in parallel with loop instructions. This feature makes use of execution slots that would otherwise go unused, significantly improving performance in some cases.
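For readers unfamiliar with software pipelining, the sketch below shows the idea in plain C rather than C64x assembly. It is a conceptual illustration only, not TI's loop-buffer mechanism: the function names are hypothetical, and the three-stage split (load, multiply, accumulate) is an assumption made for clarity. The prolog and epilog in the pipelined version roughly correspond to the loop setup and cleanup code described above, which the loop buffer is meant to reduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward sum of squares: each iteration loads, multiplies,
 * and accumulates one element before the next iteration starts. */
int64_t sum_sq_simple(const int16_t *x, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t v = x[i];   /* stage 1: load */
        int32_t p = v * v;  /* stage 2: multiply */
        acc += p;           /* stage 3: accumulate */
    }
    return acc;
}

/* Hand-pipelined version: stages from three different iterations
 * are overlapped inside the kernel. */
int64_t sum_sq_pipelined(const int16_t *x, size_t n)
{
    if (n < 2) {
        return sum_sq_simple(x, n);  /* too short to pipeline */
    }
    int64_t acc = 0;

    /* Prolog (setup): fill the pipeline. */
    int32_t v = x[0];   /* load for iteration 0 */
    int32_t p = v * v;  /* multiply for iteration 0 */
    v = x[1];           /* load for iteration 1 */

    /* Kernel: accumulate (i-2), multiply (i-1), load (i).
     * On a VLIW machine these three operations could issue together. */
    for (size_t i = 2; i < n; i++) {
        acc += p;
        p = v * v;
        v = x[i];
    }

    /* Epilog (cleanup): drain the pipeline. */
    acc += p;   /* accumulate for iteration n-2 */
    p = v * v;  /* multiply for iteration n-1 */
    acc += p;   /* accumulate for iteration n-1 */
    return acc;
}
```

Both functions return the same result for any input; the pipelined form simply carries extra code before and after the kernel, which is the code-size cost at issue here.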

Although the loop buffer brings important benefits, it requires a style of programming that many assembly-level programmers will find unfamiliar and challenging. This is particularly problematic because the C64x was already a challenging assembly-code target.

Last but not least, the C64x+ supports 16-bit instruction words as well as the 32-bit instructions used by the C64x. Mixed-width instruction sets are a common memory-saving feature, but the C64x+ implements this feature in an unusual way: the programmer cannot specify which instructions use 16-bit encoding. Instead, the assembler determines where it can use 16-bit encoding. Because it is hard to predict where 16-bit instructions will be used, assembly-level programmers will find it difficult to minimize memory use. The upside of TI's approach is that re-assembling C64x code for the C64x+ will usually provide significant memory savings.

BDTI recently completed an analysis of the C64x+ using its BDTI Benchmarks. Based on the results of this analysis, the combination of new instructions and the loop buffer gives the C64x+ a 20% performance boost over its predecessor. On some algorithms, the C64x+ also uses roughly half as much program memory as the C64x. (An analysis of both program and data memory use shows that the C64x+ uses about 15% less memory than its predecessor overall.) Benchmark results for the C64x and C64x+ are available at http://www.BDTI.com/Services/Benchmarks/DKB.

Overall, the C64x+ is a significant, if not revolutionary, improvement over the C64x. By improving both speed and memory use, TI is sure to strengthen its lead in high-performance DSP. The main challenge for TI will be helping its customers deal with the increased complexity in what was already a highly complicated architecture.

The C6455 is expected to begin sampling in the third quarter of 2005. Volume production is scheduled for the second quarter of 2006. Planned pricing for 10,000-unit orders is $259 for the 1 GHz version, $219 for the 850 MHz version, and $179 for the 720 MHz version. 
