This article has been modified from its original content. The original article contained a reference to the performance of the CEVA X-1620 core on a compiled C language implementation of the AMR-NB vocoder; this reference has been removed. These results were taken from a 2007 CEVA white paper. CEVA states that the X-1620 core is no longer offered for license and has been superseded by the CEVA X-1622. CEVA also states that the compiled-code performance results for the CEVA X-1622 on the AMR-NB vocoder are significantly better than those reported earlier for the CEVA X-1620. BDTI briefly reviewed CEVA’s new performance results, but in the limited time available BDTI was unable to confirm whether CEVA’s and Tensilica’s results are based on comparable test conditions and metrics.
This month Tensilica announced a new licensable DSP core, the ConnX D2. The D2 is an optional configuration of the Xtensa LX processor, and is a dual-MAC general-purpose DSP engine with communications-oriented enhancements. Unlike Tensilica’s recently announced high-performance baseband engine, the ConnX BBE, the D2 is a basic, entry-level DSP core designed for use in the many applications currently served by medium-performance processors like the CEVA TeakLite families and Texas Instruments ‘C55x. According to Tensilica, the core is currently being previewed with select customers, and will be production-ready in October.
Steve Roddy, Tensilica’s VP of marketing and business development, comments that “Usually when a core is announced you’ll see a chart of speed and size, and the new core is always up and to the right. In this case, we’re backfilling our portfolio with the D2 because we haven’t had an entry-level DSP core.”
Like other Tensilica cores, the D2 can be customized using Tensilica’s TIE language to add custom instructions and I/O interfaces. Figure 1 shows the configurable and optional features of the ConnX D2.
Figure 1. Tensilica ConnX D2 core. (Figure courtesy of Tensilica)
So why another mid-range DSP, and why now? Tensilica reasons that many of the dual-MAC DSP cores currently in use—including a number of homegrown solutions developed by chip companies—have lousy compilers, and that’s a big weakness. As applications have become larger and more complex, with more control code interleaved with DSP functionality, compiler performance has become a key factor influencing the usefulness of a DSP core. A few of the most widely used DSPs (such as Texas Instruments’ ‘C6x families) have solid compilers, which are enabled by more modern architectures. But on the lower end, where VLIW is rare and instruction sets are still often complex and non-orthogonal, the compilers tend to be abysmal.
Enter the D2. The new core combines two-way VLIW with two-way SIMD to enable decent (though modest) performance along with good flexibility for the compiler; if the compiler can’t figure out how to use SIMD (or the code segment lacks data parallelism), it can sometimes use VLIW instead. The processor can execute a single 16-bit or 24-bit instruction, or two 32-bit instructions as part of a 64-bit VLIW instruction. For VLIW instructions, the processor can perform one load plus a SIMD dual-MAC, for example. This is similar to the capability of the ‘C55x, but Tensilica believes the D2’s underlying RISC instruction set enables better compiler performance than the ‘C55x’s compound, complex instructions.
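As a rough illustration of the kind of loop a dual-MAC SIMD datapath targets, the Python sketch below computes a dot product two lanes at a time, with one accumulator per lane. Each loop iteration stands in for one "load plus SIMD dual-MAC" step. The function name and the mapping onto D2 instructions are assumptions made for illustration. This is not Tensilica's code, and it says nothing about real D2 instruction encodings.

```python
def dot_dual_mac(x, y):
    """Dot product computed two lanes at a time, mirroring a 2-way SIMD MAC.

    Illustrative model only: each loop iteration stands in for one
    'load + dual-MAC' step; it does not model real D2 instructions.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    acc = [0, 0]                      # one accumulator per SIMD lane
    for k in range(len(x) // 2):
        # "load": fetch one pair of samples from each operand
        x0, x1 = x[2 * k], x[2 * k + 1]
        y0, y1 = y[2 * k], y[2 * k + 1]
        # "dual MAC": both lanes multiply-accumulate in the same step
        acc[0] += x0 * y0
        acc[1] += x1 * y1
    if len(x) % 2:                    # odd-length tail uses one lane only
        acc[0] += x[-1] * y[-1]
    # the two lane accumulators are combined once, after the loop
    return acc[0] + acc[1]


print(dot_dual_mac([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))  # 550
```

Splitting the sum across two independent accumulators is what lets both multiplies issue in the same step. When a loop lacks this kind of data parallelism, the article's point is that a VLIW slot can still be filled with unrelated work instead.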
Combining SIMD and VLIW has become fairly common in high-end processors (Tensilica’s BBE core is a good example) but is less common among dual-MAC processors; CEVA’s dual-MAC X-1620 core is one that also uses this approach.
Tensilica has invested significant effort in its “compilability” goal, particularly for ITU reference code. The ITU has defined a set of fixed-point intrinsics (fundamental building-block functions such as saturating add and multiply-accumulate) that are used in its reference code for speech compression algorithms. Tensilica has mapped each of the ITU fixed-point C intrinsics onto a single D2 assembly instruction to increase its performance on ITU code. According to Tensilica, an out-of-the-box compilation of ITU reference code for the AMR-NB speech encoder plus decoder requires 27.7 MHz on the D2, versus 48 MHz on the CEVA X-1620. (BDTI benchmark results for the CEVA X-1620 are available at /Resources/BenchmarkResults.)
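To make the idea concrete, here is a minimal Python sketch of four basic operators that appear throughout ITU-style fixed-point reference code: `add`, `L_add`, `L_mult` and `L_mac`. The names follow the ITU basic-operator set. The bodies, however, are simplified reimplementations written for illustration, not the ITU source. The reference C versions also update a global `Overflow` flag, which this sketch omits. In the reference code, each operator is a C function with its own compare-and-clamp saturation logic. That is why emitting one instruction per call, as Tensilica describes for the D2, can pay off.

```python
# Simplified Python models of four ITU-style fixed-point basic operators.
# Assumption: the Overflow-flag side effects of the reference C code are omitted.
MAX_16, MIN_16 = 0x7FFF, -0x8000
MAX_32, MIN_32 = 0x7FFFFFFF, -0x80000000


def _sat16(v):
    """Clamp v to the signed 16-bit range."""
    return max(MIN_16, min(MAX_16, v))


def _sat32(v):
    """Clamp v to the signed 32-bit range."""
    return max(MIN_32, min(MAX_32, v))


def add(a, b):
    """16-bit saturating add."""
    return _sat16(a + b)


def L_add(a, b):
    """32-bit saturating add."""
    return _sat32(a + b)


def L_mult(a, b):
    """Q15 x Q15 -> Q31 multiply: product shifted left by one, then
    saturated (only -32768 * -32768 actually overflows)."""
    return _sat32((a * b) << 1)


def L_mac(acc, a, b):
    """Multiply-accumulate: acc + L_mult(a, b), saturated to 32 bits."""
    return L_add(acc, L_mult(a, b))


print(add(30000, 10000))       # 32767: saturated instead of wrapping
print(L_mult(-32768, -32768))  # 2147483647: the one saturating product
print(L_mac(0, 16384, 16384))  # 536870912: 0.5 * 0.5 = 0.25 in Q31
```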
Furthermore, in an effort to make it easier for customers to transition over from the popular Texas Instruments’ ‘C6x, Tensilica has implemented TI’s ‘C6x fixed-point C intrinsics on the D2 as well. According to Tensilica, its intrinsics produce results that are bit-for-bit identical to TI’s.
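Bit-for-bit compatibility means every intrinsic must reproduce TI's overflow and saturation rules exactly, including the corner cases. The Python sketch below models three ’C6x intrinsics (`_sadd`, `_smpy` and `_add2`) following the behavior described in TI's C6000 compiler documentation. Golden models of this kind are one plausible way to check an implementation bit for bit. The `model_*` names are hypothetical, and the models are neither TI's nor Tensilica's code. They also ignore side effects such as the saturation (SAT) status bit that the saturating instructions set.

```python
# Hypothetical golden models of three TI 'C6x intrinsics.
# Operands and results are Python ints holding signed 32-bit values.
MAX_32, MIN_32 = 0x7FFFFFFF, -0x80000000


def _to_s32(u):
    """Reinterpret the low 32 bits of u as a signed 32-bit integer."""
    u &= 0xFFFFFFFF
    return u - (1 << 32) if u & 0x80000000 else u


def _low_s16(v):
    """Sign-extend the 16 least-significant bits of v."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def model_sadd(a, b):
    """_sadd: 32-bit add, saturated to the signed 32-bit range."""
    return max(MIN_32, min(MAX_32, a + b))


def model_smpy(a, b):
    """_smpy: multiply the low 16 bits of each operand, shift left by one,
    and saturate the single overflow case 0x80000000 to 0x7FFFFFFF."""
    p = (_low_s16(a) * _low_s16(b)) << 1
    return MAX_32 if p == 0x80000000 else p


def model_add2(a, b):
    """_add2: two independent 16-bit adds on the upper and lower halves,
    each wrapping modulo 2**16 with no carry between the halves."""
    lo = (a + b) & 0xFFFF
    hi = ((a >> 16) + (b >> 16)) & 0xFFFF
    return _to_s32((hi << 16) | lo)


print(model_sadd(MAX_32, 1))                    # 2147483647
print(model_smpy(0x12340004, 0x00000003))       # 24: upper halves ignored
print(hex(model_add2(0x0000FFFF, 0x00000001)))  # 0x0: low-half carry dropped
```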
Of course, the reason people keep using dual-MAC processors is that they need a processor that’s small, cheap, or low in power consumption, or all three. According to Tensilica, when optimized for area the D2 occupies 0.18 mm² (for the core plus memory interfaces and cache controllers) in a 65 nm GP process and runs at about 100 MHz. When optimized for speed, the core can run at 600 MHz in the same process and occupies 0.35 mm². Power consumption is 52 µW/MHz, measured running the AMR-NB algorithm, according to Tensilica.
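Combining Tensilica’s figures gives a back-of-envelope sense of scale. This is an estimate under stated assumptions, not a Tensilica result. It assumes the 52 µW/MHz figure holds at the 27.7 MHz that the compiled AMR-NB code is reported to need, and it ignores memory and other system power. The article also does not say which build (area- or speed-optimized) the power figure refers to.

```latex
\begin{aligned}
P_{\text{AMR-NB}} &\approx 52\,\mu\text{W/MHz} \times 27.7\,\text{MHz} \approx 1440\,\mu\text{W} \approx 1.44\,\text{mW} \\
\frac{A_{\text{speed}}}{A_{\text{area}}} &= \frac{0.35\,\text{mm}^2}{0.18\,\text{mm}^2} \approx 1.9,
\qquad
\frac{f_{\text{speed}}}{f_{\text{area}}} \approx \frac{600\,\text{MHz}}{100\,\text{MHz}} = 6
\end{aligned}
```

By these figures, the speed-optimized build buys roughly six times the clock rate for a little under twice the area. The 27.7 MHz AMR-NB load also fits well within the roughly 100 MHz of the area-optimized build.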