CEVA's TeakLite-4: Audio Once Again Comes to the Fore

Submitted by BDTI on Tue, 04/24/2012 - 12:17

"If it's not broken, don't fix it." That well-known maxim seemed for many years to encapsulate CEVA's approach to audio DSP cores, given that the company's third-generation offering in this particular application space (and first-generation 32-bit core), the TeakLite-III, dates from 2007. However, after both fortifying its foundation communications DSP offerings ("CEVA's XC4000 DSP Core: The Communications Focus Expands Even More") and moving into the emerging embedded vision space ("The CEVA-MM3101: An Imaging-Optimized DSP Core Swings for an Embedded Vision Home Run") earlier this year, the company has now turned its attention back to its other bread-and-butter money-maker, audio, with the newly introduced TeakLite-4 family (Figure 1).

Figure 1. CEVA's TeakLite-4 upgrades the company's audio-focused DSP core product line for the first time in five years.

As with the XC4000, TeakLite-4 exemplifies a transition from a core point product to a suite of next-generation offerings, targeting various price, performance and power consumption points. And within TeakLite-4 you'll find several XC4000-reminiscent features, such as the second-generation PSU (power scaling unit) for dynamic and leakage power savings of up to 30% versus TeakLite-III, suggesting that the company has fully embraced design modularity and function block sharing across product lines when it makes sense. Yet, in many respects, TeakLite-4 takes an audio application-optimized DSP approach (Figure 2).

Figure 2. The TeakLite-4 block diagram encompasses the four so-far announced core proliferations and alludes to additional feature set options to come.

For the moment, at least, TeakLite-4 comes in four flavors (Figure 3). The low-end TL410 comprises approximately 100,000 gates of logic (roughly 25% smaller than the TL3210) and contains a single 32x32 bit multiply-accumulate (MAC) unit, along with two 16x16 bit MACs, and a 64-bit memory bus. Its beefier sibling, the TL411, leverages an additional 40,000 logic gates in implementing a second 32x32 bit MAC, for two total, and two additional 16x16 bit MACs, for four total. And the high-end TL420 (190,000 gates) and TL421 (230,000 gates) match the MAC configurations of their TL41x counterparts, while transitioning to a more robust rest-of-SoC connectivity scheme consisting of both instruction and data cache controllers along with a master/slave AXI (Advanced eXtensible Interface).

Figure 3. TeakLite-4 core proliferations provide varied MAC counts and SoC interface options.

Note that Figure 2 alludes to feature set options not implemented in the initial four TeakLite-4 core variants, such as a quad 32x32 bit MAC cluster and 128-bit memory bus. The product roadmap also includes mention of high-end "Future TL4 Cores," which Moshe Sheier (CEVA Product Manager) and Eran Briman (Vice President of Marketing) suggested would service two specific incremental market segments:

  • Living room (and soon, mobile) gaming devices that mix 32 voice channels and 256 audio channels, and
  • Automotive applications that, for example, employ multiple microphones for active voice cancellation

Note, too, the optional FPU in the bottom right area of the block diagram. Like the XC4000 family, TeakLite-4 supports TCEs (tightly coupled extensions), including custom user-defined instruction set additions. However, this time the CEVA-supplied TCEs aren't communications-centric, and they're also fewer in number...specifically one in number. The single-precision FPU, currently under development with availability slated for "later this year," is estimated to add a "few dozen Kgate" to the gate budget, will be IEEE 754-compliant, and is intended to support quick time-to-market ports of C language floating-point code coming from the x86 world, for example.

Speaking of instruction sets, TeakLite-4 offers additional flexibility beyond TCE customization. TeakLite-4 delivers native 32-bit processing, with 32-bit register files, automatic 32-bit saturation and 72-bit MAC accumulation for wide dynamic range. If your design doesn't employ legacy code that requires you to retain 16-bit TeakLite and TeakLite-II backwards-compatibility, you can specify a 32-bit only core implementation that requires less silicon area (the earlier TL4xx gate count estimates reflect this particular optimization).
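To illustrate why a 72-bit accumulator matters for 32-bit audio data, here is a minimal Python sketch of a saturating multiply-accumulate. It is a behavioral model only, not CEVA's implementation: the Q31 fixed-point format, the helper names (`wrap_to_width`, `saturate32`, `mac_q31`) and the final shift-then-saturate step are all assumptions made for illustration. The only figures taken from the article are the 32-bit operands, 32-bit saturation and 72-bit accumulation.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1
ACC_BITS = 72  # accumulator width cited in the article


def wrap_to_width(value, bits):
    """Wrap an integer into a signed two's-complement field of `bits` bits."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def saturate32(value):
    """Clamp to the signed 32-bit range instead of wrapping around."""
    return max(INT32_MIN, min(INT32_MAX, value))


def mac_q31(samples, coeffs, acc_bits=ACC_BITS):
    """Sum 32x32-bit products in a wide accumulator, then return a Q31 result.

    Assumption: operands are Q31 fractions, so each product is Q62 and the
    sum is shifted right by 31 bits before the 32-bit saturation.
    """
    acc = 0
    for x, c in zip(samples, coeffs):
        acc = wrap_to_width(acc + x * c, acc_bits)
    return saturate32(acc >> 31)


# Four full-scale products overflow a 64-bit accumulator but fit in 72 bits.
total = 4 * INT32_MAX * INT32_MAX
print(wrap_to_width(total, 64) < 0)        # True: a 64-bit sum wraps negative
print(wrap_to_width(total, 72) == total)   # True: 72 bits hold the sum exactly
print(mac_q31([INT32_MAX] * 4, [INT32_MAX] * 4) == INT32_MAX)  # True: clamps
```

A signed 32x32-bit product fits in 64 bits, so a 72-bit accumulator leaves 8 guard bits for summing many products before any overflow can occur, and saturation then clamps the final value rather than letting it wrap.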

On the other hand, you might want to add instructions. If so, the TL411 and TL421 support optional ISA extensions for bit-stream processing acceleration. CEVA estimates that TeakLite-4 will reach clock speeds of at least 1.5 GHz in a 28 nm process, and that the various core versions will run common audio functions at competitive clock-cycle counts (Table 1).

| DSP Kernels (cycles) | TL410/TL420 | TL411/TL421 | Future quad 32x32 bit MAC core |
|---|---|---|---|
| 256-point FFT (32 bit) | 5,380 | 3,150 | <1,500 |
| 128-tap block FIR (32 bit) | 16,640 | 8,448 | TBA |
| 5x80 DFII biquad IIR (32 bit) | 2,000 | 1,600 | TBA |

Table 1. CEVA's clock cycle estimates for various audio DSP functions.
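As a rough way to read Table 1, the short Python sketch below computes the speedup of the dual-MAC cores (TL411/TL421) over the single-MAC cores (TL410/TL420) and converts the cycle counts to execution time. The cycle counts are CEVA's estimates from the table. The 1.5 GHz clock is an assumption borrowed from the 28 nm estimate quoted above, and the names `CYCLES`, `speedup` and `microseconds` are illustrative only.

```python
# Cycle estimates from Table 1: (TL410/TL420, TL411/TL421)
CYCLES = {
    "256-point FFT (32 bit)": (5380, 3150),
    "128-tap block FIR (32 bit)": (16640, 8448),
    "5x80 DFII biquad IIR (32 bit)": (2000, 1600),
}

CLOCK_HZ = 1.5e9  # assumption: the 1.5 GHz 28 nm figure quoted in the article


def speedup(single_mac_cycles, dual_mac_cycles):
    """Ratio of single-MAC to dual-MAC cycle counts for the same kernel."""
    return single_mac_cycles / dual_mac_cycles


def microseconds(cycles, clock_hz=CLOCK_HZ):
    """Execution time in microseconds at the given clock frequency."""
    return cycles / clock_hz * 1e6


for name, (single, dual) in CYCLES.items():
    print(f"{name}: {speedup(single, dual):.2f}x, "
          f"{microseconds(single):.2f} us -> {microseconds(dual):.2f} us")
```

By these figures, the second 32x32 bit MAC nearly halves the FIR cycle count (about 1.97x), while the FFT improves by about 1.71x and the biquad IIR by 1.25x.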

The TL410 and TL420 will be available this quarter, with the TL411 and TL421 following them in the third quarter. Five years is forever in the technology sector, and as such it's good to see CEVA upgrade its audio DSP core line, no matter that TeakLite-III had an impressively long run. It'll be interesting to see how quickly the company's customers migrate to TeakLite-4, whether for its performance, power consumption or cost advantages, or for other reasons. Equally interesting will be the degree to which CEVA's cores experience competitive pressure from alternative architectures (see "Tensilica's HiFi 3 DSP Core: Audio Post-Processing Comes to the Fore").
