The October issue of IEEE Spectrum magazine includes an interesting article titled "Could Supercomputing Turn to Signal Processors (Again)?", which discusses the viability of building supercomputers from digital signal processors. Among other things, it covers a recent analysis project, co-staffed by engineers at Texas Instruments and researchers at the University of Texas at Austin, that compared the floating-point operations per watt of TI's DSPs against those of processors now more commonly used in supercomputers, such as x86 and PowerPC CPUs.
In general-purpose matrix multiplication, TI's C6678 DSP came out on top in the TI/UT Austin study, delivering 7.4 GFLOPS per watt. However, a closer look at the results (and assumptions) raises questions. For one thing, the study focused on single-precision (32-bit) arithmetic, which the C6678 can complete in a single clock cycle. Double-precision (64-bit) math, for which the competing processor architectures are optimized, would take at least four times as many clock cycles on the C6678 (and therefore roughly four times the energy), yielding far less favorable results for the TI SoC. Of course, should the market opportunity warrant the investment, TI could always revamp its DSPs to improve their double-precision capabilities.
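The arithmetic behind that caveat can be sketched in a few lines. Because a watt is one joule per second, GFLOPS per watt is equivalent to GFLOP per joule, so the study's figure translates directly into energy per task. The sketch below assumes a dense n×n matrix multiply costs the conventional 2n³ floating-point operations and, following the reasoning above, that double precision needs four times the cycles at the same energy per cycle. The function names are illustrative, not taken from the study.

```python
# Sketch: turn a GFLOPS-per-watt figure into energy per matrix multiply and
# estimate the double-precision penalty under a simple cycle-count model.
# Assumptions (not from the TI/UT Austin study): a dense n x n GEMM costs
# 2*n**3 FLOPs, and energy per clock cycle is the same for SP and DP work.


def gemm_flops(n: int) -> int:
    """Conventional FLOP count for C = A @ B with square n x n matrices.

    Each of the n*n outputs needs n multiply-add pairs, i.e. 2*n FLOPs.
    """
    return 2 * n ** 3


def joules_per_gemm(n: int, gflops_per_watt: float) -> float:
    """Energy for one GEMM, using 1 GFLOPS/W == 1e9 FLOP per joule."""
    return gemm_flops(n) / (gflops_per_watt * 1e9)


def dp_gflops_per_watt(sp_gflops_per_watt: float, cycle_ratio: float = 4.0) -> float:
    """If DP takes `cycle_ratio` times the cycles (and energy), efficiency divides by it."""
    return sp_gflops_per_watt / cycle_ratio


SP_EFFICIENCY = 7.4  # GFLOPS/W reported for the C6678 (single precision)

if __name__ == "__main__":
    dp = dp_gflops_per_watt(SP_EFFICIENCY)
    # "At least four times the cycles" makes this an upper bound on DP efficiency.
    print(f"estimated DP efficiency: at most {dp:.2f} GFLOPS/W")
    for label, eff in (("SP", SP_EFFICIENCY), ("DP", dp)):
        print(f"{label}: {joules_per_gemm(4096, eff):.1f} J per 4096x4096 GEMM")
```

Under this model, every doubling of n multiplies the energy per GEMM by eight, and the double-precision run costs four times the energy of the single-precision one at any size.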
The other limitation of the study is that a focus solely on arithmetic is insufficient for even a specialty computer design; general-purpose processor resources must also be present to run control code and other non-math operations. Texas Instruments' latest KeyStone II SoCs, formally introduced several weeks ago after a mid-summer coverage "tease," address this requirement by combining multiple ARM CPU cores with (optional) C66x DSP cores (Figure 1). First things first: how does KeyStone II differ from the first-generation KeyStone products that InsideDSP most recently covered last November? Texas Instruments migrated from a 40 nm to a 28 nm process, with corresponding transistor-count, performance, and power-consumption improvements that enabled two key transitions:
- More CPU and/or DSP cores per device, at a given price point, and
- A migration from the ARM Cortex-A8 (or no integrated ARM core at all) to the ARM Cortex-A15 as the CPU foundation
Figure 1. Texas Instruments' new KeyStone II SoCs target high-end networking, specialty server and other digital signal processing-centric applications.
Next question: how do Texas Instruments' latest KeyStone II devices differ from the company's C6636, unveiled in February at Mobile World Congress as its first KeyStone II offering? At first glance, the high-end 66AK2H12 seems identical to its C6636 forebear; both contain four Cortex-A15 cores running at up to 1.4 GHz, along with eight C66x DSP cores running at up to 1.2 GHz. However, a close inspection of the two devices' block diagrams reveals differing peripheral mixes connected to the ARM and DSP cores via the common TeraNet interconnect, reflecting different application focuses (Figure 2).
Figure 2. The C6636, the first KeyStone II device, unveiled at the beginning of this year, has a peripheral mix optimized for cellular base stations.
Whereas the C6636 was designed for cellular base stations of various sizes, the newer SoCs target high-end networking (routers, switches, etc.), enterprise and industrial, and special-purpose server applications. Accordingly, this time around you'll find network-tailored function blocks, such as Ethernet packet acceleration engines along with 1 GbE and 10 GbE MACs, plus USB 3.0 and other computer-oriented system interfaces. And here's another KeyStone-versus-KeyStone II difference: whereas the first-generation KeyStone lineup included DSP-only products, this time the product family option table includes CPU-only devices (Table 1).
Product | ARM Cortex-A15 CPU cores (1.4 GHz max) | C66x DSP cores (1.2 GHz max) | GMACS | GFLOPS | Dhrystone MIPS
66AK2H12 | 4 | 8 | 352 | 198.4 | 19,600
66AK2H06 | 2 | 4 | 176 | 99.2 | 9,800
66AK2E05 | 4 | 1 | 89.6 | 67.2 | 19,600
66AK2E02 | 1 | 1 | 56 | 33.6 | 4,900
AM5K2E04 | 4 | - | 44.8 | 44.8 | 19,600
AM5K2E02 | 2 | - | 22.4 | 22.4 | 9,800
Table 1. New KeyStone II product family members, key specifications and performance estimates (per Texas Instruments).
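The table's throughput columns are consistent with simple per-core rates, which makes it easy to see where each device's numbers come from. The sketch below uses rates back-derived by dividing the table's own figures: 8 MACs and 8 FLOPs per cycle plus 3.5 DMIPS/MHz for each 1.4 GHz Cortex-A15, and 32 MACs and 16 single-precision FLOPs per cycle for each C66x core. Treat these as inferences from the table, not quotes from a TI datasheet. Note also that the 66AK2E05 and 66AK2E02 rows only add up if their single DSP core runs at 1.4 GHz rather than the 1.2 GHz in the column header; the sketch encodes that as an assumption.

```python
# Sketch: reproduce Table 1's throughput estimates from per-core rates.
# All per-core rates are back-derived from the table itself (an assumption),
# not copied from TI documentation. Math is done in integer "mega" units
# (MHz x ops/cycle) and converted to "giga" only at the end.
from dataclasses import dataclass

ARM_MHZ = 1400               # Cortex-A15 clock used throughout Table 1
ARM_MACS_PER_CYCLE = 8
ARM_FLOPS_PER_CYCLE = 8
ARM_DMIPS_PER_MHZ_X10 = 35   # 3.5 DMIPS/MHz, scaled by 10 to stay in integers

DSP_MACS_PER_CYCLE = 32      # per C66x core
DSP_FLOPS_PER_CYCLE = 16     # per C66x core, single precision


@dataclass(frozen=True)
class Device:
    name: str
    arm_cores: int
    dsp_cores: int
    dsp_mhz: int = 1200      # assumed 1400 for the E-series parts (see lead-in)

    def gmacs(self) -> float:
        mmacs = (self.arm_cores * ARM_MACS_PER_CYCLE * ARM_MHZ
                 + self.dsp_cores * DSP_MACS_PER_CYCLE * self.dsp_mhz)
        return mmacs / 1000  # mega -> giga

    def gflops(self) -> float:
        mflops = (self.arm_cores * ARM_FLOPS_PER_CYCLE * ARM_MHZ
                  + self.dsp_cores * DSP_FLOPS_PER_CYCLE * self.dsp_mhz)
        return mflops / 1000  # mega -> giga

    def dmips(self) -> int:
        # The table's Dhrystone figures count only the ARM cores.
        return self.arm_cores * ARM_MHZ * ARM_DMIPS_PER_MHZ_X10 // 10


DEVICES = [
    Device("66AK2H12", 4, 8),
    Device("66AK2H06", 2, 4),
    Device("66AK2E05", 4, 1, dsp_mhz=1400),
    Device("66AK2E02", 1, 1, dsp_mhz=1400),
    Device("AM5K2E04", 4, 0),
    Device("AM5K2E02", 2, 0),
]

if __name__ == "__main__":
    for d in DEVICES:
        print(f"{d.name}: {d.gmacs():g} GMACS, {d.gflops():g} GFLOPS, {d.dmips():,} DMIPS")
```

Seen this way, the DSP cores dominate the MAC and FLOP totals on the H-series parts, while the Dhrystone column tracks ARM core count alone.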
With respect to the devices' specialty server potential, the fact that the DSP cores execute only 32-bit (single-precision) operations in a single cycle likely won't be a major concern for the applications Texas Instruments is targeting: "cloud"-based multimedia processing, gaming, video analytics, radar, and the like. The KeyStone II SoCs' 32-bit CPU cores may be a bigger concern, however. Conventional servers currently use 64-bit AMD and Intel CPUs running 64-bit operating systems (Windows, Linux/Unix, and Mac OS X) and applications. The transition from the x86 to the ARM instruction set will be challenging enough for system developers and users alike; a further step down from 64-bit to 32-bit processing may be an unacceptable tradeoff, even considering the power-consumption savings.
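One concrete reason 32-bit CPU cores matter in servers (an illustration added here, not a point the article spells out) is addressable memory: a 32-bit pointer can name at most 2^32 bytes, so each process is limited to 4 GiB of virtual address space, whereas 64-bit pointers raise that ceiling far beyond any installed memory. The sketch below just does that arithmetic and reports the pointer width of whatever machine runs it; the helper name is hypothetical.

```python
# Sketch: upper bound on per-process virtual address space for a pointer width.
# Illustrative only: real 32-bit operating systems typically reserve part of
# this range for the kernel, so usable space per process is smaller still.
import struct

GIB = 2 ** 30


def address_space_gib(pointer_bits: int) -> int:
    """Bytes addressable with `pointer_bits`-bit pointers, expressed in GiB."""
    return 2 ** pointer_bits // GIB


if __name__ == "__main__":
    host_bits = struct.calcsize("P") * 8  # pointer size of the running interpreter
    print(f"this host uses {host_bits}-bit pointers")
    for bits in (32, 64):
        print(f"{bits}-bit: {address_space_gib(bits):,} GiB per process (upper bound)")
```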
ARM has a 64-bit successor to the Cortex-A15, the Cortex-A57, on its public long-term roadmap, but silicon implementations of the Cortex-A57 aren't expected to appear until 2014 at the earliest. For now, TI will need to find whatever server and other homes it can for its 32-bit CPU-based SoCs. The high-core-count 66AK2H12 and 66AK2H06 are now sampling, with volume production slated for Q2 2013; mid-range and low-end KeyStone II devices will begin sampling in the second half of next year. Family pricing begins at $49 for the lowest (850 MHz) speed bin. In contrast, Intel's just-introduced "Centerton" S1200 server SoCs are Atom-based, containing two physical CPU cores (four virtual cores via Hyper-Threading) and no general-purpose DSP resources, with volume pricing beginning at $54. Unlike the KeyStone II devices, however, they natively run x86-compiled software, are 64-bit capable, and support ECC memory.