TI has unveiled a new chip-level architecture for high-performance, multi-core, DSP-based SoCs. Most notable among its features are new on-chip and chip-to-chip interconnection mechanisms, an upgraded high-performance DSP core, and hardware and tool support for programming concurrent applications. The architecture is optimized to run at 1.0 to 1.2 GHz in 40 nm process technology.
According to TI, initial chips based on the new architecture will incorporate four or eight DSP cores delivering peak computation throughput of up to 256 GMACS or 128 GFLOPS. These chips will target communications infrastructure applications, such as wireless base stations. Although TI declined to disclose details of the new DSP core, the chip-level performance data make it clear that the new fixed-/floating-point core will achieve up to 32 GMACS or 16 GFLOPS, and will operate at 1.0 GHz or 1.2 GHz. This is a significant boost in performance over TI’s current high-end core: based on TI’s data, the new core architecture will offer about 3.5 times the peak 16-bit fixed-point MAC performance of the C64x+ core used in TI’s current high-performance DSPs. Since the clock speed of the new core is the same as that of its predecessor, the higher performance will have to come from added parallelism.
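The per-core figures above follow from dividing the chip-level peaks by the core count. The short Python sketch below reproduces that arithmetic. It assumes the quoted peaks apply to the eight-core configuration; the per-cycle values it derives are our own inference for each candidate clock rate, not numbers published by TI, and the function names are purely illustrative.

```python
def per_core_peak(chip_peak_giga_ops, num_cores):
    """Split a chip-level peak (in giga-operations/second) evenly across cores."""
    return chip_peak_giga_ops / num_cores


def ops_per_cycle(core_peak_giga_ops, clock_ghz):
    """Operations per clock cycle implied by a per-core peak at a given clock."""
    return core_peak_giga_ops / clock_ghz


# Chip-level peaks quoted by TI, assumed here to be for the eight-core part.
core_gmacs = per_core_peak(256, 8)    # fixed-point MACs, per core
core_gflops = per_core_peak(128, 8)   # floating-point operations, per core

# Implied per-cycle parallelism depends on which clock the peak refers to
# (an assumption either way, since TI quotes both 1.0 GHz and 1.2 GHz).
for clock_ghz in (1.0, 1.2):
    print(
        f"{clock_ghz} GHz: "
        f"{ops_per_cycle(core_gmacs, clock_ghz):.1f} MACs/cycle, "
        f"{ops_per_cycle(core_gflops, clock_ghz):.1f} FLOPs/cycle"
    )
```

If the peaks are quoted at the lower clock rate, each core would need to sustain 32 MACs per cycle, which illustrates how much of the gain has to come from added parallelism rather than clock speed.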
Notably, this will be the first time that TI’s highest-performance core has provided floating-point capability. Based on TI’s data, the new core will offer about 2.5 times the peak floating-point performance per MHz of the current ’C67x floating-point core. With up to eight cores operating at approximately double the clock frequency of earlier TI floating-point chips, chips based on the new core will offer by far the highest floating-point performance of any TI chip to date. This new emphasis on floating-point reflects the increasing algorithmic complexity found in wireless infrastructure applications (for example, matrix inversion used in MIMO applications, and layer 2 scheduling). In addition, high-performance floating-point capabilities reduce the amount of code that must be manually converted from floating-point to fixed-point, which is a complex and time-consuming task.
Equally significant is TI’s new emphasis on enabling multi-core programming. Previously, TI had provided developers with general guidelines on multi-core programming, but did not offer a robust multi-core programming methodology and associated tool set. Now, TI is promoting and supporting a concurrent programming approach based on a task-oriented dispatch library similar to Apple’s Grand Central Dispatch. This model allows programmers to set up queues of tasks, each of which is dispatched to a processor or accelerator when that resource becomes available. The approach is supported by hardware hooks and a suite of software tools and tool features (such as a task dispatch library, transaction viewer and event correlator) designed to help engineers write and debug multi-core programs. TI’s decision to focus more on providing a multi-core development methodology and environment, if executed well, could give TI an ease-of-use edge over other DSP processor vendors, and help it compete against emerging threats from CPUs and GPUs.
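To make the task-queue model concrete, here is a minimal Python sketch of the general idea: tasks are placed on a shared queue and each one is picked up by whichever worker becomes free first. This is not TI’s dispatch library, whose API has not been disclosed. The `run_dispatch` function and its parameters are hypothetical, and ordinary threads stand in for DSP cores or accelerators.

```python
import queue
import threading


def run_dispatch(tasks, num_workers):
    """Run (func, arg) tasks from a shared queue on whichever worker is free.

    Hypothetical illustration only: threads stand in for cores/accelerators.
    Returns the results in the order the tasks were submitted.
    """
    task_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = task_queue.get()
            if item is None:          # sentinel: no more work for this worker
                task_queue.task_done()
                break
            task_id, func, arg = item
            value = func(arg)
            with results_lock:        # protect the shared results table
                results[task_id] = value
            task_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Enqueue every task; an idle worker takes the next one off the queue.
    for task_id, (func, arg) in enumerate(tasks):
        task_queue.put((task_id, func, arg))
    # One sentinel per worker so that each worker exits exactly once.
    for _ in workers:
        task_queue.put(None)

    task_queue.join()
    for w in workers:
        w.join()
    return [results[i] for i in range(len(tasks))]


def square(x):
    return x * x


outputs = run_dispatch([(square, n) for n in range(8)], num_workers=4)
print(outputs)
```

The programmer describes the work as independent tasks and does not assign them to specific cores; the dispatcher decides placement at run time. That is also why such a model can benefit from hardware queueing support and from tools that trace which resource executed which task.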
TI’s new SoC architecture includes a multilayer bus that supports non-blocking transfers and 256 Gbytes/second of on-chip data transfer. This highlights how much raw data the processing engines require, as well as the fact that task-dispatch-based multi-core software may require more data movement than other forms of multi-core programming. We expect that this bus architecture (which we suspect is similar to ARM’s AMBA AXI bus) will appear in many of TI’s future products, even where the performance requirements are less stringent.