Licensing upstart Telairity introduced its high-performance DSP core, the TVP400, at last month’s Hot Chips conference. On the surface, the TVP400 resembles competing high-performance DSP cores. It is projected to achieve a worst-case clock speed of 420 MHz in a 0.13-micron process, which is comparable to the 300–400 MHz clock speeds achieved by competing DSP cores. The TVP400 also delivers a similar level of parallelism: it can complete four 16-bit multiply-accumulate (MAC) operations per cycle, as can most competing high-performance DSP cores, such as the ZSP600.
However, closer inspection reveals that the TVP400 uses an unusual form of parallelism. Although the TVP400 contains four independent “vector units,” in each cycle it generally issues only one instruction to one of the four vector units. To keep the vector units busy, the TVP400 uses vector instructions that specify repetitive sequences of operations. For example, a single vector instruction might direct a vector unit to add eight successive pairs of operands from eight pairs of registers; such instructions operate on one pair of operands per cycle. These vector instructions allow the TVP400 to initiate operations in one vector unit and then launch other operations in other vector units while the first vector unit continues to operate.
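The issue pattern described above can be illustrated with a toy model. The sketch below is not based on Telairity documentation: the function name `simulate`, the fixed vector length of eight, and the "first idle unit" issue policy are all assumptions made for illustration. It shows only how issuing a single vector instruction per cycle can still keep four units busy.

```python
def simulate(num_units=4, vector_length=8, num_instructions=8):
    """Toy model: return the number of busy vector units in each cycle.

    Assumptions (hypothetical, for illustration only):
    - at most one vector instruction is issued per cycle;
    - each instruction occupies one unit for `vector_length` cycles,
      processing one pair of operands per cycle;
    - an instruction goes to the first idle unit, or waits if none is idle.
    """
    free_at = [0] * num_units  # first cycle in which each unit is idle again
    busy = []                  # busy[c] = units doing work in cycle c
    issued = 0
    cycle = 0
    while issued < num_instructions or any(t > cycle for t in free_at):
        if issued < num_instructions:
            for unit in range(num_units):
                if free_at[unit] <= cycle:
                    free_at[unit] = cycle + vector_length
                    issued += 1
                    break  # only one issue per cycle
        busy.append(sum(1 for t in free_at if t > cycle))
        cycle += 1
    return busy


if __name__ == "__main__":
    per_cycle = simulate()
    print(per_cycle)
    print("operations completed:", sum(per_cycle))
```

In this model the number of busy units ramps up by one per cycle and then holds at four, even though the issue logic never handles more than one instruction at a time.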
Each vector unit also contains a vector load/store unit. Like the arithmetic units, these vector load/store units can be configured to perform repetitive sequences of operations. Each vector load/store unit can perform two 16-bit loads and one 16-bit store per cycle, for a total of twelve independent data transfers per cycle. The ability to access twelve memory locations per cycle is highly unusual. Competing DSP cores can access only two independent memory locations per cycle. As a result, competing DSP cores typically achieve high levels of parallelism only when data can be loaded in chunks of contiguous words. In contrast, the TVP400 can operate efficiently on scattered data.
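The transfer count above is simple arithmetic, sketched here for clarity. The function names and the example workload (an element-wise operation that needs two loads and one store per result) are illustrative assumptions, not figures from Telairity.

```python
def transfers_per_cycle(units, loads_per_unit, stores_per_unit):
    """Total independent data transfers per cycle across all units."""
    return units * (loads_per_unit + stores_per_unit)


def results_per_cycle(units, loads_per_unit, stores_per_unit,
                      loads_needed=2, stores_needed=1):
    """Results per cycle that memory bandwidth alone can sustain.

    Assumes (hypothetically) that each result needs `loads_needed`
    operand loads and `stores_needed` stores, all to unrelated addresses.
    """
    per_unit = min(loads_per_unit // loads_needed,
                   stores_per_unit // stores_needed)
    return units * per_unit


if __name__ == "__main__":
    # Four vector units, each with two loads and one store per cycle.
    print(transfers_per_cycle(4, 2, 1))
    print(results_per_cycle(4, 2, 1))
```

Under these assumptions, the twelve transfers per cycle are enough to feed four two-operand operations per cycle without requiring the operands to be contiguous in memory.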
Vector processors have a long history in the world of supercomputers, so it is not surprising that Howard Sachs, Telairity’s president, once worked at Cray Laboratories. But vector processors are rare in the world of embedded computing. Telairity may face reluctance among prospective users due to the unfamiliar approach used in the TVP400. While signal processing algorithm designers typically work with vectors and matrices as their most natural data types, DSP software developers generally are not familiar with the concepts of vector processors.