This month MIPS announced its first multithreaded licensable core, the MIPS32 34K. The 34K’s multithreading capabilities are highly unusual, not only in comparison to the other MIPS cores, but also in comparison to most other embedded processors.
Multithreading is a technique for running multiple pieces of code, called “threads,” in parallel with one another. Multithreading is similar to multitasking, a technique for running multiple “processes” in parallel with one another. A key difference between the techniques is that threads share a memory space, while processes each have their own memory space.
Multithreading can be accomplished by running different threads on different processor cores. Alternatively, multithreading can be accomplished by having a single processor switch between threads. The 34K takes this latter approach, providing hardware support to speed switching between threads.
MIPS’ main rationale for pursuing multithreading is that memory systems have become a bottleneck. In many applications, the processor spends a significant percentage of its cycles waiting for the memory system. By switching between threads, the processor can use some of these otherwise wasted cycles to perform useful work.
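A rough, idealized model helps show why switching threads can recover these cycles. The sketch below is an illustration, not MIPS’ analysis. It assumes that every thread alternates a fixed number of compute cycles with a fixed memory stall, that threads never interfere with one another in the caches, and that switching costs nothing. The cycle counts are hypothetical.

```python
def pipeline_utilization(threads: int, compute: int, stall: int) -> float:
    """Fraction of cycles doing useful work under an idealized latency-hiding model.

    Assumes each thread alternates `compute` busy cycles with `stall` cycles of
    waiting on memory, that threads do not disturb each other's caches, and that
    switching between threads is free.
    """
    if threads < 1 or compute <= 0 or stall < 0:
        raise ValueError("need threads >= 1, compute > 0, stall >= 0")
    single = compute / (compute + stall)  # utilization with a single thread
    return min(1.0, threads * single)     # extra threads fill the gaps, capped at 100%


# Hypothetical workload: 20 cycles of work, then a 30-cycle memory stall.
for n in range(1, 4):
    print(n, pipeline_utilization(n, compute=20, stall=30))
```

With these made-up numbers, one thread keeps the pipeline busy 40% of the time, two threads 80%, and three saturate it. Real stalls are irregular and threads compete for cache space, so real gains are smaller.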
To quantify the benefit, MIPS tested the performance of the 34K by running two benchmarks on the core simultaneously. According to MIPS, a 360 MHz 34K ran these benchmarks about 50% faster than a single-threaded 400 MHz 24KE running the same benchmarks back-to-back. Interestingly, MIPS reports that running the two benchmarks simultaneously increased the cache miss rate by only about 15%.
The improved speed comes at a modest cost in die area. The 34K configuration used for the benchmark exercise occupies 4.2 mm², which is roughly 30% larger than the 24KE. (All speed and area figures assume fabrication in TSMC CL013G. The area figures do not include area for level-one caches.)
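The reported speed and area figures can be normalized for clock rate and core area with simple arithmetic. The ratios below are derived from MIPS’ rounded numbers and treat “about 50%” and “roughly 30%” as exact; the implied 24KE area is a derived estimate, not a published figure.

```python
# Figures reported by MIPS (approximate; TSMC CL013G, L1 caches excluded from area).
SPEEDUP = 1.50          # 34K vs. 24KE on the two benchmarks ("about 50% faster")
CLOCK_34K_MHZ = 360
CLOCK_24KE_MHZ = 400
AREA_34K_MM2 = 4.2
AREA_RATIO = 1.30       # 34K is "roughly 30% larger" than the 24KE


def per_clock_gain(speedup: float, clk_new: float, clk_old: float) -> float:
    """Throughput ratio after removing the clock-rate difference."""
    return speedup * clk_old / clk_new


def per_area_gain(speedup: float, area_ratio: float) -> float:
    """Throughput ratio per unit of core area."""
    return speedup / area_ratio


implied_24ke_area = AREA_34K_MM2 / AREA_RATIO  # derived estimate, not a published figure
print(f"work per clock:    {per_clock_gain(SPEEDUP, CLOCK_34K_MHZ, CLOCK_24KE_MHZ):.2f}x")
print(f"work per mm^2:     {per_area_gain(SPEEDUP, AREA_RATIO):.2f}x")
print(f"implied 24KE area: {implied_24ke_area:.1f} mm^2")
```

This works out to about 1.67x the work per clock and about 1.15x the work per mm² of core area, with an implied 24KE core area of about 3.2 mm². If both cores were configured with equally sized caches, including the caches would shrink the relative area penalty.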
BDTI has not yet evaluated the 34K in detail, so it is difficult to assess how well the 34K will perform in real-world applications. However, the 34K’s multithreading approach is likely to be useful for many applications: many embedded applications involve running multiple tasks, and memory system bottlenecks are often an important factor limiting performance.
The key to the 34K’s efficiency is specialized task-switching hardware. Specifically, 34K licensees can configure the core with up to five “thread contexts” (TCs). Each contains a register file, a program counter, and other resources needed to maintain the machine state for a thread. The TCs enable the processor to switch to a different thread on every clock cycle.
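A toy cycle-level model illustrates what per-thread contexts buy. In this hypothetical sketch, each `ThreadContext` stands in for a TC’s private register file and program counter, so the issue logic can pick a different thread every cycle and simply skip any TC that is waiting on memory. The round-robin policy and the cycle counts are assumptions for illustration, not a description of the 34K’s actual dispatch logic.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """Hypothetical stand-in for one TC: the state kept per thread in hardware."""
    compute: int                             # cycles of work between memory stalls
    stall: int                               # cycles spent waiting on each miss
    ready_at: int = 0                        # first cycle this TC may issue again
    issued: int = 0                          # instructions issued so far
    work_left: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.work_left = self.compute


def simulate(tcs: list[ThreadContext], cycles: int) -> float:
    """Issue at most one instruction per cycle, rotating among ready TCs.

    Because every TC holds its own state, switching costs no cycles; a TC that
    is stalled on memory is simply skipped. Returns the fraction of busy cycles.
    """
    busy = 0
    last = -1                                # index of the TC that issued most recently
    n = len(tcs)
    for cycle in range(cycles):
        for step in range(1, n + 1):         # round-robin search starting after `last`
            idx = (last + step) % n
            tc = tcs[idx]
            if tc.ready_at <= cycle:
                tc.issued += 1
                tc.work_left -= 1
                if tc.work_left == 0:        # this instruction misses in the cache
                    tc.ready_at = cycle + 1 + tc.stall
                    tc.work_left = tc.compute
                busy += 1
                last = idx
                break
    return busy / cycles


for n in range(1, 6):
    tcs = [ThreadContext(compute=20, stall=30) for _ in range(n)]
    print(n, round(simulate(tcs, 1000), 3))
```

With these parameters, utilization rises steadily from 40% with one TC as TCs are added. In this toy model, identical TCs that start together also stall together, which limits the gain; giving each TC different `compute` and `stall` values changes the picture.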
Few other embedded processors provide any hardware support for switching between threads. Without this hardware, a task switch requires the processor to save and restore registers, the program counter, and other state information—a time-consuming process. Thus, most embedded processors cannot easily use task-switching to work around memory system bottlenecks.
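An idealized utilization model with a per-switch overhead shows why software task switching does not help. In this sketch, `switch_cost` is the number of cycles spent saving and restoring state on each switch: zero for hardware TCs, and a hypothetical 100 cycles for a software save and restore. The processor is assumed to switch only when doing so beats simply waiting out the stall; all cycle counts are made up.

```python
def utilization_with_switch_cost(threads: int, compute: int, stall: int,
                                 switch_cost: int) -> float:
    """Useful-work fraction when each thread switch costs `switch_cost` cycles.

    Idealized model: each thread computes for `compute` cycles, then stalls for
    `stall` cycles. Switching is limited either by how many stalls there are to
    hide (latency bound) or by the switch overhead itself (overhead bound).
    """
    single = compute / (compute + stall)                  # never switch: just wait
    latency_bound = threads * compute / (compute + stall)
    overhead_bound = compute / (compute + switch_cost)
    return max(single, min(latency_bound, overhead_bound))  # switch only if it pays


HARDWARE_SWITCH = 0     # per-thread contexts: nothing to save or restore
SOFTWARE_SWITCH = 100   # hypothetical cost of saving/restoring state in software

for label, cost in (("hardware TCs", HARDWARE_SWITCH),
                    ("software save/restore", SOFTWARE_SWITCH)):
    print(label, round(utilization_with_switch_cost(4, 20, 30, cost), 2))
```

With four threads, zero-cost switching keeps the pipeline fully busy, while a 100-cycle switch costs more than the 30-cycle stall it would hide, so the best choice is not to switch and utilization stays at the single-thread 40%.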
In addition to the TCs, the 34K supports up to two “virtual processing elements” (VPEs). Each VPE provides OS-oriented resources such as translation look-aside buffers (TLBs), making it possible to run two OSs simultaneously on the same core. This capability is useful mainly for applications that require both a full-featured OS and an RTOS. For example, one VPE might be used to run a user interface on top of Linux, while the other VPE is used to run audio and video codecs on top of Nucleus. BDTI is not aware of any other embedded processors that offer this capability.
The 34K also supports optional “quality of service” (QoS) logic, which allows the programmer to set the percentage of processor cycles devoted to each TC. The programmer can use the QoS logic to ensure that real-time tasks get enough processing time. The QoS logic is also useful for applications running two OSs; for example, a programmer could configure the processor to devote ¾ of its cycles to an RTOS and ¼ of its cycles to Linux. The QoS capabilities should be of value for applications that include both signal processing tasks (which usually have hard real-time constraints) and general-purpose tasks (which do not). BDTI is not aware of any other processors that offer similar capabilities.
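The details of the 34K’s QoS hardware are not covered here, so as a hypothetical illustration of proportional cycle allocation, the sketch below uses smooth weighted round-robin, a generic scheduling technique. It hands out issue cycles to an RTOS and to Linux in the 3:1 ratio from the example above; the group names and weights are assumptions.

```python
def allocate_cycles(weights: dict[str, int], cycles: int) -> list[str]:
    """Smooth weighted round-robin: assign each issue cycle to one group.

    Every cycle, each group's running credit grows by its weight; the group with
    the most credit gets the cycle and pays back the total weight.
    """
    total = sum(weights.values())
    credit = {name: 0 for name in weights}
    schedule = []
    for _ in range(cycles):
        for name, w in weights.items():
            credit[name] += w
        chosen = max(credit, key=credit.get)  # ties go to the first-listed group
        credit[chosen] -= total
        schedule.append(chosen)
    return schedule


plan = allocate_cycles({"rtos": 3, "linux": 1}, 8)
print(plan)  # ['rtos', 'rtos', 'linux', 'rtos', 'rtos', 'rtos', 'linux', 'rtos']
```

Smooth weighted round-robin spreads each group’s cycles out rather than batching them, so in this example the RTOS never waits more than one cycle between its slots, a property that helps bound latency for real-time tasks.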
The 34K appears to offer a number of important and unusual strengths. Perhaps most importantly, the 34K offers unique features that will ease the integration of signal-processing and non-signal-processing tasks on a single core. However, it is clear that the 34K isn’t the right solution for every application. For example, programmers can only take advantage of the 34K’s multithreading capability if they write their code with multithreading in mind, which will add complexity to the software development process. And in some cases, such as applications with little thread-level parallelism, the 34K won’t offer performance advantages over simpler processors.
The MIPS32 34K is available for license now. MIPS has not disclosed the licensing fees or royalties for the core.