Jeff Bier’s Impulse Response: Multi-Core Math

When does 1 GHz + 1 GHz + 1 GHz + 1 GHz not necessarily equal 4 GHz? When you’re calculating the performance potential of a multi-core chip.

Freescale recently introduced a new DSP chip, the MSC8144, that contains four 1 GHz SC3400 processor cores. Freescale characterizes the new chip as being “performance-equivalent” to one 4 GHz core. But is it really? As usual, the answer is, “It depends.” It depends on what kind of application you’re running, how you map the application onto the cores, and how you choose to characterize “performance.”

If you’re running small programs that have minimal interaction with one another—as is common in VoIP infrastructure applications—then yes, it’s possible that four 1 GHz cores will be “performance equivalent” to a single 4 GHz core.

But what if your program is a big one that needs to be partitioned across cores and requires significant inter-core communication, such as high-resolution video compression? Or what if your chip is running a dynamic mixture of software—for example, a mixture of voice and video channels that changes based on demand?

In cases like these you probably won’t get full utilization of the cores. And figuring out what you will get is tricky; it can be surprisingly difficult to answer questions like “Which chip is faster?” or “How much processing headroom will I have?”
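One rough way to see why the clock rates may not simply add is an Amdahl's-law style estimate. The sketch below is an illustration only, not the method Freescale or BDTI uses: the function name `equivalent_clock_ghz` and the `parallel_fraction` parameter are assumptions introduced here, and the model ignores cache contention and inter-core communication latency, which would lower the result further.

```python
def equivalent_clock_ghz(cores: int, clock_ghz: float, parallel_fraction: float) -> float:
    """Estimate the single-core clock rate that would match a multi-core chip.

    Applies Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the
    fraction of the work that can be spread evenly across n cores.
    parallel_fraction is a hypothetical input, not a measured value.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    speedup = 1.0 / (serial_fraction + parallel_fraction / cores)
    return clock_ghz * speedup


if __name__ == "__main__":
    # Four 1 GHz cores under three assumed levels of parallelism.
    for p in (1.0, 0.9, 0.5):
        ghz = equivalent_clock_ghz(cores=4, clock_ghz=1.0, parallel_fraction=p)
        print(f"parallel fraction {p:.1f}: about {ghz:.2f} GHz equivalent")
```

Under this simplified model, four 1 GHz cores match a 4 GHz core only when the work is fully parallel; if half of the work cannot be spread across cores, the estimate drops to 1.6 GHz.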

Concepts like “busy” and “idle,” which have clear meanings when used to describe the state of a single-core device, become murky when you’re working with multi-core chips. What if three cores are busy and one is idle? Do you count all of those idle cycles as “headroom” that’s available for adding more features into your product, as you typically would in a single-core chip? Maybe yes, maybe no.

If the idle core is competing with the busy cores for resources (such as L2 cache), then those “idle” cycles may be hard to use. And if the core is idle for only a few cycles at a time, those cycles may be useless. This is also true for single-core chips, but in multi-core chips it’s harder to figure out the minimum number of consecutive cycles that would allow you to do useful work. That minimum depends on both the application and the activity levels of the other cores.

It would be a mistake to simply add up all the free cycles—regardless of their temporal proximity to one another, and regardless of the states of the other cores—and call that headroom.
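The gap between total idle cycles and usable idle cycles can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the sample idle-run lengths, and the fixed `min_run` threshold are all hypothetical. In practice the threshold is not a constant; as noted above, it varies with the application and with what the other cores are doing.

```python
from typing import List


def naive_headroom(idle_runs: List[int]) -> int:
    """Sum every idle cycle, ignoring how the cycles are grouped in time."""
    return sum(idle_runs)


def usable_headroom(idle_runs: List[int], min_run: int) -> int:
    """Sum only the idle runs long enough to do useful work.

    min_run is an assumed minimum number of consecutive idle cycles
    needed before a core can accomplish anything.
    """
    return sum(run for run in idle_runs if run >= min_run)


if __name__ == "__main__":
    # Hypothetical trace for one core: fifty 20-cycle gaps and one 600-cycle gap.
    idle_runs = [20] * 50 + [600]
    print("naive headroom: ", naive_headroom(idle_runs), "cycles")
    print("usable headroom:", usable_headroom(idle_runs, min_run=100), "cycles")
```

With this made-up trace, simple addition reports 1,600 free cycles, but only the single 600-cycle stretch clears the assumed 100-cycle threshold.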

Assessing and comparing multi-core chip performance is a complex process, and it requires much more than simple addition.

Jennifer Eyre of BDTI contributed to this column.
