Jeff Bier’s Impulse Response—Bamboozling with Benchmarks, Part 2

Submitted by Jeff Bier on Mon, 03/20/2006 - 17:00

In the February 2006 column, I listed four of the Top Ten ways in which processor benchmark results are commonly misused. This month I’ll cover the remaining six. If you rely on benchmark results, you’ll want to watch out for these.

  1. Comparing projected benchmark results for a chip that doesn’t yet exist to results for a chip that does. Mixing projected and actual benchmark results isn’t necessarily bad (and can be quite informative), but there are a couple of ways in which it can be misleading. First, the projected speed, cost, and energy consumption may not be achieved. And even if they are, competitors’ speeds, energy consumption, etc. may have improved in the interim, making comparisons with older chips irrelevant.
  2. Comparing benchmark results that were optimized for speed with results that were optimized for memory efficiency. If the optimization strategies are different, then the benchmark results aren’t comparable—even if the benchmark itself is the same.
  3. Using benchmark results that aren’t relevant to the target application. This sometimes happens when vendors cherry-pick benchmark results. For example, a vendor may choose to show a processor’s benchmark results on a big, 128-tap FIR filter because their processor is highly parallel and well-suited to doing this sort of work—but the target applications may be more likely to use smaller FIR filters, or another type of filter entirely.
  4. Comparing the performance of synthesizable processor cores that are not fabricated in the same process. Different processes have different characteristics; for example, a core can achieve a higher clock speed in 90 nm than it can in 130 nm. Differences in process characteristics can obscure differences in core performance. You can only make meaningful comparisons if the cores are implemented in the same process.
  5. Calculating benchmark results for licensable cores using clock speeds based on “typical” fabrication process results. Core vendors like to do this because they claim that this is comparable to what chip vendors do. But there’s a difference. If you buy a 200 MHz chip, it’s going to run at 200 MHz. If you license a core that has a 200 MHz “typical” clock speed, you will get some chips at that speed, but others will be slower. Core licensees typically can’t afford to speed-sort their chips, nor can they afford to throw away a big chunk of their yield—which is what would happen if they designed for “typical” clock speeds. For these reasons, core licensees usually design for worst-case clock speed—so this is the speed that should be used for calculating core benchmark results.
  6. Disguising results by using poorly designed charts—for example, by showing a chart axis that ranges from 90 to 100 rather than 0 to 100. (We refer to this technique as using “axes of evil.”)
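Point 5 comes down to simple arithmetic: a benchmark's execution time scales inversely with clock speed, so the assumed clock directly sets the reported number. Here is a minimal sketch of that effect; the cycle count and the worst-case clock figure are purely hypothetical, chosen only to illustrate the gap.

```python
# Sketch: how the assumed clock speed shifts a reported benchmark result.
# All numbers are hypothetical illustrations, not measurements.

def benchmark_time_us(cycles, clock_mhz):
    """Execution time in microseconds for a fixed cycle count at a given clock."""
    return cycles / clock_mhz

CYCLES = 10_000  # hypothetical cycle count for some benchmark kernel

typical = benchmark_time_us(CYCLES, 200)  # "typical"-process clock: 200 MHz
worst = benchmark_time_us(CYCLES, 160)    # hypothetical worst-case clock: 160 MHz

print(f"at 200 MHz (typical):    {typical} us")
print(f"at 160 MHz (worst-case): {worst} us")
```

With these made-up figures, the worst-case result is 25% slower than the "typical" one, which is why the choice of clock assumption matters so much when comparing cores.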
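Point 6 is easy to quantify. When a bar chart's axis starts at 90 instead of 0, the drawn height of each bar is its value minus 90, so a small real difference becomes a large visual one. A quick sketch with hypothetical scores of 92 and 98:

```python
# Sketch of the "axes of evil" effect: a truncated axis exaggerates
# small differences. Scores are hypothetical.

def visual_ratio(a, b, axis_min):
    """Ratio of the two bars' drawn heights when the axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)

score_a, score_b = 92.0, 98.0  # hypothetical benchmark scores

true_ratio = score_b / score_a                     # actual difference: ~7%
honest = visual_ratio(score_a, score_b, axis_min=0)    # axis from 0
truncated = visual_ratio(score_a, score_b, axis_min=90)  # axis from 90

print(f"true speedup:      {true_ratio:.2f}x")
print(f"bars from 0:       {honest:.2f}x taller")
print(f"bars from 90:      {truncated:.2f}x taller")
```

With the axis starting at 90, the second bar is drawn four times as tall as the first, even though the underlying scores differ by only about 7%.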

I hope our Top Ten list helps you use and view benchmark results with a more educated eye. Benchmarks are an invaluable tool, but they must be used with care.
