Processor vendors targeting signal processing applications have put a lot of emphasis on compilers over the last few years. Many of the new processor announcements I’ve seen recently stress “compiler friendliness” as one of the main advantages of the new architecture. And vendors like to boast about the enormous amounts of time and money they’ve spent improving their compilers.
Even in the era of gigahertz processors, it is hard to meet demanding performance and cost targets without tightly optimized code. For engineering managers with tight budgets, it is tempting to buy into the idea that creating efficient code requires nothing more than setting the appropriate compiler options. Unfortunately, this is rarely the case. Although the quality of compiler-generated code has improved in recent years, compilers address only part of the software optimization problem.
To obtain efficient code, signal processing software must be optimized at four distinct levels. First, the overall software architecture must be designed to make efficient use of processor resources. This involves considerations such as how data flows among software modules. For example, a video application may have several successive processing steps, such as deinterlacing, noise reduction, and color enhancement. The straightforward thing to do is to push an entire frame of video through the first step, then through the second step, and so on. But if a frame of video is larger than the local data memory, this can be very inefficient. You’re probably better off pushing a small portion of a frame through all of the steps, and then moving on to the next portion of the frame. Unfortunately, compilers have no ability to perform this type of optimization.
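To make this concrete, here is a rough sketch in C of the strip-based approach. The stage names, frame dimensions, and strip size are placeholders of my own invention, not taken from any particular video pipeline:

```c
/* A sketch of strip-based processing, assuming an in-place pipeline of three
 * stages. Instead of running each stage over a whole frame, a small strip is
 * pushed through every stage before the next strip is touched, so the working
 * set stays within local data memory. The stage names, frame dimensions, and
 * strip size below are hypothetical placeholders. */
#include <stddef.h>
#include <stdint.h>

#define FRAME_WIDTH   720
#define FRAME_HEIGHT  480
#define STRIP_ROWS     16   /* sized so one strip fits in local data memory */

/* Placeholder stage implementations; a real pipeline would do actual work. */
static void deinterlace(uint8_t *strip, size_t n)   { (void)strip; (void)n; }
static void reduce_noise(uint8_t *strip, size_t n)  { (void)strip; (void)n; }
static void enhance_color(uint8_t *strip, size_t n) { (void)strip; (void)n; }

void process_frame(uint8_t *frame)
{
    for (int row = 0; row < FRAME_HEIGHT; row += STRIP_ROWS) {
        int rows = (FRAME_HEIGHT - row < STRIP_ROWS) ? FRAME_HEIGHT - row
                                                     : STRIP_ROWS;
        uint8_t *strip = frame + (size_t)row * FRAME_WIDTH;

        /* Each strip runs through all three stages before moving on. */
        deinterlace(strip, (size_t)rows * FRAME_WIDTH);
        reduce_noise(strip, (size_t)rows * FRAME_WIDTH);
        enhance_color(strip, (size_t)rows * FRAME_WIDTH);
    }
}
```

The frame-at-a-time version would be the same three calls with the strip loop removed; the difference lies entirely in how the data moves, which is exactly why no compiler can make this decision for you.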
Second, appropriate data types must be selected. If you use a data type that’s larger than you really need (for example, 32 bits when 16 will do), you’re wasting resources. On the other hand, if you use a data type that’s smaller than you really need, your system is likely to malfunction. Again, compilers have no clue about this sort of thing.
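As a simple illustration, consider a hypothetical FIR filter written with deliberately chosen types: 16-bit samples and coefficients to keep memory traffic down, and a 32-bit accumulator for headroom, assuming the coefficients are scaled so the sum of products cannot overflow:

```c
/* Illustrative data-type choices for a hypothetical Q15 FIR filter.
 * 16-bit samples and coefficients keep storage and multiplier width down;
 * the 32-bit accumulator provides headroom for the 16x16 products, assuming
 * the Q15 coefficients sum to no more than 1.0 in magnitude. (Real DSPs often
 * add guard bits, e.g. 40-bit accumulators, to relax that assumption.) */
#include <stdint.h>

int16_t fir_q15(const int16_t *x, const int16_t *h, int num_taps)
{
    int32_t acc = 0;                      /* wide accumulator: avoids overflow */

    for (int i = 0; i < num_taps; i++)
        acc += (int32_t)x[i] * h[i];      /* each product needs up to 31 bits */

    return (int16_t)(acc >> 15);          /* scale back to Q15; saturation
                                             omitted for brevity */
}
```

Shrink the accumulator to 16 bits and the filter overflows; widen the samples to 32 bits and you double the memory footprint for nothing.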
Third, the software must be optimized at the algorithm level. For example, it is often possible to improve performance by combining multiple algorithms into a single processing step, or by substituting one algorithm for another. Such optimizations sometimes yield huge performance gains. And, again, this type of optimization is beyond the capabilities of compilers.
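Here is a toy example of what I mean by combining algorithms, using a fused pass of my own devising rather than anything from a real product:

```c
/* A toy example of algorithm-level fusion (my own illustration): applying a
 * gain and then a DC offset in two separate passes reads and writes the
 * buffer twice, while a single fused pass touches each sample once. The
 * payoff grows with buffer size, since memory traffic is halved. */
#include <stddef.h>
#include <stdint.h>

/* Two-pass version: one algorithm per loop. */
void scale_then_offset(int16_t *x, size_t n, int16_t gain_q15, int16_t offset)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (int16_t)(((int32_t)x[i] * gain_q15) >> 15);
    for (size_t i = 0; i < n; i++)
        x[i] = (int16_t)(x[i] + offset);
}

/* Fused version: both operations in a single processing step. */
void scale_and_offset(int16_t *x, size_t n, int16_t gain_q15, int16_t offset)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (int16_t)((((int32_t)x[i] * gain_q15) >> 15) + offset);
}
```

A compiler will happily optimize each loop on its own, but recognizing that two algorithms can be merged, or that one can be replaced by another entirely, is a design decision, not a code-generation step.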
Fourth and finally, the software must be carefully mapped to the processor’s instruction set and pipeline. For example, it is sometimes necessary to “unroll” loops to avoid pipeline stalls. This is where compilers can do great work, and where the difference between a good compiler and a bad one becomes evident. But even if the compiler does a perfect job at this level, the code is likely to be far from optimal unless a savvy human has first optimized the software at the other three levels.
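For illustration, here is what a manual unroll might look like on a simple summation loop; a good compiler will often perform this transformation on its own, which is exactly the point:

```c
/* An illustrative manual unroll of a summation loop. Processing four samples
 * per iteration reduces loop overhead, and the four independent accumulators
 * break the dependency chain so the pipeline stays full. The unrolled version
 * assumes n is a multiple of 4, to keep the sketch short. */
#include <stddef.h>
#include <stdint.h>

int32_t sum_rolled(const int16_t *x, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

int32_t sum_unrolled(const int16_t *x, size_t n)  /* n assumed divisible by 4 */
{
    int32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;

    for (size_t i = 0; i < n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    return a0 + a1 + a2 + a3;
}
```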
Higher-level optimizations are becoming more important over time, as both applications and processors become more complicated. For example, many of the latest DSPs contain multiple CPU cores on a single chip. Figuring out how to optimally partition an application across these multiple cores (a software architecture optimization) can be a challenge—and compilers don’t provide any help in this area.
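As a very rough sketch of one partitioning strategy, here is a data-parallel split across two cores, modeled here with POSIX threads; on a real multicore DSP the mechanics (shared memory regions, DMA, inter-core messaging) would look quite different, and choosing between splitting the data and pipelining the stages across cores is precisely the kind of architectural decision the compiler leaves to you:

```c
/* A deliberately simplified sketch of data partitioning across two cores,
 * modeled with POSIX threads. The buffer size and the per-sample work are
 * hypothetical placeholders. */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N 4096

typedef struct {
    int16_t *data;
    size_t   count;
} chunk_t;

static void *process_chunk(void *arg)
{
    chunk_t *c = (chunk_t *)arg;
    for (size_t i = 0; i < c->count; i++)
        c->data[i] = (int16_t)(c->data[i] / 2);   /* placeholder work */
    return NULL;
}

void process_on_two_cores(int16_t *buf)
{
    pthread_t t;
    chunk_t lo = { buf,         N / 2 };
    chunk_t hi = { buf + N / 2, N / 2 };

    pthread_create(&t, NULL, process_chunk, &hi); /* second half on another core */
    process_chunk(&lo);                           /* first half on this core */
    pthread_join(t, NULL);
}
```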
I don’t want to give the impression that compilers don’t matter. On the contrary, good compilers are very important. But it’s vital to understand that even the best compiler is only tackling a small subset of the optimization problem—you still need a skillful programmer to get the job done right.