By now, most people who work with processors, whether in data centers, PCs, mobile devices, or embedded systems, understand that parallel processing is the way to achieve both high compute performance and good energy efficiency for most applications. Most of them also realize that programming parallel processors is challenging. Parallel processors come in many different types, including CPUs with single-instruction, multiple-data (SIMD) capabilities, multi-core CPUs, DSPs, GPUs, and FPGAs, among others.