If you’re at all interested in computing technology you can’t help but be amazed at the advances in CPU power over the last few decades, Moore’s Law, blah blah blah. But a few seconds pondering this invariably provokes the question as to how long this party can last.
The commonly accepted wisdom is that CPUs have gotten about as fast as they are likely to go in terms of sheer clock speed, and now manufacturers are turning to multiprocessing to provide more processing power for a given price point. The recent Intel price drops which made the quad-core Q6600 CPU available for less than AUD400 are a highly relevent (and welcome) data point to illustrate this trend.
This raises lots of hairy questions for developers, such as “how are we going to design our software to run efficiently in a multi-processing environment?” The previously-linked wide finder experiment is an attempt to explore some of these issues. And it’s pretty obvious that so far there is no silver bullet.
But wait, it gets worse. I will point you to a long but highly thought-provoking presentation from Herb Sutter. Turns out we are already hitting major architectural hurdles in the form of memory access limitations, and we’ll need to find some solutions for these before tackling the parallel computation problem.
Sutter’s presentation is deeply technical, but still quite accessible, and delivered with an engaging style that makes it required viewing. Highly recommended.
I recently had some experience diagnosing some memory-related performance problems (not quite in the same class as that discussed by Sutter, but similar) and I have to say there is a serious deficit in the development tools for these kinds of problems. Currently we need to look aggregate behaviour over multiple iterations to isolate some of these problems, and this is a difficult and error-prone approach. For example, check out Sutter’s technique to discover the memory cache line size in code. In the future it would be great if we could monitor cache misses, pipeline stalls, page faults, and other performance-impacting events within the debugger.
These issues also make me wonder about how higher-level languages are going to provide appropriate abstractions to avoid the performance problems. For example, garbage collection is a major win for programmer productivity but it does encourage memory usage patterns that are not always conducive to performance given architectural limitations in the underlying hardware. The same abstraction problems affect C/C++ of course but at least there is the option to go “bare-metal” where necessary.
Whatever the answers are here, it’s certain there are some interesting times ahead for developers.