Twitter over IP

Posted by alastair
on June 04, 2008 20:43

Let’s solve Twitter’s scalability problems, shall we?

So, like most people, I don’t know much about the problems there and certainly don’t have any solutions to suggest. But I do know there is a certain class of solutions which isn’t on the table.

If you look at Twitter from a suitably high vantage point, you see real-time communication between small groups: people enter short messages, and those messages appear at their peers a short time later. There’s also a central archive, but I’ve heard Twitter described as “public Instant-Messaging”, and this characterises it best for me.

In short, Twitter seems better suited to peer-to-peer communication than to client-server. What sort of protocol would it use? I can imagine one that would probably be UDP-based, sending tweets to followers either directly from peers or through a local aggregation point. Large groups of followers could perhaps even use UDP multicast. Archive servers could be reached through network anycast addresses, allowing greater decentralisation. IPv6 would provide universal connectivity. And so on; fill in your own pet network technology here. There are certainly lots of potential solutions.
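To make the direct peer-to-follower path a little more concrete, here is a minimal sketch of fire-and-forget tweet delivery over UDP. It assumes a made-up wire format (one small JSON object per datagram) and plain IPv4 sockets; `send_tweet`, `receive_tweet` and `MAX_CHARS` are hypothetical names, not part of any real Twitter protocol.

```python
import json
import socket
from typing import Iterable, Tuple

Address = Tuple[str, int]

# Assumption: cap tweets at 140 characters, as Twitter did at the time.
MAX_CHARS = 140


def send_tweet(sock: socket.socket, followers: Iterable[Address], author: str, text: str) -> int:
    """Send one tweet as a single UDP datagram to each follower; return the payload size.

    This is fire-and-forget: UDP guarantees neither delivery nor ordering, so a
    real protocol would need its own acknowledgements or retransmission.
    """
    # Hypothetical wire format: a small JSON object per datagram.
    payload = json.dumps({"author": author, "text": text[:MAX_CHARS]}).encode("utf-8")
    for addr in followers:
        sock.sendto(payload, addr)
    return len(payload)


def receive_tweet(sock: socket.socket, bufsize: int = 4096):
    """Block until one datagram arrives and decode it back into a dict."""
    data, sender = sock.recvfrom(bufsize)
    return json.loads(data.decode("utf-8")), sender


if __name__ == "__main__":
    # Loopback demo: a single "follower" socket on an ephemeral port.
    follower = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    follower.bind(("127.0.0.1", 0))
    follower.settimeout(2.0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        send_tweet(sender, [follower.getsockname()], "alastair", "Hello from a peer")
    print(receive_tweet(follower)[0])
    follower.close()
```

With multicast, the loop over followers would collapse into a single `sendto` to a group address, assuming the networks in between actually route multicast traffic.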

Instead, clients communicate directly with the Twitter servers using HTTP. Not only that, but they poll for updates. Bit of an architectural blunder, you might think. Well, not really. In fact, I don’t think the Twitter designers had any choice.

Once upon a time it was possible to deploy new application-layer protocols on the Internet, but those times have passed, it seems. These days, with firewalls, NAT devices and proxies along the path often passing little besides web traffic, it’s HTTP(S) or nothing. And this is not the protocol you would choose for carrying tweets if you had the choice. So the fact that Twitter works at all over this sub-optimal application-layer protocol is quite an achievement.
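For comparison, here is roughly what the polling side looks like. This is a sketch under stated assumptions: `fetch` is a hypothetical callable standing in for an HTTP GET that returns tweets newer than a `since_id` cursor, and `poll_timeline` is an illustrative name; neither reflects Twitter’s actual API.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical: fetch(since_id) returns (tweet_id, text) pairs newer than since_id.
# It stands in for an HTTP GET; no real Twitter endpoint is implied.
Fetch = Callable[[int], List[Tuple[int, str]]]


def poll_timeline(fetch: Fetch, since_id: int, polls: int, interval_s: float = 0.0):
    """Poll a fixed number of times and return (tweets, empty_polls)."""
    tweets: List[Tuple[int, str]] = []
    empty_polls = 0
    for _ in range(polls):
        batch = fetch(since_id)
        if batch:
            tweets.extend(batch)
            # Advance the cursor so the next request only asks for newer tweets.
            since_id = max(tweet_id for tweet_id, _ in batch)
        else:
            # A whole request/response round trip that carried nothing new.
            empty_polls += 1
        time.sleep(interval_s)
    return tweets, empty_polls


if __name__ == "__main__":
    # Fake server: one new tweet shows up on the fifth request.
    calls = {"n": 0}

    def fake_fetch(since_id: int) -> List[Tuple[int, str]]:
        calls["n"] += 1
        return [(101, "hello")] if calls["n"] == 5 and since_id < 101 else []

    print(poll_timeline(fake_fetch, since_id=100, polls=10))  # prints ([(101, 'hello')], 9)
```

In the demo, nine of the ten requests come back empty. Each of those is a full round trip, multiplied across every client, whereas a push-based protocol would have sent one message when the tweet appeared.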

This is a great example of the many ways in which innovation can be stifled by enforcing a lowest common denominator.

The impact is, of course, more widespread than just Twitter. In fact, the so-called end-to-end principle, one of the founding principles of the Internet, is now all but abandoned in practice. Geoff Huston examines the issue in some detail in a recent article, which is highly recommended.

Of course, there are no easy answers, either for Twitter or for the next application to suffer from the proliferation of network middleware. But it’s certainly an issue that deserves more prominence.

(This post is an obvious departure from my usual style of blatant attack pieces in order to score traffic and fame for myself. Normal service will resume shortly.)

Required Viewing

Posted by alastair
on November 07, 2007 09:39

If you’re at all interested in computing technology, you can’t help but be amazed at the advances in CPU power over the last few decades, Moore’s Law, blah blah blah. But a few seconds pondering this invariably provokes the question of how long this party can last.

The commonly accepted wisdom is that CPUs have gotten about as fast as they are likely to go in terms of sheer clock speed, and manufacturers are now turning to multiprocessing to provide more processing power at a given price point. The recent Intel price drops, which made the quad-core Q6600 CPU available for less than AUD400, are a highly relevant (and welcome) data point illustrating this trend.

This raises lots of hairy questions for developers, such as “how are we going to design our software to run efficiently in a multiprocessing environment?” The previously linked Wide Finder experiment is an attempt to explore some of these issues, and it’s pretty obvious that so far there is no silver bullet.

But wait, it gets worse. I will point you to a long but highly thought-provoking presentation from Herb Sutter. Turns out we are already hitting major architectural hurdles in the form of memory access limitations, and we’ll need to find some solutions for these before tackling the parallel computation problem.

Sutter’s presentation is deeply technical, but still quite accessible, and delivered with an engaging style that makes it required viewing. Highly recommended.

I recently diagnosed some memory-related performance problems (not quite in the same class as those Sutter discusses, but similar), and I have to say there is a serious deficit in development tools for these kinds of problems. Currently we need to look at aggregate behaviour over many iterations to isolate some of them, which is a difficult and error-prone approach. For example, check out Sutter’s technique for discovering the memory cache line size in code. In the future it would be great if we could monitor cache misses, pipeline stalls, page faults and other performance-impacting events from within the debugger.
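As a rough illustration of the kind of aggregate measurement involved, here is a stride-timing sketch in the spirit of cache-line-size experiments. It is a common approach, not necessarily Sutter’s exact technique, and it assumes CPython, a buffer much larger than the last-level cache, and 4 KiB pages; `stride_timings` and `DEFAULT_STRIDES` are hypothetical names.

```python
import time

# Stride 1 is left out on purpose: CPython copies a step-1 slice with a single
# memcpy, so its timing would not be comparable with the strided loops.
DEFAULT_STRIDES = (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)


def stride_timings(size_bytes=64 * 1024 * 1024, strides=DEFAULT_STRIDES, repeats=3):
    """Return {stride: best nanoseconds per byte read} for strided reads of a buffer.

    While the stride is smaller than the cache line, several reads share each
    line fetched from memory; once the stride reaches the line size, every read
    lands on a new line, so the cost per read should level off around there.
    """
    buf = bytearray(size_bytes)
    # Write one byte per 4 KiB page (an assumed page size) so the pages are
    # actually mapped before timing starts.
    pages = len(range(0, size_bytes, 4096))
    buf[::4096] = b"\x01" * pages
    results = {}
    for stride in strides:
        reads = len(range(0, size_bytes, stride))
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            buf[::stride]  # CPython's extended-slice loop reads one byte per stride
            best = min(best, time.perf_counter() - start)
        results[stride] = best * 1e9 / reads
    return results


if __name__ == "__main__":
    for stride, ns in stride_timings().items():
        print(f"stride {stride:5d} bytes: {ns:6.2f} ns per read")
```

Expect a knee rather than a clean step near the line size (commonly 64 bytes on current x86 parts): hardware prefetching, TLB effects at page-sized strides and timer noise all blur the picture, which is exactly why better tooling would help.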

These issues also make me wonder how higher-level languages are going to provide appropriate abstractions that avoid these performance problems. For example, garbage collection is a major win for programmer productivity, but it does encourage memory usage patterns that are not always conducive to performance, given the architectural limitations of the underlying hardware. The same abstraction problems affect C/C++, of course, but at least there is the option to go “bare-metal” where necessary.

Whatever the answers are here, it’s certain there are some interesting times ahead for developers.