Buddy Brad Feld has a great post on Shared Nothing Architecture as a potential solution to the performance and reliability issues faced by services I use on a day-to-day basis: TypePad and del.icio.us (and to some extent Bloglines, though I don't use it so much now). I had actually spotted that del.icio.us was down as well, and was about to write my own piece out of frustration, but Brad summarizes the situation well. In the meantime, here is my backup del.icio.us.
On the heels of TypePad’s 18 hour outage this week, there’s been (and will be) a lot of continued discussion about how to build scalable and reliable online / web-based applications. This is not a new problem (I not so fondly remember major and systemic outages in large services such as eBay and Amazon in the late 1990s) but it’s gotten new attention as some of the emerging applications have scaled up to the point of having an interesting number of regular users (e.g. – it sucks if their service goes down for more than 15 minutes). For example, as far as I can tell, del.icio.us has been down for the last four hours (“del.icio.us is down for emergency maintenance. we'll be back as soon possible.”) and on 12/15/05 Bloglines acknowledged that “Bloglines performance has sucked eggs lately.”
Tim Wolters – an extremely capable CTO – has an introduction to how he is approaching this at Collective Intellect. He’s taking a page from Google’s playbook and developing a web service based on a “shared nothing architecture”. On Friday, I had two different discussions about scalable architectures (e.g. “we’re going to scale up between 10x and 100x on a meaningful base in 2006 – here’s what we are planning”) and both included elements of what Tim is describing.
The ultimate Shared Nothing Architecture relies on mirrored data centers in different physical geographies that allow a system to switch over in (quasi) real-time in case of any type of failure (power, hardware, database, etc.) - and this is expensive to deploy. Del.icio.us is not there yet, but will clearly benefit from Yahoo's scalability expertise. And as to Six Apart, well, let's hope that they'll figure this out since quite a few of us users have expressed our “discontent” (and I am being soft since many of my close friends are involved with the company). These problems happen to almost every company as it experiences rapid growth of its online presence, and often the backup solutions in place are just not appropriate (and remember, don't trust these backup generators).
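To make the switch-over idea a bit more concrete, here is a minimal Python sketch of two mirrored sites, each holding its own full copy of the data. Everything in it is hypothetical and for illustration only: the `Site` and `MirroredService` names, the in-memory dictionaries standing in for databases, and the `healthy` flag standing in for real health checks. It is not how del.icio.us, TypePad, or Collective Intellect actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One data center with its own private copy of the data (hypothetical)."""
    name: str
    healthy: bool = True
    store: dict = field(default_factory=dict)


class MirroredService:
    """Writes go to every healthy site; reads come from the first healthy one."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, key, value):
        # Replicate the write to each site that is currently up.
        written = 0
        for site in self.sites:
            if site.healthy:
                site.store[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no healthy site available")
        return written

    def read(self, key):
        # Fail over: skip any site marked as down.
        for site in self.sites:
            if site.healthy:
                return site.name, site.store[key]
        raise RuntimeError("no healthy site available")


east = Site("east")
west = Site("west")
service = MirroredService([east, west])
service.write("bookmark", "http://example.com")

east.healthy = False             # simulate a power failure at one site
print(service.read("bookmark"))  # served by the surviving site
```

The sketch deliberately leaves out the hard and expensive parts: detecting failures reliably, keeping the copies consistent across geographies, and resynchronizing a site that missed writes while it was down.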
If you need a justification for early exits by Web 2.0 companies, beyond generating nice payoffs for company founders, look no further: scaling to tens of millions of users and gigabytes of traffic is no simple feat, and the companies facing these issues risk losing at least a portion of their momentum if they don't handle the situation properly.
Update: it appears that del.icio.us has had to rebuild their corrupted database after a... power failure - I wonder what happened to the generators...