I really think that this is the big plus for Google's App Engine. Most web sites are essentially front-end GUIs for a database with a sprinkle of data crunching thrown on top. It's all about the database. And one of the biggest pain points for a growing startup is scaling the database.
Enter App Engine. Google has spent years developing BigTable and their underlying distributed OS, and they are probably years ahead of anybody else on this one. Essentially, App Engine consists of giving 10,000 developers read/write accounts to the world's largest distributed database management system.
Now it's up to the developers to add their HTML/CSS/JavaScript GUIs and a little data-crunching middleware for all the world to use.
One is that Google's DB is both unique and closed-source. Worse, it isn't even a product like Oracle or SQL Server that I can license and then use the way I want. Even if I wanted to buy the hardware and maintain the software, I can't move my app off of Google. I'm a sharecropper. They get my traffic data for free, and if they change their pricing structure or the way their DB works I have to scramble to keep up. And if they decide to screw with me (like, ahem, becoming my competitor) I will have to port my app to some other scalable platform while serving all my accumulated traffic and simultaneously fending off a deep-pocketed competitor.
I'm also suspicious of the idea that BigTable is some sort of magic wand that solves any and all scalability problems. I'm very sure that BigTable scales for the classes of problems that it was designed for. And I'm sure that a wizard like Steve Yegge or Peter Norvig can adapt it to many other classes of problem. But, without actually knowing anything about BigTable, I'm prepared to bet that using it to scale your web app will require (a) knowing a fair bit about how the tool works and (b) customizing your app's data storage scheme to complement the tool, after which (c) there will still be some corner cases that don't work very well and require clever hacks and compromises.
In other words: I predict that in three years there will be Craigslist job postings for "BigTable DBA with five years of experience".
The reason HTTP is trivial to scale and databases aren't is that the HTTP daemons are largely stateless, while databases are all about managing state. Doing that in a scalable, reliable, consistent way is just a fundamentally hard problem. Oracle and DB2 haven't done particularly well at trying to solve this (within the constraints of a traditional RDBMS), let alone the various open source projects.
I spent a while trying to build a synchronous multimaster replication system for Postgres. I think we made two main mistakes:
1. Trying to provide the traditional ACID semantics that people expect on a single-site DBMS isn't feasible, at least without incurring a very significant performance and complexity overhead.
2. Horizontal partitioning is key. If you make it easy for the user to partition their data, you now only need to maintain consistency over a single partition.
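To make the partitioning point concrete, here is a minimal sketch, not anything from the Postgres project itself. The `PartitionedStore` class, its method names, and the choice of SHA-1 hashing are all assumptions made up for illustration, and plain dicts stand in for what would be separate database nodes. The idea it shows: every row that shares a partition key lands on the same partition, so a consistent read or write touching those rows only involves one node.

```python
import hashlib


class PartitionedStore:
    """Toy hash-partitioned key/value store (hypothetical, for illustration).

    Each partition is a plain dict standing in for a separate database node.
    """

    def __init__(self, num_partitions):
        self.partitions = [dict() for _ in range(num_partitions)]

    def partition_for(self, partition_key):
        # Use a stable hash so the same key maps to the same partition
        # across processes (Python's built-in hash() is randomized for str).
        digest = hashlib.sha1(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def put(self, partition_key, row_key, value):
        node = self.partitions[self.partition_for(partition_key)]
        node[(partition_key, row_key)] = value

    def get(self, partition_key, row_key):
        node = self.partitions[self.partition_for(partition_key)]
        return node.get((partition_key, row_key))

    def rows_for(self, partition_key):
        # Only one partition is consulted; no cross-node coordination needed.
        node = self.partitions[self.partition_for(partition_key)]
        return {rk: v for (pk, rk), v in node.items() if pk == partition_key}
```

With something like `store.put("user:42", "profile", ...)` and `store.put("user:42", "settings", ...)`, both rows sit on one partition, so updating them together is a single-node problem. Anything that spans partition keys is where the hard multi-site consistency questions come back.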
Only 0.001% of startups (and I'm being generous) need the sort of power being discussed here. The reason "a scalable database" doesn't exist is because there isn't a big enough market for one.
I have no doubt we'll hear more about this in coming months, not because of any specific need, but because it's the latest fad.
I was just talking about this with a YC applicant yesterday. Scaling up HUGE is a really interesting topic, and a lot of nerds love to talk about it...but the number of sites that actually require extreme scaling is very low. When we first started selling Virtualmin we had a lot of early adopters asking about database and web app replication, load balancing, etc. Not because the folks asking actually had sites that needed that kind of performance, but because they're cool technologies to play with. A thousand paying customers later and the demand for those features has dropped to background noise (the same early adopter folks who mostly just want to play with it rather than have the traffic to justify it). Far more of our users are asking for the ability to run more (more sites, users, mailboxes, applications, etc. and not generally more reqs/second) on a single server rather than the ability to spread load across many servers.
The performance of hardware has managed to keep pace with the needs of the vast majority of websites over the years, to the point where very few sites (like the top 500 or so) actually ever need more than just the basic scaling ideas that are easy for just about any sysadmin to implement (split mail, web, DNS, and database onto independent boxes, use memcached, maybe a web load balancer like Squid or pen, etc.).
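For anyone who hasn't done the memcached part, the usual pattern is cache-aside: check the cache first and only hit the database on a miss. Below is a rough sketch under loud assumptions: the `CacheAside` class and `load_from_db` callback are hypothetical names, and an in-process dict stands in for memcached so the example runs without a server. It is not the API of any real memcached client.

```python
class CacheAside:
    """Cache-aside read path with a dict standing in for memcached.

    `load_from_db` is a hypothetical callable that fetches a value from
    the real database when the cache does not have it.
    """

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._cache = {}
        self.db_reads = 0  # counts how often we fell through to the database

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no database work
        self.db_reads += 1
        value = self._load(key)      # cache miss: read from the database
        self._cache[key] = value     # populate the cache for later readers
        return value

    def invalidate(self, key):
        # Call this after writing to the database so stale data is dropped.
        self._cache.pop(key, None)
```

Repeated reads of a hot key cost one database query instead of many, which is often all the "scaling" a typical site needs on the read side.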
Could it be that the people who select Virtualmin aren't the same people who need to scale up? Not a rhetorical question, I don't know you guys well enough to have an idea.
> The performance of hardware has managed to keep pace with the needs of the vast majority of websites
That makes sense, but the absolute number of web sites that need to scale up in a big way is also growing, even if it could actually be shrinking as a percentage.
It could very well be a bit of a situation where, "We don't offer a scaling solution, so customers that need to scale don't come our way, so we don't hear from customers that they need to scale."
But, it's worth noting that my previous (now defunct) company was entirely devoted to web performance and scalability. It's not a volume business--the folks who need it spend a lot on the problem, but there just aren't that many who need the extreme solutions. I've often chuckled when the few Virtualmin customers who do want scalability have explained their requirements and they match to a great degree the products I was building five years ago (and found to be a niche that wasn't worth continuing to expend effort on). If I thought it would be profitable to pursue, I could certainly revive some of those products in the context of Virtualmin...but I think there are far more profitable areas for us to work on.
Everything related to the web is growing, so all ships are rising, including scalability issues...but I believe several others are rising much faster.
Not needed? Even the book Founders at Work is full of stories where the problem was scaling (delicious, PayPal, Blogger, ...). Every startup that's going to work well is going to have this kind of scalability problem (and most of the time it is related to the DB and not the HTTP front-end side).
While I haven't used either (or BigTable for that matter), I was under the impression that both HBase (the Hadoop database) and HyperTable (http://www.hypertable.org/) were open source competitors to BigTable. There does seem to be some work happening there.
It's pretty rarefied air up in the high scalability world...my previous company built website acceleration tools, and the customer base there is pretty small (they have plenty of money, but it's not a high volume business). As much as us nerds like to think about HUGE problems like this, it's a problem that just doesn't come up that much.
Maybe Sun will dust off Clustra from whatever closet they left it in. Or maybe H-Store will come out soon and support a large enough subset of SQL to satisfy people.
I don't have any direct data about Clustra but I read the Hypra/Clustra book[1] a while ago. It uses a tightly coupled scheme of parallel hardware, software and networking to provide high availability. Scaling that sort of system will cost you a lot more than running H-Store on a cluster of cheap boxes.
I'm interested in this myself, if only for coolness as SwellJoe mentioned. The comments on there mentioned CouchDB; does anyone know if it's worth using yet?