Yes, Rackspace Is Down And So Are Many Of Your Favorite Sites (techcrunch.com)
40 points by vaksel on June 29, 2009 | 35 comments


We (http://ticketstumbler.com) use Rackspace's Mosso and were only down for 10 minutes.

Rackspace's Twitter account just said: @Rackspace: All power is restored to the DFW data center - all devices affected are starting to come on-line. Details to follow.


Same here. Couldn't have happened at a worse time, but at least it was only for 10 minutes or so.


We have several servers with Rackspace and several hundred of our own customers were affected by this outage (for about 30 mins).

For what it's worth, I still think they are the best managed hosting company around. The customer service is great.

This is the only major problem I can recall since this one: http://www.techcrunch.com/2007/11/12/quick-plug-the-internet...


We were down for over 2 HOURS! Not acceptable! Found out that one of our servers was stuck at a BIOS screen waiting for someone to hit "Enter". Not happy about it at all.


Are you using them as a colo? That's usually one of the first things I do when I set up a machine to be shipped off to a colo -- make sure that a remote boot doesn't require console interaction. Reduce the grub menu timeout to 5 seconds, always skip the BIOS prompt regardless of errors, etc. Even if the colo provides the ability to remotely interact with the console, it's less hassle if you don't have to.
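
For example, with GRUB legacy (the file is /boot/grub/menu.lst, or grub.conf on Red Hat-style distros; a sketch -- your entries will differ):

    default 0    # always boot the first menu entry
    timeout 5    # wait 5 seconds, then boot with no keypress needed

The BIOS half is a setup option, usually named something like "Halt On: No Errors", so POST doesn't sit waiting for a keypress after a non-fatal error.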


No, they set up the servers for us. Didn't think to ask, "please make sure the servers reboot without someone standing by to push 'enter' if needed". They told me that the server's BIOS was saying something about the RAID battery being charged and required someone to push 'enter' to continue. Thanks for the suggestion.


I am a Rackspace customer (fortunately, in VA). Last week, they accidentally sent emails out to many customers letting them know that there was heavy construction (that they don't control) adjacent to their lot of land in Texas. I'm betting that has something to do with it.


So what really happened? Or does Rackspace not go as far as explaining to customers what the cause was and what measures are being taken so it won't happen again? I'd be surprised if that were the case.

My current host costs less and does much more: you're informed and kept updated whenever anything remotely serious happens, especially if it affects the whole DC. That has happened only once in six years, and customers had live progress updates on the failure resolution.


Oh? In the mail I got, they said the construction would be near their IAD data center, which is just outside Washington Dulles airport (which, looking at the map, is actually in VA).

> Beginning Thursday June 25, 2009, heavy duty excavation work on the property adjacent to Rackspace's IAD Data Center will commence. This project includes excavation work and quarrying of subsurface rock to lay a sewer line to serve a new development.


I know nothing about hosting, but how much more expensive can running fully redundant infrastructure be? Backups available at a separate physical location, ready to go online in case A goes offline? I guess the real question is how much businesses can afford to pay.

Has anyone done the math to see income lost per hour offline? (I'm pretty sure Twitter makes money for being offline. heh)


The more redundant you are, the more it costs. And even with redundant network routes, power distribution, and backup generators, there always seems to be an undiscovered single point of failure (SPOF) left.

Murphy's Law is the Law of the Universe ;)

One hosting provider I used, I forget who, had redundant everything, except some part of the power switching to the UPSs. During a regular UPS maintenance test it failed and half their datacenter went dark.


I visited a Savvis hosting facility several years ago and they had redundant generators (each about the size of my two-car garage), redundant network connections (some high-level peering), two separate power lines from isolated parts of the grid, and at least two separate certified diesel suppliers. Oh, and VESDA fire detection and grate-type floors to help with the cooling.

And I am sure a cost to match.


Every reasonable co-lo has redundant power and redundant bandwidth. Hell, even tiny guys like me have that.

Most co-lo power outages are not due to both incoming power feeds failing; usually it is either human error or failure in the power equipment that is not redundant enough. It happens even at the best data centers.

Most network outages, on the other hand, are caused by human error. It's not very difficult to make your network extremely resilient to upstream failures; even the smallest ISP is going to have more than one upstream. However, if you give the new guy access to the BGP routers (or the old guy, when he hasn't had enough sleep), it's not at all difficult to break the whole thing.
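
For a sense of scale, the multihoming itself is only a few lines of router config. A hypothetical Cisco-style sketch, using documentation ASNs and addresses (the hard part is changing it at 3am without fat-fingering an announcement):

    router bgp 64512                          ! your ASN (private range, made up)
     network 192.0.2.0 mask 255.255.255.0     ! the prefix you announce
     neighbor 203.0.113.1 remote-as 64496     ! upstream A
     neighbor 198.51.100.1 remote-as 64511    ! upstream B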


Well, if it's fully redundant, it'd be 2x as much.

Personally, I think the best bet is to just have a backup on Amazon. Sure, it's not as fast as switching over to another dedicated server, but it's something to have in case of emergency; users can afford to have the page take an extra ms to load.
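
Even a nightly dump shipped off-site is a start. A minimal sketch with the boto library (bucket and file names are made up):

    import boto

    # Credentials come from the environment or a boto config file.
    conn = boto.connect_s3()
    bucket = conn.get_bucket('example-offsite-backups')  # hypothetical bucket
    key = bucket.new_key('db/dump-latest.sql.gz')
    key.set_contents_from_filename('/var/backups/dump-latest.sql.gz')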


The cost to users (and especially developers) most assuredly is not simply 2x that of a single host. Planning for full replication and hot-failover from day one adds complexity, and can actually do more harm than good if your team and tools aren't fully up-to-speed on the sometimes subtle challenges surrounding replication lag, eventually-consistent stores, and STONITH.


My thinking was, if they already own a datacenter in another location, why not add another floor? Obviously it's not cheap but is it really 2x? If you're big enough, you don't have to hire double the staff or buy double the resources since you already have them.


The expensive part of data centers isn't the square footage; it's the HVAC, power, bandwidth, and servers. So even if you have the space, you still need to reserve twice the bandwidth, power, cooling, and hardware: 2x the cost. Actually somewhat more, because you need to account for all the bandwidth and time to keep everything synced up.


At what Rackspace can charge, new hardware is nothing. The problem is that if I rent you a server, it's going to be pretty difficult for me to replicate that server to another location without a lot of cooperation from you. And the more I screw with your server, the more likely I am to break it, too.


It's more than that, as you need two datacenters, plus the cost of synchronizing them.


For shared hosting, yes, there is no excuse for not being redundant. However, if you are renting servers, full redundancy at the server level is quite difficult to achieve. Redundancy is usually better handled at the application layer.
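
By "the application layer" I mean things like having the client fall back to a standby when the primary is unreachable. A toy Python sketch (both hostnames are hypothetical):

    import urllib.request

    # Primary and standby in different datacenters.
    MIRRORS = ["http://primary.example.com", "http://standby.example.com"]

    def fetch(path, timeout=5):
        last_err = None
        for base in MIRRORS:
            try:
                return urllib.request.urlopen(base + path, timeout=timeout).read()
            except OSError as err:  # refused, timed out, DNS failure, ...
                last_err = err      # remember it and try the next mirror
        raise last_err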


I saw my Slicehost box go down for about 10 minutes. I'm really surprised that they haven't sent me an email yet; they're usually RIGHT on top of that...

Anybody have any word on what happened?


You posted JUST too soon... 10 minutes ago I received:

------------------------------------------------------------------------
Hello,

We have experienced an interruption in power to a portion of our Dallas-Fort Worth data center facility. Power has been restored and our DC engineers are working on devices that need to be manually brought back online as quickly as possible. Further updates will be made as soon as new information is made available. Please monitor our MyRackspace customer portal as this is the quickest way we can get updates out to you.

Sincerely,

Rackspace


That actually sounds pretty scary. What sort of incoming fault trips (what sounds like) DC breakers?


If I recall, they were doing maintenance on their chillers this week. Hmmm....



I had a laptop with 2 good batteries and I lasted out a power cut watching a whole DVD. The ADSL 'modem' was powered by the phone socket and shared over Wifi from another laptop with a (single) good battery, so I could continue using the internet while watching this DVD!

I guess what I'm trying to say is that laptops might be a good solution for a hosting company, because of the lower power requirements and cheap batteries.


That's not even remotely price-feasible. Also, lower power requirements don't necessarily mean efficiency: in terms of performance per watt, laptops have inefficient power conversion and are particularly difficult to cool. Trust me, Rackspace knows what they're doing.


Actually, they aren't doing anything particularly amazing - my servers are just Dell 2950IIIs.

Still, for one guy managing things, it's way easier than co-lo.


Google actually puts batteries on its servers individually rather than using one UPS shared across many. The problem with "lower power requirements" is that they normally coincide with slower speeds or lower efficiency in some fashion.


10 minutes is 1% of 17 hours, 0.1% of a week, 0.01% of 69 days, or 0.001% of 23 months. So they could maintain "five nines" if they had an outage like this every two years — except for people like jme.

But of course outages follow a power law. On January 15, 1990, AT&T's entire long-distance network crashed for 9 hours.
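
A quick way to sanity-check that arithmetic, in plain Python:

    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

    for nines in range(1, 6):
        downtime_frac = 10 ** -nines               # e.g. 0.00001 for five nines
        budget = MINUTES_PER_YEAR * downtime_frac  # allowed minutes per year
        print("%d nines: %9.2f minutes of downtime per year" % (nines, budget))

    # Five nines allows ~5.26 minutes a year, so one 10-minute outage takes
    # roughly two clean years (~23 months) to amortize.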


There was a major outage less than two years ago, IIRC in the same data center: a transformer blew up and shifted a wall. They're not going to get their five nines for a long, long time.


Wow. That sounds almost as bad as the CIHost outage where thieves sawed through the walls, stole a bunch of servers, and tased the employee who responded to the alarm.

I'm so glad I got my servers out of there only 2 months prior (after being there for almost three years).


Cheers! May this source of so much spam be offline for 150 years.


What are you talking about?


I guess he gets a lot of spam from Rackspace customers.



