
The shutdown process is immediate: everything just gets terminated. Redis flushes to EBS every so often, so if the server running Redis did die, there would just be a brief "service unavailability" on the services that depend on Redis. No major biggy... My Nagios server would spot that the server died and fire off a new instance request with a preconfigured cloud-init script that tells the server what to install, and then mount the very same EBS volume to the new server. (You can do this using the ec2-* API command tools.) I decided to use cloud-init scripts rather than Chef/Puppet.
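To make that concrete, here's a rough sketch of what such a cloud-config could look like. This is an assumption on my part, not the author's actual script: the volume ID, device names, and mount point are placeholders, and the replacement instance would need credentials and to be launched in the same AZ as the volume.

```yaml
#cloud-config
# Hypothetical replacement-node config; vol-xxxxxxxx, device names and
# paths are placeholders, not the author's real values.
packages:
  - redis-server
runcmd:
  # Re-attach the surviving EBS volume using the old ec2-* API tools,
  # then mount it where Redis expects its data.
  - ec2-attach-volume vol-xxxxxxxx -i $(ec2-metadata -i | cut -d ' ' -f2) -d /dev/sdf
  - mount /dev/xvdf /var/lib/redis
  - service redis-server restart
```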

Regarding image uploads: because user uploads are saved directly to the "uploads" EBS volume, which is shared via NFS, even if the instance dies in the middle of an upload to S3 I will know it failed, because I can run a simple query against my database to see when a job was started and when it finished. Whenever my files are synchronised to S3, I set the file's flag in the database to "s3Synced=true". (The reason I don't upload directly to S3 using the new signature/CORS feature is that I have to generate 6 different dimensions of each uploaded image and then watermark them.)
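The recovery query could look something like this. A minimal sketch using an in-memory SQLite database; the table and column names (`uploads`, `started_at`, `finished_at`, `s3_synced`) are my assumptions, not the author's actual schema.

```python
import sqlite3

# Hypothetical schema -- real table/column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE uploads (
        id INTEGER PRIMARY KEY,
        filename TEXT,
        started_at TEXT,
        finished_at TEXT,       -- NULL while the job is still running
        s3_synced INTEGER DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO uploads (filename, started_at, finished_at, s3_synced) "
    "VALUES (?, ?, ?, ?)",
    [
        ("a.jpg", "2012-09-01 10:00:00", "2012-09-01 10:00:05", 1),  # synced
        ("b.jpg", "2012-09-01 10:01:00", None, 0),  # instance died mid-sync
    ],
)

# Jobs that started but never got s3Synced=true are the ones a killed
# spot instance left behind; they need to be re-queued.
failed = [row[0] for row in
          conn.execute("SELECT filename FROM uploads WHERE s3_synced = 0")]
print(failed)  # ['b.jpg']
```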

If the server running the single EBS volume dies (this is a single point of failure for me, but it could easily be avoided if I wanted to run something like Gluster, or even two EBS volumes on different servers, which I don't), then file uploads are suspended temporarily. It's a bit pointless accepting uploads, because although I could store them on each server that received the request, the batch job of resizing and watermarking them is performed by Gearman workers running on a small cluster of a handful of micro instances.
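One way the "suspend uploads while the volume is down" check might be done, as a sketch; the probe-a-temp-file approach and the idea of answering 503 are my assumptions, not a description of the author's code.

```python
import os
import tempfile

def uploads_available(upload_dir):
    """Return True if the shared uploads volume is present and writable.

    Sketch only: a production check might also verify the path is really
    an NFS mount (os.path.ismount) rather than the local filesystem.
    """
    if not os.path.isdir(upload_dir):
        return False
    try:
        # Probe writability by creating and removing a temp file.
        fd, path = tempfile.mkstemp(dir=upload_dir)
        os.close(fd)
        os.remove(path)
        return True
    except OSError:
        return False

# Usage sketch: the frontend could answer HTTP 503 when this returns False.
print(uploads_available(tempfile.gettempdir()))  # True on a healthy box
print(uploads_available("/no/such/mount"))       # False
```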

If a server is in the middle of a transaction and dies, well, then you're kinda screwed. I'd suggest that if you know certain traffic must not drop for any reason, such as payment gateways etc., then you should route it through a reverse proxy to an on-demand/reserved instance. You can do this in nginx effortlessly.
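For example, a minimal nginx config along those lines might look like the following. The upstream names, addresses, and the `/payments/` path are placeholders I've invented for illustration.

```nginx
# Sketch: send payment-gateway traffic to a stable on-demand/reserved
# instance, and everything else to the spot fleet.
upstream spot_pool {
    server 10.0.1.10;    # spot instances (may disappear at any time)
    server 10.0.1.11;
}

upstream reserved_pool {
    server 10.0.2.10;    # on-demand/reserved instance (always up)
}

server {
    listen 80;

    # Traffic that must not drop goes to the reserved instance.
    location /payments/ {
        proxy_pass http://reserved_pool;
    }

    location / {
        proxy_pass http://spot_pool;
    }
}
```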

I'll write a blog post about the entire stack next month if people are genuinely interested. Jeff Barr actually just dropped me an email (Hi Jeff, if you are reading this comment) saying he would be interested in a full write-up.



Thanks for outlining a lot of details here, but a full writeup would also be great.


Hi Chris, looking forward to more details.


That's quite an interesting architecture.

Do you have data on how frequently your spot instances terminate gracefully (with time to finish their requests) versus how often they zap out?



