Amazon Web Services Goes Down, Takes Many Startup Sites With It
Erick Schonfeld
Feb 15, 2008

Amazon Web Services suffered a major outage this morning, affecting the thousands of Websites that rely on its storage (S3) and cloud computing (EC2) services. Startups including Twitter, SmugMug, 37Signals, and AdaptiveBlue use Amazon’s S3 storage service to store all the data for their Websites. Reports of the outage started coming in across the Web, email, and Twitter (Twitter only uses S3 for file hosting, not its main messaging application). The major difficulties seem to have been fixed, but some issues persist. The outage started at around 4:30 AM PT.

This could just be growing pains for Amazon Web Services, as more startups and other companies come to rely on it for their Web-scale computing infrastructure. But even if the outage only lasted a couple of hours, it is unacceptable. Nobody is going to trust their business to cloud computing unless it is more reliable than the data-center computing that is the current norm. So many Websites now rely on Amazon’s S3 storage service and, increasingly, on its EC2 compute cloud as well, that an outage takes down a lot of sites, or at least takes down some of their functionality. Cloud computing needs to be 99.999 percent reliable if Amazon and others want it to become more widely adopted.

Update: A response from Amazon PR:

For one of our services, the Amazon Simple Storage Service, one of our three geographic locations was unreachable for approximately two hours and was back to operating at over 99% of normal performance before 7 a.m. PST. We’ve been operating this service for two years and we’re proud of our uptime track record. Any amount of downtime is unacceptable and we won’t be satisfied until it’s perfect. We’ve been communicating with our customers all morning via our support forums and will be providing additional information as soon as we have it.

  • http://book-bot.com gilltots

    maybe a “truck” drove into a “power thingy” and the “chillers” didn’t “cycle” properly…

    …nah, that’d be stupid!

  • Ghaus

    Oh Damn! My start-up company was going to rely totally on Amazon S3. But now we will have to think again.

    Thanks for the news Erick !

  • http://www.echosign.com Jason M. Lemkin

    For sure. We’d switch in a heartbeat for 99.999 uptime at their current prices. They don’t claim that level of uptime for good reason. Today at least you get what you pay for. Their 99.9 SLA isn’t good enough for critical web apps. Skype vs POTS …

  • weeeee

    @ghaus. nothing is 100%. this is like their first major outage correct?

  • http://hyveup.blogspot.com Xavierv

    I was told recently to consider Amazon Clouds for my startup. I’ll reconsider it now.

  • http://www.barrywelch.net/blog Barry Welch

    Cue Nelson Muntz catch phrase.

  • weeeee

    “Amazon has a goal of 99.9% uptime with this service. This translates to 45 minutes per month of expected downtime. Achieving 99.9% uptime is a significant challenge—but worth striving towards!”

    Twitter says they only saw 2 to 3 minutes of downtime.

    Your lame ass poor startups can handle that I think…

  • http://www.zoliblog.com/2008/02/15/the-dawn-of-saas-on-saas-even-while-amazon-s3-is-down/ Zoli’s Blog

    The Dawn of SaaS-on-SaaS – Even While Amazon S3 is Down….

    TechMeme is great in threading together relevant posts, but is largely based (so I think…) on direct linking, so of course it could not auto-detect the ironic relationship between:

    Amazon’s S3 service outage this morning (news of which …

  • Luis Villa

    don’t switch to Nirvanix, it is extremely buggy. cloud computing is not ready yet.

  • http://storagememo.com StorageMemo

    unfortunate.

  • http://s3box.com s3box

    99.9% uptime -> 0.1% downtime allowance… 8-)

  • sd

    if you have money, don’t use it. if not, what other options do you have? you get what you pay for.

  • Jaisen

    Not sure why the commenters are ready to bail on Amazon’s services. Do you have better options at the prices Amazon charges?

    BTW…last I read Smugmug doesn’t rely on S3 as their primary storage service.

  • EH

    They aren’t ready to bail. They are just wankers who want to say they have a startup so they can show their friends their name on the exclusive Techcrunch comment board. They don’t have businesses to bail.

  • http://ydrive.com YDRIVE

    There is no better options to date… until Google readies their GDrive, or until Microsoft completes buying Yahoo! (whichever is sooner) :-D

  • http://blogs.smugmug.com/don/ Don MacAskill

    I’m the CEO & Chief Geek at SmugMug.

    We do rely on S3 for our primary storage, but we do maintain our own “hot cache” of data in our datacenters, too, which is less than 10% of our total storage.

    Our customers weren’t affected by this morning’s outage.

  • http://pbwiki.com/ Nathan Schmidt

    Don’s got the right approach here with SmugMug’s use of S3. It’s great for near-line storage, archiving, and other activities which require relatively low-cost, flexible capacity, ‘good’ performance, and high integrity.

    Never build your architecture to require low-latency, high-availability access to S3 or its competitors, because you won’t get those – that’s not what it’s for, that’s not what it’s optimized for, and you’re never going to be able to peel back those layers of abstraction and long-haul network.

    We’re still a long way away from the ‘magic infrastructure cloud’ but by keeping the strengths and weaknesses of these hosted services in mind, you can still get a tremendous amount of value from them.

  • http://www.CARversation.com CAR

    This is just horrible, what a nightmare.

  • Ghaus

    @ 4 & 7

    Yes this is Amazon’s first major outage.

    For start-ups which gets a lot of traffic, for them 5 minutes downtime means thousands of dollars of loss.

  • http://www.fabianschonholz.com Fabian Schonholz

    I agree. It is a shame because Cloud computing as a service is a really good idea and I believe that probably Amazon is the best example of a good Cloud. But the lack of SLAs and problems like this make it hard to trust just outside of backups and archives.

  • Steel

    Nothing is perfect. You techies need to get a grip on things and be real.

  • weeeee

    @ 18

    “For start-ups which gets a lot of traffic, for them 5 minutes downtime means thousands of dollars of loss.”

    You’re so full of it. examples of your exaggerations please…

  • BlogReader

    [ For start-ups which gets a lot of traffic, for them 5 minutes downtime means thousands of dollars of loss. ]

    Thousands of Flooz dollars maybe

  • http://technozzle.com Baher

    @15 That’s a great strategy Don, I think all startups should have some kind of contingency plan for any cloud service, eventually things go wrong.

  • Janusz

    Thanks Techcrunch. I have sent you that tip for a story about 10 minutes after the service went down and nobody on the internet knew about it and haven’t had a single thank you from you. Next time I won’t bother.

  • Matt

    sounds like #13 is having the “client work” doldrums….

    sure they have startups!! so could you! YOU CAN DO IT!!

    quit yer pouting…

  • Steel

    @#18

    If 5 minutes downtime can cause a startup to lose thousands of dollars, they’re making as much money or more, than Fb, MySpace, and who knows…maybe as much as Exxon/Mobil

    You’re lame.

  • http://greacen.com CG

    Hey #20-

    The techies usually know that nothing is 100% reliable. It’s either groupthink or partially-informed folks that lose touch with the fact that systems fail (sometimes for lame reasons).

    With a 99.9% uptime promise, everyone in the business should be prepared for 40-ish minutes of unplanned downtime a month.

    CG

    PS- #15: flooz! lol

  • http://mikeshotdish.com Mike Wills

    Don’t be quite that hard on them. We don’t know exactly what they were doing (if anything) and this is the first major outage they have experienced. I know there is a cost to downtime, but if you have thousands of dollars on the line for being down for an hour or so, you shouldn’t be depending on 3rd parties to house your data. I know they may be cheap, but doesn’t the old saying hold true? “You get what you pay for.”

  • Mik

    Use S3 as backup not a content delivery network!!!

    For smugmug, how did you serve the 90% of photos that wasn’t in your hot cache?

  • Nick

    It seems that “cloud-based” storage services are receiving lots of attention these days and for very good reasons. Since web businesses depend on them, any amount of downtime is unacceptable.

    So the fact that Amazon S3 went down is definitely cause for concern. Today’s event clearly demonstrates that Amazon has a “single point of failure” and that their users need to take precautions.

    The Amazon situation should not reflect negatively on all storage services. There are companies (Nirvanix being one) that have put together an architecture of clustered nodes around the world with no single point of failure avoiding the potential for downtime for their customers.

    -nick

  • http://www.shutterpond.com JL

    We weren’t down very long at Shutterpond Photo Contests, and all of our photos are hosted with s3.. It was a small hit.. Nothing to fuss about.

  • http://127.0.0.1 Mike

    If you rely on a single point of anything for mission critical data you’ve got bigger problems.

  • http://smoothspan.wordpress.com/ Bob Warfield

    By and large cloud computing is more reliable. But, it’s hard to see that because reporting on cloud computing is going to be much better because so many more people use it.

    The other important issue is that the rate of adoption for things these days goes far beyond what business planners are used to dealing with. Put simply, we’ve eliminated most of the friction from the markets using the web. That’s only going to make it that much easier to underestimate demand and get into trouble.

    We’ve gotten so good at reducing adoption friction, that we’ll see a lot of this kind of thing. It just isn’t possible to plan for it.

    More on my blog:

    http://smoothspan.wordpress.com/2008/02/15/google-reports-iphone-usage-50x-other-handsets-amazon-s3-goes-down-low-friction-has-a-cost/

    Best,

    BW

  • Chuck

    Would explain the problems with Woot! earlier.

  • Nick

    @ #8 Nirvanix developer boards are always open, i haven’t encountered any bugs, and obviously there was a Bigger Bug today ;-)

  • George

    One good thing about more and more sites using AWS is that when one goes down, they all go down. If lots and lots of popular sites have an outage at the same time, then from a user’s perspective it shifts the blame from “their site is down”, to “the internet is down”.

  • browse

    SLAs are a joke anyway. They generally only pay you back for the prorated time of outage.

    If my customers can’t get to their data, that costs them a lot of time-money, and makes me look bad. Getting my $5 back for the storage is inconsequential.

  • http://www.elephantdrive.com Ben Widhelm

    I’m CTO of ElephantDrive, one of the first consumer facing applications to use S3. We have been utilizing the platform for nearly 2 years, and have been measuring failures, performance, etc. with high granularity.

    Our system is built to automatically detect outages, and in those events failover to our internally managed storage.

    ElephantDrive users were not affected by this outage, at all. Further, we’re transferring data to S3 nearly continuously, and did not see a break in the hours mentioned.

    Does anyone know if the problem was localized somewhere?
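A rough sketch of the detect-and-fail-over pattern described in the comment above. This is not ElephantDrive's actual code; the class name, the `StorageUnavailable` exception, and the threshold and cooldown parameters are all hypothetical, and the two backends are plain callables standing in for S3 and locally managed storage.

```python
import time
from typing import Callable, Optional


class StorageUnavailable(Exception):
    """Raised by a backend callable when it cannot serve a request."""


class FailoverStore:
    """Read from a primary backend; fall back to a secondary when it fails.

    After `max_failures` consecutive primary failures, the primary is
    skipped entirely for `cooldown_seconds`, then retried.
    """

    def __init__(
        self,
        primary: Callable[[str], bytes],
        fallback: Callable[[str], bytes],
        max_failures: int = 3,
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.skip_until: Optional[float] = None

    def get(self, key: str) -> bytes:
        now = self.clock()
        # During the cooldown window, do not touch the primary at all.
        if self.skip_until is not None and now < self.skip_until:
            return self.fallback(key)
        try:
            data = self.primary(key)
        except StorageUnavailable:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Treat the primary as down and start the cooldown.
                self.skip_until = now + self.cooldown_seconds
                self.failures = 0
            return self.fallback(key)
        # A successful read clears any outage state.
        self.failures = 0
        self.skip_until = None
        return data
```

The cooldown keeps requests from piling up behind a backend that is known to be down; how a real system keeps the fallback copy in sync is a separate problem that this sketch does not address.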

  • http://www.appistry.com/blogs/bob Bob Lozano

    Counting on a cloud provider to meet YOUR SLA is bad business – only works if failure is OK for you. More here: http://www.appistry.com/blogs/bob/amazon-s3-still-limping-limits-clouds

  • Paul J

    Losing thousands of dollars in a 5 min period would mean you’re losing at least $1K x 288 (periods of 5 mins in a day) = $288K minimum per day. Over the course of a year that is $105,120,000 Gross Revenue. That’s a base if you only lost $1,000 not “thousands”.

    I can’t speak for the general consensus but I don’t consider your company a start up if you’re Grossing $100mil a year. Maybe a growth stage company but that would seem beyond “bootstrapping”….
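The arithmetic in the comment above can be checked with a few lines. The function name is made up for illustration, and the calculation assumes, as the comment does, that the same loss rate holds around the clock for a full 365-day year.

```python
MINUTES_PER_DAY = 24 * 60


def annualized_loss(loss_per_window: float, window_minutes: float,
                    days_per_year: int = 365) -> float:
    """Scale a loss per time window up to a full year at a constant rate."""
    windows_per_day = MINUTES_PER_DAY / window_minutes
    return loss_per_window * windows_per_day * days_per_year


# Number of 5-minute windows in a day, then the yearly figure for $1,000 per window.
print(MINUTES_PER_DAY // 5)
print(annualized_loss(1_000, 5))
```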

  • cease

    the downtime.. paying for the scalability is of course an analysis to occur.. but for most startups.. the benefits of these services, greatly pay for them selves, than the minimal downtime. Now if this starts to happen more and more thats another question

  • http://www.buzzpal.com chrisco

    It’s good to have the thing have an outage… everybody and everything is still “green” until he/she/it suffers an adversity… that is a true test. I wouldn’t want to hire anyone or use any service that has not had a run in with some sort of fire-test… and survived and grown stronger. I tip my hat to Amazon and will be watching to see how they grow from the experience. Cheers.

  • Alex

    I wrote a similar post on GigaOm yesterday regarding cloud computing. In short here it is:

    1) SMB and Startups have NO choice. Well, that is unless they want to raise another one or two million dollars in venture capital for infrastructure and tech ops people.

    2) SLA’s don’t add up to dog crap. OK, you’re down, now what? Sue Amazon, C’mon. …just for ways to build in redundancy.

    3) Do you really believe you can run a data center better than Amazon? Think AGAIN! Don’t be silly!

    4) Just make sure you have a good PR Firm on retainer to draft a beautiful and sympathizing (“I feel your pain”) press release.

    Yes, it’s easy to point the finger when shit like this happens? But, when Microsoft Exchange Server acts up what is the realistic alternative? Today, none. Tomorrow enterprise email from Google or Yahoo.

  • Alex

    @42

    Bingo! You’re spot on!

    I’ve been through that dog and pony show a few times. It’s very tough.

    It builds character… and makes you ……”sip on gin and juice” (as Dr. Dre or Snoop Doggy Dog once said)

  • http://www.snapseeker.com Ericson Smith

    SnapSeeker.com was totally devoid of photos. It was a bummer for us this morning. And there’s nothing that could be done. (sigh)

    At the same time, where are you gonna find a service like S3 with the ability to scale and serve content with the reputation of Amazon? I’ll stick with them unless they get flakier in the future.

    http://www.snapseeker.com

  • anon

    @29 I wouldn’t rule it out completely.

    The trick I use is the following:

    - Store static files on my server at some sub-domain; e.g. static.mydomain.com

    - Store a copy of these files in S3 in bucket static.mydomain.com

    - Create a CNAME record in my DNS so static.mydomain.com points to static.mydomain.com.s3.amazonaws.com

    Now, if AWS dies, I can quickly edit my DNS so static.mydomain.com points to my server. It may take some time for the DNS to propagate, but usually it will happen very quickly. This may not be the best solution, but it allows one to recover if AWS happens to die. Anyone have better solutions?
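One way the manual step in the trick above might be automated is sketched below. It only decides which hostname the CNAME should point at; actually updating the record depends on the DNS provider and is left out. `OWN_SERVER_TARGET`, `http_probe`, and `choose_cname_target` are hypothetical names, and the S3 hostname is the one from the comment.

```python
import urllib.request
from typing import Callable

# S3_TARGET follows the comment above; OWN_SERVER_TARGET is a hypothetical
# hostname for the self-hosted copy of the static files.
S3_TARGET = "static.mydomain.com.s3.amazonaws.com"
OWN_SERVER_TARGET = "origin.mydomain.com"


def http_probe(url: str, timeout: float = 5.0) -> Callable[[], bool]:
    """Build a probe that reports whether `url` answers with a 2xx status."""
    def probe() -> bool:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    return probe


def choose_cname_target(probe: Callable[[], bool], attempts: int = 3) -> str:
    """Return the S3 hostname if any probe attempt succeeds, else the own server."""
    for _ in range(attempts):
        try:
            if probe():
                return S3_TARGET
        except OSError:
            # Connection errors, timeouts, and HTTP errors all count as a failed attempt.
            continue
    return OWN_SERVER_TARGET
```

A call might look like `choose_cname_target(http_probe("http://" + S3_TARGET + "/health.txt"))`, where `health.txt` is an assumed test object in the bucket. Recovery time is still bounded by the record's TTL, so a short TTL on the CNAME matters as much as the check itself.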

  • http://www.panopta.com Jason

    @40 – This is assuming that we’re just looking at a constant stream of orders, for example an ecommerce site, that doesn’t have any noticeable loss once the service is fully restored.

    For many companies, the impact is less the actual amount of downtime and more the fact that the downtime occurred at all – for example, a high-end hosting company that’s supposed to have fully redundant infrastructure will suffer loss of some current and possibly many future customers when an outage takes their customers offline.

    For startups that are trying to convince their customers to trust them, something like this *could* have quite an impact. Of course, for others it won’t have a lasting effect after service is restored – really depends on the specifics of their business model.

  • http://www.losangelescardonations.com CanCar

    Today’s event clearly demonstrates that Amazon has a “single point of failure” and that their users need to take precautions.

  • http://www.kompoz.com Raf Fiol

    We use S3 for all user-generated audio content uploaded to kompoz.com. But we also keep a local copy of active (“hot cache”, as @15 outlined) tracks. Truly, S3 has been awesome. It’s fast, easy to implement, and inexpensive. I have no plans to jump ship just because of this issue. Of course, having a strategy like @15’s “hot cache” solution softens the sting of downtime, something all companies should incorporate into their plan.

  • http://www.mp3salad.com JAlpino

    I love how people get all upset about a short period of down time….. I doubt that their own systems serve the same uptime as S3…. how lame

  • http://www.foliotek.com Chris Miller

    Dear Amazon, we have been considering using your S3 service, but now have concerns. Let me tell you what would help us trust you enough to use your service in the future:

    - Please come clean and tell us in detail what the heck happened.

    - Please tell us more (technical stuff) about how your system is designed to provide reliable operation to your customers

    - It would be great if you were to share how you will improve your system to avoid this problem in the future.

    You are trying to sell your service to people who care about their customers. We want them to have a problem-free experience using our service. We care enough and know enough that additional information would be key to us choosing your service. Thanks.

  • Paul

    for anyone badly affected by this there is clearly a business-model issue..you are buying what you can afford (99.9% uptime), not what you need (your own servers)…if you can’t afford what you need, do you have a business?

  • Matt

    Yeah, the second paragraph of this article is quite off mark. (“Unacceptable”….. “nobody”….. “needs to be”…. if it wants to become…..” – bad commentary). The article could have given some more facts instead, or at least objectively compare the amount of downtime to other SLA’s.

  • http://kunalu.com kunalu

    A good question for SOA practitioners…now if every party can promise 99.999999% uptime, a system that depends on n parties will have (99.999999%) ^ n uptime only.
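The compounding effect in the comment above is easy to see numerically. This sketch assumes the services fail independently and that the system needs all of them up at once; both function names are made up for illustration.

```python
import math
from typing import Iterable


def serial_availability(per_service: float, n: int) -> float:
    """Availability of a system that needs all n identical services up."""
    return per_service ** n


def serial_availability_mixed(availabilities: Iterable[float]) -> float:
    """Same idea when each dependency has its own availability."""
    return math.prod(availabilities)


# Five dependencies at three nines each already fall below 99.9% combined.
for n in (1, 2, 5, 10):
    print(n, serial_availability(0.999, n))
```

With three-nines dependencies the combined figure drops to roughly 99.5% at five services and 99.0% at ten, which is why each added third-party service tightens the budget for all the others.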

  • Bitnoid

    Anyone out there feeling like they need more control over storage? Need an affordable, high-performance, scalable and reliable storage cluster to run your business? There’s a company out there called Caringo with a software product that runs on low-cost, commodity servers that you can build a robust storage cluster with. You can start with as little as a Terabyte and scale to Petabytes in increments you decide and when you need to. You should check them out if you want to have more control of the infrastructure that supports your business. http://www.caringo.com.

  • http://www.jhatak.com/Buckler/BucklerHomePage.htm New Fast Browser

    Oops :-)

  • Timothy

    Guys, let’s face it. S3 can go down and also it is not a CDN either. It is a good storage solution, but is not fully redundant.

    Smugmug is smart, they put their storage into S3, but also used their CDN to cache the content.

    S3 is not a delivery mechanism and if you are using them for this, you will fail.

  • http://www.czaries.net Czaries

    Guys… come on!! This is their FIRST major outage, and it only affected businesses using their services for a few MINUTES. That is already a better track record than the company who hosts your website probably has. Shit happens. If you know anything at all about computer, you should at least know that. Stuff breaks, stops working properly, etc – sometimes for the stupidest of reasons… but IT HAPPENS.

    If you think ANYONE can promise you 100% uptime, you’re kidding yourself. 99.9% is the best uptime guarantee you’re going to get. It’s unfortunate that so many commenters have been turned off of Amazon S3 just due to this little pebble in the road. Nothing in life is perfect, get over it and move on.

  • Wayne

    For the posters who are saying that an S3 outage implies a single point of failure: this isn’t a correct assumption. 100% uptime is not achievable, but in a mature hosting environment, availability issues usually occur only when multiple independent failures coincide. A distributed service like S3 would never have the track record it does if it weren’t designed to handle incidental component failures.

    What matters is what Amazon provides (three nines storage at a certain cost) and how you can engineer that into a solution that meets your business needs. If those needs include better than 99.9% availability for a given feature, don’t just assume you can meet that better than Amazon can–be realistic about your requirements and design to meet them.

  • http://www.lastpodcast.net/2008/02/15/friday-afternoon-thoughts-2/ Friday Afternoon Thoughts : The Last Podcast

    [...] the outage really wasn’t that big a deal – at least not big enough to justify the paranoia in the TechCrunch comments on the [...]

  • Bobby Delicious

    we just got on as new customers on amazon, and I can tell you the outage was more like 2 or 3 hours today. (or tonight if you’re on US time). We are seriously considering switching and focusing on our second infrastructure option instead. Sure, twitter might only have been down for a short time, but as a low priority start-up, we were down much longer..

  • http://popularo.com/blog/2008/02/15/should-popularo-use-amazon-web-services-hmm/ popularo blog

    Should popularo use Amazon Web Services? Hmm…….

    In our early beta version, we are using some of Amazon’s web services (namely S3 and SimpleDB) – but have been considering using our own storage and database instead.  With today’s outage I’m not sure if that is a great strategy for …

  • http://www.popularo.com Scott from popularo

    In our early beta version, we are using some of Amazon’s web services (namely S3 and SimpleDB) – but have been considering using our own storage and database instead. With today’s outage I’m not sure if AWS is a great strategy for us.

    We don’t have huge amounts of data to store like some companies (smugmug comes to mind), so using AWS was mostly for the peace of mind that we would be able to scale quickly after our beta goes public and all of Digg’s users abandon them for us . We have a meeting tomorrow to take a closer look at our strategy for handling lots of new traffic in a short period of time, and I have to say that it doesn’t seem likely that Amazon will be included in the party.

  • http://www.surfbuddy.com Adam

    I still don’t get it, sure the Amazon services are great. But the prices? Maybe OK if you have them running for a few hours a day for alpha or dev but on every hour for every day for a month? Plus data transfer charges? A rack in a data centre works out cheaper unless you are not planning to be in business for a year.

  • http://www.combyo.com Pinaki Ghosh

    Now I would feel more comfortable going with Amazon since they will fix the issue and put more dedicated resources at it.

  • http://failure-server.com TechCrunch + Math = FAIL

    Erick,

    I know you don’t get paid to do math, and this is a bit of a cheap/personal pot-shot, but have you done the math? Based on the information in their press release, they’ve had 0.99982 uptime over the past two years.

    Imagine that something, anything goes wrong, and you’re a startup, with startup resources. What’re the chances you can fix the problem just as quickly? And even if you can, do *you* want to worry about it, or relegate to Amazon? (Who, do you think, has a lot more on the line?)

    I mean, if we’re going to be picky, there was a power outage in the Bay Area a couple weeks back, for 4 days in the area I live in. (Hint: near Palo Alto)

    Does that mean, that hey, maybe I should run my own generator 24×7? After all, it’s hard to build software without power!

    No, as much as I didn’t like it, I relegated to PG&E and waited (rather impatiently) for them to roll their truck. (Their customer support lines make Comcast look great, if anyone’s curious.)

    So, the point is, yes, cloud/grid/your-flavor-of-the-week-term computing has some improvements to make, but nothing’s perfect, and come crunch time, I’d rather delegate commodity tasks to the specialists, and focus on tasks that have a high value:time ratio. (Don’t forget, that’s what got this country to the position it’s in today. Anyone here still into making t-shirts?)

    Cheers,
    Your neighborhood failure-server*

    *Only legal process folks are welcome to enjoy this pun

  • Timothy

    I think as an origin solution they function well, but there is no way to guarantee 100% availability.

    Adding additional redundancy to your core is a good way to overcome this challenge, make sure you have a plan in case the origin happens to fail and you can recover.

    Amazon had a problem, this might occur again, but having backup plans in place are safe ways to overcome this.

  • facts striaght

    doody

  • http://snagg.net/blog/?p=15 Whoa. at snagg dev blog

    [...] work on a set (shh you didn’t hear that), I check my feeds and find out that Amazon S3 croaked today, it’s a good thing snagg isn’t launched yet – most of the videos and images would [...]

  • Phil Easter

    Dad, my cell phone broke!

    Last summer my 15 year old greeted me – “Dad! My cell phone is broke and I can’t text my friends!” “mmm.. this is serious. What did you use to do before I bought you the phone?” “I wasn’t able to text back then, dah!!”

    Today’s uproar re: the Amazon S3 outage takes me back to that funny moment when my daughter finally got my point – that I enabled her to enjoy the world of texting. And, like many AWS bloggers today, she did not appreciate that I gave her this gift.

    So, to put a big picture perspective on today’s outage – most of us start ups, if not for AWS, would have burned thru our angel and round A funds to replicate AWS before we would have hit the tipping point and had the luxury of telling our customers that “we are experiencing an outage.”

    Looking back on my old “school days” of expensive networks, users running out of storage and the constant flow of cash to admin staff, I must admit to having a soft spot for the AWS team and service. In those days, a two hour outage was considered an opportunity for our users to chat with the cube neighbor or go down to the cafeteria for a donut. Fast forward to today’s demanding customers and an outage of minutes starts Armageddon. Now, imagine if by some miracle, these customers actually pay for the start up’s service.

    Today, I welcomed the outage as it reinforced my need for AWS. How would my small team respond to an outage? We don’t have the talented staff nor the passion the AWS team has. We forget that Amazon is in the small group of visionary “start-ups” who helped get the net to where we are today.

    Phil Easter
    CTO/AirMe

  • http://quux.tumblr.com Bryan

    Those bemoaning what appears to have been a <5 hour partial outage are missing the point, I think.

    Consider your electric utility. What they do is massive, though arguably less complex than what a datacenter does. There is a lot of redundancy in the electrical grid. And yet outages are not unheard of. When they happen, we gnash our teeth, but we wait it out and then we get back to living our lives.

    It’s a bit unnerving that IT, which has a technology base much younger than that on which our electric utility operates, seems often to be held to a standard higher than the one we hold our electric companies to! It is also worth considering that uptime is simply a function of mathematical averages – if my service experiences 2 hours of downtime during the first month, then my uptime stat looks pretty bad. But if it then undergoes no outages for the next 5 years, my uptime stats are starting to look pretty darn good.

    So folks, remember to put this into perspective. S3’s uptime is pretty good so far: something less than 5 hours of partial downtime in a two year span. Can they do better? Probably. But the world didn’t end today – and I would challenge anyone to find and post examples of better uptime from an equivalent service.

  • Louis-Eric

    There’s a basic issue of good software architecture here; the stability of your software is that of your weakest element, and the elements over which you have no control are the weakest of all. Those need to be bolstered first.

    How many of these companies, for instance, had tools in place to buffer outgoing data before it was sent to S3, with fall-back mechanisms in place if data couldn’t be sent ?

    @8: Which issues did you face ?

  • http://technicle.com Technicle

    Don’t be foolish.. everything has a single point of failure.

  • http://betterexplained.com Kalid

    To get some perspective, 3 nines (99.9%) is 45 mins/month of downtime. 5 nines (99.999%) is only 30 seconds per month.

    Amazon can compensate for 2 hours downtime with 3 months of perfect service and get 99.9%. To get 99.999% they’d need perfect service for over 20 years (http://tinyurl.com/279xkl).
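The figures in the comment above can be reproduced with a short calculation. This is a sketch using an average month of about 730.5 hours, so the results differ slightly from the rounded numbers quoted; the function names are made up for illustration.

```python
HOURS_PER_YEAR = 365.25 * 24
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # average month, about 730.5 hours


def downtime_minutes_per_month(availability: float) -> float:
    """Downtime allowed per average month at a given availability target."""
    return (1.0 - availability) * HOURS_PER_MONTH * 60


def hours_to_absorb(outage_hours: float, availability: float) -> float:
    """Total window (in hours) over which one outage still meets the target."""
    return outage_hours / (1.0 - availability)


for target in (0.999, 0.9999, 0.99999):
    budget = downtime_minutes_per_month(target)
    window_years = hours_to_absorb(2.0, target) / HOURS_PER_YEAR
    print(f"{target:.5f}: {budget:.2f} min/month allowed, "
          f"a 2-hour outage needs a {window_years:.2f}-year window")
```

Under these assumptions, three nines allows about 43.8 minutes a month and five nines about 26 seconds. A single two-hour outage fits a three-nines target over roughly 83 days and a five-nines target over roughly 22.8 years, in line with the comment.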

  • http://bizcast.typepad.com Alan Wilensky

    We are only one major outage away from certain marquee clients swearing off sole reliance on SAAS. This happened to a mid-sized automotive auction, a client, that had with my help knit together a network of dealers, contractors, and agents, into a system with a zero install, zero hosting footprint.

    UNTIL:

    There were four accounts that were mashed up…the usual suspects, and one of them went dark. We did some pinging (here is a good business idea for a bright Web20 person, third party app monitoring and governance) and isolated the guilty party.

    In spite of being punked, fingered, whatever, the slackers who ran the service were very rude and unforthcoming. That’s another problem: who are you going to deal with when these hosted services go down? I’m not so sure if it was SalesForce that crapped out, that it would have been better.

    Long and short of it: we have a business community that is used to local control, we consultants want to deliver apps as a service – we will need to ally ourselves with the providers of these services to come up with a game plan…but try and get one of the stars to cough up a retainer!

    Most of the startup SaaS guys laugh when I propose a contract to consult on packaging and policies for reliability for the SMB end users.

    But this is exactly what they should want, guys like me who beat the bushes for them.

  • Ryan McKenzie

    It looks like the “critics” from your previous post on Amazon’s web services were right and you were just saying they were touchy DBAs.

  • http://savvybytes.com/2008/02/16/hyderabad-barcamp-5-live-part-3/ Hyderabad Barcamp 5 – Live ! Part 3 «

    [...] funny thing…the speaker talks about Amazon S3 to save time on performance while S3 is sleeping in woods at the same [...]

  • http://www.tinou.com tinou

    remember when the outage (because of a drunk employee) at 365 main brought down the Internet. Shit happens. Maybe I’m just naive, but if your website is useful/interesting, customers will come back.

  • http://www.buzzpal.com chrisco

    @44 Speaking of Gin ‘n Juice, here’s a bluegrass-type version of the song on YouTube. Interesting visuals, too: http://www.youtube.com/watch?v=wCAM3C3dpIA

  • http://www.i-guide.ro/blog/Ro/en/ nistor

    http://www.i-guide.ro/blog/Ro/en/
    help me popularize mine

  • http://www.micfo.com micfo.com

    Possibly they need to boost up the authentication service competence at the most.

  • http://www.developeronline.blogspot.com panefsky

    Cloud computing has a long way to go.
    Startups should always have a back-up plan

  • http://amazon.com Ajay Bhutani

    amazon sucks big time

  • Peter Antypas

    Amazon is great for prototyping and “testing the waters”. Economically, it doesn’t make sense for large scale production, especially if your company does something more challenging than serving HTML content.

  • http://www.3tera.com Barry X Lynn

    Cloud computing DOES have a way to go. But that refers to uptake, not ability. We at 3tera are saddened by this black eye. Amazon is truly the pioneer of cloud computing and can continue to be a giant, serving developers of non-mission critical applications cheaply.

    But you can get cloud and utility computing with three or four nines out of the box (and many more with some simple configuration additions), and have it on completely non-proprietary infrastructure, supporting multi-tier applications, including complex relational databases, by using a hosting provider that offers 3tera’s AppLogic. Check out our web site. –BXL–

  • Joel

    No offense guys, but plan for failure of critical infrastructure elements. I don’t understand the big deal. Hello? Is any system perfect?

  • http://www.newsmavens.com Brent

    So as big as this news was, and with all the startups affected, did anyone outside the Bay even notice?

  • http://eedious.blogspot.com Alain Yap

    Will get back to sleep … It’ll probably be another 6 months to a year before another outage. Until then, we’ll see whether AWS already has a plan or whether another big-name player steps up to challenge them.

    friarminor
    http://morphexchange.com

  • Bobby delicious

    Amazon seems to be down again.. I guess I will have to look for a new solution

  • Rick
  • Blagovest

    There is a very important question you have to ask yourself before deciding whether to use S3: what are you really looking for – remote storage, content delivery, or both. These are crucial to distinguish.

    What I observe is that most people treat Amazon S3 as a content delivery service. While this is not inherently wrong, one has to notice that S3 was especially designed to be a STORAGE service.

    The point is, since terabyte hard drives are affordable nowadays and internet traffic grows steadily, the stress goes on content delivery rather than on storage. If you are not concerned about storage, there are much better services especially suited for content delivery.

    SteadyOffload.com provides an innovative, subtle and convenient way to offload static content. The whole mechanism there is quite different from Amazon S3. Instead of permanently uploading your files to a third-party host, their cachebot crawls your site and mirrors the content in a temporary cache on their servers. Content remains stored on your server while it is being delivered from the SteadyOffload cache. The URL of the cached object on their server is dynamically generated at page loading time, very scrambled and is changing often, so you don’t have to worry about hotlinking. This means that there is an almost non-existent chance that the cached content gets exposed outside of your web application.

    It’s definitely worth trying because it’s not a storage service like S3 but exactly a service for offloading static content.

    Watch that:

    http://video.google.com/videoplay?docid=-8193919167634099306 (the video shows integration with WordPress, but it is integrable with any other webpage)

    http://www.steadyoffload.com/

    http://codex.wordpress.org/WordPress_Optimization/Offloading

    Cost of bandwidth comes under $0.2 per GB – affordable, efficient and convenient. Looks like a startup but lures me very much. Definitely simpler and safer than Amazon S3.

  • http://cjaninehodge.com/blog/2008/02/18/amazon-s3-outage-a-gentle-reminder/ C. Janine Hodge » Blog Archive » Amazon S3 Outage: A Gentle Reminder

    [...] Amazon Web Services Goes Down, Takes Many Startup Sites With It [...]

  • http://www.stacksafe.com/blog/index.php/2008/03/04/seven-key-lessons-to-keep-in-mind-when-communicating-an-it-failure/ IT’s About Uptime – The StackSafe Blog » Seven Key Lessons to Keep in Mind When Communicating an IT Failure

    [...] End-to-End IT Services Fail. Critical Applications Fail. DataCenters that we rely upon lose power due to a traffic accident. A system upgrade takes out an airport baggage handling system. A back-end infrastructure provider that many rely upon for outsourced infrastructure experiences a major outage…. [...]

  • http://www.techcrunch.com/2008/04/07/amazon-web-services-gets-another-hiccup/ Amazon Web Services Gets Another Hiccup

    [...] Cloud (EC2) went down for about an hour for at least some customers in the U.S. This follows a major outage of its S3 storage service in February. Companies big and small use EC2 as a virtual data center to [...]

  • http://www.talkibie.com/technology/for-rent-amazons-cloud/ Talkibie » Archive » For rent: Amazon’s cloud

    [...] Web Services has experienced a few outages, most notably this February. As TechCrunch notes, other startups (including Twitter) who rely on S3 or EC2 experienced problems as a result. [...]

  • http://flyrig.wordpress.com/2008/02/16/amazon-web-services-outage/ Saved by the Backup «

    [...] by the Backup Posted on Feb. 16, 2008 by flyrig Yesterday’s outage at Amazon affected some of our images, including the photos for the apartments. Luckily, we store [...]

  • http://www.techcrunch.com/2008/06/06/amazon-down-not-answering-calls/ Amazon Down For An Hour And Counting

    [...] main PR number hasn’t yet been returned. At least their web services appear to be humming. An AWS outage in February took an untold number of startups with it. CrunchBase Information Amazon Information provided [...]

  • http://tipstech.info/2008/06/08/amazon-down-for-an-hour-and-counting-updated/ Amazon Down For An Hour And Counting (Updated) | Tipstech.info

    [...] main PR number hasn’t yet been returned. At least their web services appear to be humming. An AWS outage in February took an untold number of startups with [...]

  • http://elad.blogli.co.il/archives/708 Elad on Blogli » Blog Archive » Amazon S3 Is Down and the Joy Is Great

    [...] is unavailable, and along with it many other services are going down, since many companies use the storage services of [...]

  • http://torley.com/amazon-s3-service-failure-is-the-harshest-ive-seen-yet Amazon S3 service failure is the harshest I’ve seen yet | Torley Lives

    [...] very reliable and the last time I’ve encountered such a problem (in memory) was on Feb. 15, earlier this year. But today’s outage is worse, going on for several hours now, and it’s also added to [...]

  • http://people.knowledgetree.com/daniel/2008/02/16/amazon-simple-storage-service-outage-some-learnings.html KnowledgeTree Blog » Blog Archive » Amazon Simple Storage Service Outage – Some Learnings

    [...] Amazon’s Simple Storage Service (S3) experienced an outage earlier today, which affected KnowledgeTreeLive and its users. The outage was quite widely reported.  [...]

  • http://www.wildbit.com/blog/2008/11/03/dynamic-rewriting-image-urls-on-aspnet-websites/ Using Amazon S3 to improve performance (using asp.net) | Wildbit

    [...] was. Also imagine that Amazon could go down. I understand, it’s almost impossible but it has happened already. In this way, every time when we place a link to the image or media, we should know which link to [...]

  • http://www.azurejournal.com Alin

    Erick, Amazon and all … cloud computing needs to be 100 percent reliable. This is the definition of Cloud Computing. Period. Isn’t that the promise of “don’t worry about scalability, performance, we’re handling it for you”? I think we’re getting there though with all the competition … we’ll see …

  • http://veryweblog.com/2009/01/comparing-google-app-engine-amazon-simpledb-and-microsoft-sql-server-data-services/ Comparing Google App Engine, Amazon SimpleDB and Microsoft SQL Server Data Services | veryweblog focus on the internet ,new media.

    [...] Service-Level Agreements: Not specified to date. Amazon S3 has a 99.9% service level guarantee, with payments of 10% of amount due for billing cycle in which the service level was below 99.9% and above 99% and 25% for a service level below 99%. AWS suffered a major-scale outage in February 2008. [...]
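
The credit tiers quoted in this excerpt can be turned into a small worked example. The sketch below is only an illustration: it assumes the tiers are 10% for uptime from 99% up to (but not including) 99.9% and 25% below 99%, and it measures uptime as plain wall-clock availability over a billing cycle, which may not match how the actual SLA defines it. The function names and the 29-day, two-hour scenario are hypothetical.

```python
def uptime_percent(downtime_hours: float, hours_in_cycle: float) -> float:
    """Uptime over a billing cycle, as a percentage of wall-clock time."""
    return 100.0 * (1.0 - downtime_hours / hours_in_cycle)


def service_credit_percent(uptime: float) -> int:
    """Credit owed as a percentage of the bill, using the assumed tiers."""
    if uptime >= 99.9:
        return 0   # guarantee met, no credit
    if uptime >= 99.0:
        return 10  # below 99.9% but at least 99%
    return 25      # below 99%


# Hypothetical case: two hours of downtime in a 29-day cycle (696 hours).
uptime = uptime_percent(downtime_hours=2, hours_in_cycle=29 * 24)
credit = service_credit_percent(uptime)
print(f"uptime {uptime:.2f}% -> credit {credit}%")
# prints "uptime 99.71% -> credit 10%"
```

Under these assumptions, a two-hour outage falls in the 10% tier; it would take roughly seven hours of downtime in the same cycle to drop below 99%.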

  • http://seantario.com/2009/01/server-hosting-options-a-rough-guide/ Server Hosting Options – A Rough Guide : Sean Patrick Tario

    [...] scale this to accommodate millions of users, and a BOAT LOAD of marketing dollars behind it. AWS went down just last week in fact, AGAIN, and admits they don’t have everything perfected yet. (Personally I don’t think [...]

  • http://www.saasmania.com/2009/02/04/el-efecto-magnolia/ The Magnolia Effect | Saasmania | Software as a Service – SaaS

    [...] yes, because the big player can slip up too; remember that Amazon has slipped up a couple of times, in February 2008 and in July 2008, and Salesforce slipped up on January 6 of this year, and I don’t believe either that [...]

  • http://www.coolestan.com/2009/02/15/in-the-clouds/ In the Clouds… | Coolestan

    [...] Amazon S3 is not new, been around even before I started developing the last iteration of our site (RJ3 as I call it…). However, I never really saw a good fit to use it to serve content. Yes, it offloads some web traffic and you don’t have to worry about losing your data as much. But its latency is high and quite honestly our own hosting solution gets extremely much better traffic results to end users than S3 ever did. Not to mention hesitation that S3 servers will go down. [...]

  • http://news.spotz.com/blogs/front_page_news/archive/2009/02/24/tech-world-grinds-to-a-halt-as-gmail-fails.aspx Tech world grinds to a halt as Gmail fails – Front Page News – NewsSpotz

    [...] the more important it is that we have reliable backups in place. A similar crisis occurred when Amazon Web Services went down almost exactly a year ago; thousands of web-based businesses rely on Amazon for their storage [...]

  • http://news.ippimail.com/2009/02/24/tech-world-grinds-to-a-halt-as-gmail-fails/ ippimail.com » Blog Archive » Tech world grinds to a halt as Gmail fails

    [...] similar crisis occurred when Amazon Web Services went down almost exactly a year ago; thousands of web-based businesses rely on Amazon for their storage [...]

  • http://www.cloudiquity.com/2009/03/what-happens-when-the-cloud-goes-wrong/ What happens when the Cloud goes wrong ? | Cloudiquity

    [...] different levels of  ’going wrong’. We have often publicised outages from the likes of Amazon and Google, but given the publicised SLA’s of each some down time is expected. However [...]

  • http://blog.wsg.net/2009/03/25/what-technology-do-you-use-every-day-part-1-of-3-mobile-applications/ What technology do you use every day? – Part 1 of 3 – Mobile Applications – Blog.WSG.net

    [...] I don’t remember experiencing an issue with Twitter’s avatars or background during Amazon’s hosting outages last month. Twitterberry also has a known issue with Twit Pic integration. Known issue being a euphemism [...]

  • http://www.commeurope.com/2009/05/17/crolla-google-crolla-la-rete/ .commEurope » Google Crashes, the Net Crashes

    [...] year ago something similar had happened to Amazon, and this had highlighted how critical the use of cloud computing services is in the field of [...]

  • http://code.google.com/p/cloudstorageapi Cloud Storage API Library

    I’m one of the co-founders of CloudCamp. We’ve discussed this situation in our breakout sessions many times. One free open source project (that I helped create) that might help startups in this situation is the CloudStorageAPI Library. It’s a PHP library that allows your website to upload files to both Amazon and Nirvanix. You can add support for other clouds too. It’s up and running on this website: http://www.woomeover.com, but there is still work to be done (such as better docs). Check it out: http://code.google.com/p/cloudstorageapi – Dave Nielsen
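
The idea in this comment, writing each file to more than one storage provider so that one provider's outage does not take the content offline, can be sketched roughly as follows. This is not the CloudStorageAPI library's interface (which is PHP); it is a minimal Python illustration in which `InMemoryBackend`, `RedundantStore`, and `StorageError` are hypothetical names, and the in-memory backends stand in for real providers.

```python
from typing import Dict, List


class StorageError(Exception):
    """Raised when a backend cannot serve a request."""


class InMemoryBackend:
    """Hypothetical stand-in for one cloud storage provider."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True  # flip to False to simulate an outage
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise StorageError(f"{self.name} is unreachable")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise StorageError(f"{self.name} is unreachable")
        if key not in self._objects:
            raise StorageError(f"{key} not found on {self.name}")
        return self._objects[key]


class RedundantStore:
    """Writes to every backend and reads from the first one that answers."""

    def __init__(self, backends: List[InMemoryBackend]) -> None:
        self.backends = backends

    def put(self, key: str, data: bytes) -> List[str]:
        stored = []
        for backend in self.backends:
            try:
                backend.put(key, data)
                stored.append(backend.name)
            except StorageError:
                continue  # skip the failed provider, keep trying the rest
        if not stored:
            raise StorageError("no backend accepted the write")
        return stored

    def get(self, key: str) -> bytes:
        for backend in self.backends:
            try:
                return backend.get(key)
            except StorageError:
                continue  # fall through to the next provider
        raise StorageError(f"{key} is unavailable on every backend")


primary = InMemoryBackend("primary")
secondary = InMemoryBackend("secondary")
store = RedundantStore([primary, secondary])

stored_on = store.put("photo.jpg", b"image bytes")
primary.available = False  # simulate the first provider going down
recovered = store.get("photo.jpg")
```

A real deployment would also need to reconcile backends that missed a write during an outage; this sketch leaves that out.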

  • http://vinsol.com/blog/2009/02/24/242-10-things-i-learnt-from-gmail-outage/ 24/2 – 14 things I learnt from gmail Outage | Vinsol

    [...] It’s the first anniversary of the Amazon Web Services outage, on the 15th of February last year. February is not a good month for big [...]

  • http://mubbisherahmed.wordpress.com/2009/08/02/what-is-cloud-computing-its-proscons-and-making-it-work/ What is Cloud Computing, its Pros/Cons and making it work « How IT Works

    [...] Cloud Computing 101: Why Use The Cloud? Amazon Web Services Goes Down, Takes Many Startup Sites With It  Possibly related posts: (automatically generated)Cloud ComputingWeek 7 – E-ProcurementCloud [...]

  • http://vehera.jsn-server7.com/LiddleBlog/?p=560 Liddle Thoughts » Blog Archive » What Happens when the Cloud goes wrong ?

    [...] are different levels of  ’going wrong’. We have often publicised outages from the likes of Amazon and Google, but given the publicised SLA’s of each some down time is expected. However things [...]

  • http://www.spoutingshite.com/2009/11/03/downtime-is-unacceptable/ Spouting Shite » Blog Archive » Downtime is unacceptable

    [...] read stories almost daily about Gmail outages, AWS glitches and even Rackspace downtime. You would be forgiven for believing that these are new issues for the [...]

  • http://revivehumanity.com/?p=47 Amazon Takes A Holiday Vacation, Takes Customers With It « revivehumanity.com

    [...] to host files in the cloud including images and other key content. And it isn’t the first time this has happened (though its competition isn’t much [...]

  • http://www.techgearx.com/amazon-takes-a-holiday-vacation-takes-customers-with-it-update/ Amazon Takes A Holiday Vacation, Takes Customers With It (Update) |

    [...] to host files in the cloud including images and other key content. And it isn’t the first time this has happened (though its competition isn’t much [...]

  • http://www.techcrunch.com/2009/12/23/amazon-down/ Amazon Takes A Holiday Vacation, Takes Customers With It (Update)

    [...] to host files in the cloud including images and other key content. And it isn’t the first time this has happened (though its competition isn’t much [...]

  • Seth

    Holy crap! Part of Twitter isn’t working? 2012 people…it’s coming! lol

  • http://www.tim-wood.net/research/2010/05/disasters-disaster-recovery-in-the-cloud/ TW – Virtualization, Research, Grad School » Disasters & Disaster Recovery in the Cloud

    [...] power outage that brought down servers for about seven hours.  Amazon has experienced a number of outages over the last few years–not surprising given the size of their operations.  However, this [...]
