
Resilience Testing at Amazon, Etsy, and Google

CowboyRobot (671517) writes | more than 2 years ago


CowboyRobot writes "Kripa Krishnan and Tom Limoncelli at Google take a detailed look at Google's GameDay resilience exercise, which they call DiRT (Disaster Recovery Testing). In related pieces, Etsy's John Allspaw makes the case for resilience testing, and the three continue in a roundtable discussion with Amazon's Jesse Robbins on lessons learned from these kinds of exercises.

Among other insights and anecdotes: 'We simulated a long-term power outage at a data center. This test challenged the facility to run on backup generator power for an extended period, which in turn required the purchase of considerable amounts of diesel fuel without access to the usual chain of approvers at HQ. We expected someone in the facility to invoke our documented emergency spending process, but since they didn't know where that was, the test takers creatively found an employee who offered to put the entire six-digit charge on his personal credit card. Copious documentation on how something should work doesn't mean anyone will use it, or that it will work if they do. The only way to make sure is through testing.'"

