It doesn't matter how good your team is. It doesn't matter how good your documentation is. It doesn't matter how well you've planned your deployment or how much you've spent on your hardware. No amount of preparation can change one simple fact.
There will be disaster.
Your data will be corrupted. Your infrastructure will fail you. Someone else's infrastructure will fail you. Your database will die. Your code will crash. Choose at least one. More likely, it will be all of them, at one time or another, and probably some other things as well. Of course, all that preparation can help you avoid some disasters and recover faster from the ones that do happen, but you won't avoid them completely.
What matters is how you handle things when the disaster happens. And really, there are only two things that are important.
Tell people. Stay calm.
There are people other than you being affected by your disaster, and you should let them know that something isn't quite right, and you should do it as soon as possible. It's incredibly frustrating to have shit going wrong and have no idea why, so those folks are much less likely to start hating you if you talk to them. Not only will they be less inclined to hate you, they'll also be much less likely to try to contact you, which gives you more time to focus on finding out what is actually wrong and fixing it.
I'll admit I've been guilty of thinking, "I'll just work out what's happening, then I can tell X," but it is a Bad Idea. Tell them something is wrong immediately, tell them you've worked out what it is and you're making a plan to fix it, tell them you're working on it, tell them you're testing the fix, tell them it's fixed. Tell them at least enough to keep them happy and keep them off your back.
Keeping your calm when the shit hits the fan lets you focus on working out what's going wrong and making a plan to fix it. Use the scientific method: gather data, develop a theory, test your theory, evaluate. Repeat. That's really hard to do when you're the opposite of calm. Jumping from incomplete data to a half-arsed theory isn't going to get the problem solved any quicker. Worse, if the theory looks promising and you push a fix out too fast without proper evaluation, you risk turning a disaster into … a double-decker disaster. Or something.
Staying calm also shows people that, despite the disaster currently playing out, you have things under control and everything will be back on track shortly. Saying, "Fuck. Fuck fuck fuck fuck fuck fuck fuck," while you're on the phone with the client is not the right way to project the sense that you have things under control.[1] In fact, if you can't stop yourself from an outburst, I'd recommend not mixing it with the "tell people" step. Don't raise your voice at your teammates; blame is not going to help anyone, especially in the middle of things.
Of course, you may need to smash some stuff or get really drunk afterwards, but right now, you have to keep your head.
There will be disaster. Stay calm. Tell people.
[1] Guilty, I'm sad to say.