Member-only story
When everything blows up
Production issues will happen. How can you find some order in the chaos?
It’s August. You’re having a decent week. Looking over the top of your monitor, you can see that it’s pretty quiet in the office. Many of the engineers are on their summer vacation since their kids are off school, and they’ve picked a perfect moment to do so: the afternoon sky is a deep azure blue and is barely tainted by a single wisp of cirrus cloud.
Still, you don’t mind being in the office while your colleagues are away. In fact, you don’t mind at all: you get a rare fortnight where you have very little interruption. You’re finding your flow state, you’re getting lots of programming done, and the icing on the proverbial cake is that a warm current of summer breeze is passing by your neck. Bliss.
You see a notification going off on Slack. You switch windows.
It’s the production environment. Uh oh. You minimize your IDE and switch to the Web browser, heading over to the monitoring dashboard. There’s one red dot, which you click. It’s the black box check for the API, which just timed out. That’s weird. You open up a new tab and go to the login page for your application, but it doesn’t load.
Oh dear.