Here’s a situation that’ll feel familiar: you just fixed a huge problem in your tech stack. Maybe you closed the black hole that your leads were falling into, or you fixed the segmentation rule that accidentally fired off an email to your entire opt-out list. Your adrenaline is up, and you need to communicate what just happened to your leadership.
In situations like these, ops staff in Marketing and Sales can take inspiration from engineering teams. For engineers, incident response and post-incident communication are nothing new: over the past decade, they have adapted and evolved the post-mortem as a way to capture and communicate the details of incidents that occur in production and, more importantly, to learn from them. Google wrote a whole book on it, but the concepts are simple and apply just as well to marketing and sales operations.
First Lay the Groundwork
An important part of integrating post-mortems into your workflow is having the right culture for them in the first place. Without a culture of continuous learning and a shared belief that everyone on your team is well-intentioned and acting in good faith, your post-mortems can easily slip into finger-pointing and blame exercises.
Sure, many problems you encounter in your tech stack can ultimately be traced to an action taken by a person, but if you focus too heavily on the person, you’re almost certainly missing an opportunity to learn why it was possible for them to make that mistake in the first place. Even worse, a culture of blaming and shaming can discourage people from reporting incidents at all.
When an AWS outage in 2017 took down a huge swath of the internet, Amazon didn’t fire the engineer who mistyped a single command; it fixed the system that allowed a single typo to cause such a major outage.
Then Implement the Process
With the right culture in place, it takes only a little process to start learning. A meeting with a simple agenda will do: shortly after the issue is resolved, get everyone who helped respond together to discuss and document the following:
- What was the impact (the leads’ or customers’ qualitative experience during the issue)?
- What was the scale (how many leads or customers were impacted, for how long)?
- What were the events that led to the issue (changes in your tech stack, changes to how data flows through it, etc.)?
- What actions did you take to fix it?
- What actions will you take to prevent it in the future (changes to process and access, and investment in any tools that would help)?
This simple post-mortem meeting gets you exactly the details you need to communicate to your leadership and the rest of the team; all that’s left is to package it up. You can download a template here.