How to Write a Useful Incident Update
Your service is down, and your customers know it. What they want right now is information: what happened, what you are doing about it, and when it will be fixed. Many teams handle this badly, not because they do not care, but because writing clearly under pressure is hard.
The Four-Stage Lifecycle
Every incident should move through four stages. Each stage has a different communication goal.
1. Investigating
You know something is wrong, but you do not yet know why.
We are aware of issues affecting [service name] and are currently investigating. We will provide an update within 30 minutes.
Goal: acknowledge the problem immediately. Customers need to know you are aware. Do not wait until you have a root cause — that can take hours.
2. Identified
You know what is causing the issue.
The issue has been identified as [brief, non-technical description]. Our team is working on a fix. Next update in 30 minutes.
Goal: reduce uncertainty. Once customers know the cause, they can estimate the impact on their own workflows.
3. Monitoring
A fix has been deployed, but you are watching to confirm it holds.
A fix has been deployed for [issue]. We are monitoring the situation to confirm stability. We will resolve this incident if no further issues arise in the next 60 minutes.
Goal: signal progress without declaring victory too early. Premature "resolved" updates that get reopened erode trust.
4. Resolved
The issue is confirmed fixed.
This incident has been resolved. [Service] is operating normally. Total downtime: approximately [duration]. We will publish a post-mortem within [timeframe].
Goal: close the loop. Include duration and a post-mortem commitment if the incident was significant.
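If your team drives updates from a script or chat bot, the four stages map naturally onto a small state machine. The sketch below is illustrative rather than any particular tool's model. It stores the templates above as format strings and adds one guard: an incident may only move from Monitoring to Resolved after a clean 60-minute window. The names (`Stage`, `render_update`, `can_transition`) are hypothetical. The Monitoring to Investigating transition, used when a fix does not hold, is an assumption and not part of the lifecycle described above.

```python
from datetime import datetime, timedelta
from enum import Enum


class Stage(Enum):
    INVESTIGATING = "Investigating"
    IDENTIFIED = "Identified"
    MONITORING = "Monitoring"
    RESOLVED = "Resolved"


# Templates from the four stages above, as str.format strings.
TEMPLATES = {
    Stage.INVESTIGATING: (
        "We are aware of issues affecting {service} and are currently "
        "investigating. We will provide an update within {next_update} minutes."
    ),
    Stage.IDENTIFIED: (
        "The issue has been identified as {cause}. Our team is working on a fix. "
        "Next update in {next_update} minutes."
    ),
    Stage.MONITORING: (
        "A fix has been deployed for {issue}. We are monitoring the situation "
        "to confirm stability. We will resolve this incident if no further "
        "issues arise in the next {window} minutes."
    ),
    Stage.RESOLVED: (
        "This incident has been resolved. {service} is operating normally. "
        "Total downtime: approximately {duration}. We will publish a "
        "post-mortem within {timeframe}."
    ),
}

# Assumed transitions: stages move forward in order, and a fix that fails
# during MONITORING sends the incident back to INVESTIGATING.
ALLOWED = {
    Stage.INVESTIGATING: {Stage.IDENTIFIED},
    Stage.IDENTIFIED: {Stage.MONITORING},
    Stage.MONITORING: {Stage.RESOLVED, Stage.INVESTIGATING},
    Stage.RESOLVED: set(),
}


def render_update(stage, **fields):
    """Fill in the stage's template; raises KeyError if a field is missing."""
    return TEMPLATES[stage].format(**fields)


def can_transition(current, target, fix_deployed_at=None, last_failure_at=None,
                   now=None, window=timedelta(minutes=60)):
    """Return True if moving from `current` to `target` is allowed.

    MONITORING -> RESOLVED also requires a clean window: no failure seen
    since the fix was deployed, and at least `window` elapsed since then.
    """
    if target not in ALLOWED[current]:
        return False
    if current is Stage.MONITORING and target is Stage.RESOLVED:
        if fix_deployed_at is None or now is None:
            return False
        if last_failure_at is not None and last_failure_at >= fix_deployed_at:
            return False
        return now - fix_deployed_at >= window
    return True
```

With this guard, a bot cannot post "resolved" 20 minutes after a fix. That is the premature resolution the Monitoring stage warns against.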
Timing Matters More Than Detail
The single biggest mistake in incident communication is silence. An update every 30 minutes — even if it says "still investigating, no new information" — is vastly better than one detailed update two hours later.
Set a timer. When it goes off, post an update regardless of whether anything has changed.
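The "set a timer" rule can also be enforced by a check that a bot or on-call tool runs every minute. The following is a minimal sketch, assuming the 30-minute cadence above. The function names and the exact "no news" wording are hypothetical.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=30)  # cadence recommended above

NO_NEWS = "Still investigating, no new information. Next update in {minutes} minutes."


def update_due(last_update_at, now, interval=UPDATE_INTERVAL):
    """True once `interval` has passed since the last public update."""
    return now - last_update_at >= interval


def next_message(pending_news, interval=UPDATE_INTERVAL):
    """Post real news if there is any; otherwise post a 'no new information' update."""
    if pending_news:
        return pending_news
    minutes = int(interval.total_seconds() // 60)
    return NO_NEWS.format(minutes=minutes)
```

The key design choice is that `next_message` never returns nothing. When the timer fires and there is no news, it still produces a postable update, so silence is never the default.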
What to Avoid
- Blame — "Our hosting provider caused this" may be true, but it sounds like deflection. Own the customer impact.
- Over-promising — "This will be fixed in 10 minutes" creates expectations you may not meet. Use ranges or "we will update in X minutes" instead.
- Jargon — "The k8s pod OOMKilled due to a memory leak in the GC" means nothing to your customers. "Our application servers ran out of memory and we are restarting them" does.
Automate What You Can
The hardest part of incident communication is remembering to do it at 3 AM, when you are focused on fixing the problem. Automatic status page updates, triggered when a monitor detects a failure, close the gap between detecting a problem and telling customers you are aware of it.
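To illustrate the idea in general terms (not how Upwarden or any other product implements it), here is a minimal sketch that posts a single Investigating update after several consecutive failed checks. The `publish` callback is a hypothetical hook, such as a status page client or webhook. The threshold of 3 failures is an assumption meant to avoid alerting customers over a single blip.

```python
from dataclasses import dataclass
from typing import Callable

ACK_MESSAGE = (
    "We are aware of issues affecting {service} and are currently "
    "investigating. We will provide an update within 30 minutes."
)


@dataclass
class AutoAcknowledger:
    """Posts one 'Investigating' update after N consecutive failed checks.

    `publish` is a hypothetical hook, not any specific product's API.
    """
    service: str
    publish: Callable[[str], None]
    failure_threshold: int = 3  # assumed value; tune to your check interval
    consecutive_failures: int = 0
    acknowledged: bool = False

    def record_check(self, ok: bool) -> None:
        if ok:
            # A healthy check only resets the counter. It does not resolve
            # the incident; that stays a human decision after Monitoring.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        # Acknowledge at most once per incident (one instance per incident).
        if not self.acknowledged and self.consecutive_failures >= self.failure_threshold:
            self.publish(ACK_MESSAGE.format(service=self.service))
            self.acknowledged = True
```

The automation covers only the first, time-critical acknowledgement. The later stages, which need a human to describe the cause and confirm the fix, remain manual.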
With Upwarden, your status page updates within seconds of a detected outage, and subscribers are notified by email automatically. Your team can focus on fixing the issue while customers stay informed.
Get Started
If your current incident process relies entirely on manual updates, try Upwarden free and see how automated status updates change your incident workflow.