Glossary
Monitoring & reliability terms explained
Plain-language definitions for the concepts behind uptime monitoring, incident management, and status pages.
DORA (Digital Operational Resilience Act)
EU regulation requiring financial entities to manage ICT (information and communication technology) risks, including incident management and reporting.
Downtime
Any period when a system or service is unavailable or not functioning correctly.
GDPR (General Data Protection Regulation)
EU regulation governing the processing and protection of personal data.
Heartbeat Monitoring
A push-based monitoring approach where a service reports its own health by sending periodic signals to a monitoring endpoint.
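As a rough illustration of the receiving side, the monitoring endpoint only needs to remember when each service last checked in and flag it once the expected interval (plus some tolerance) has passed. The sketch below assumes a hypothetical helper named `heartbeat_is_late` and a `grace` parameter; neither comes from any particular product.

```python
from datetime import datetime, timedelta, timezone

def heartbeat_is_late(last_seen, now, interval, grace=timedelta(0)):
    """Return True if no heartbeat arrived within interval + grace.

    last_seen: when the service last sent a signal
    interval:  how often the service is expected to report
    grace:     extra tolerance before alerting (hypothetical parameter)
    """
    return now - last_seen > interval + grace

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# Last signal 4 minutes ago, expected every 5 minutes: still healthy.
print(heartbeat_is_late(now - timedelta(minutes=4), now, timedelta(minutes=5)))  # False

# Last signal 7 minutes ago, expected every 5 minutes with 1 minute of grace: late.
print(heartbeat_is_late(now - timedelta(minutes=7), now,
                        timedelta(minutes=5), timedelta(minutes=1)))  # True
```

The useful property of this design is that a crashed or hung job is detected by the absence of a signal, so the job itself does not need to be reachable from outside.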
HTTP Monitoring
Checking the availability and response of web endpoints by sending HTTP requests.
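A minimal sketch of such a check, using only Python's standard library, might look like the following. The function name `check_http`, the default timeout, and the choice to treat only one expected status code as "up" are assumptions made for illustration; real monitors usually also inspect the response body, follow configurable redirect rules, and check from several locations.

```python
import time
import urllib.error
import urllib.request

def check_http(url, timeout=5.0, expected_status=200):
    """Send one GET request and report status code and response time."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        status = exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, and so on.
        status = None
    elapsed = time.monotonic() - start
    return {"up": status == expected_status, "status": status, "seconds": elapsed}
```

Calling `check_http("https://example.com/")` returns a small dictionary; a `status` of `None` means no HTTP response was received at all, which is worth distinguishing from an error status returned by the server.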
Incident Management
The end-to-end lifecycle of handling service incidents from detection to resolution.
Incident Response
The process of detecting, responding to, and resolving service disruptions.
MTTD (Mean Time to Detect)
The average time between when a failure occurs and when it's detected.
MTTR (Mean Time to Recovery)
The average time it takes to restore a service after a failure is detected.
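To make the two averages concrete, the sketch below computes MTTD and MTTR from a small, hypothetical incident log. The timestamps are invented for illustration, and MTTR is measured from detection to restoration, following the definition above.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (failure began, failure detected, service restored)
incidents = [
    (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 10, 4), datetime(2024, 3, 1, 10, 34)),
    (datetime(2024, 3, 9, 14, 0), datetime(2024, 3, 9, 14, 10), datetime(2024, 3, 9, 15, 0)),
]

def mean(deltas):
    """Average a sequence of timedelta values."""
    deltas = list(deltas)
    return sum(deltas, timedelta(0)) / len(deltas)

# MTTD: failure start -> detection. MTTR: detection -> restoration.
mttd = mean(detected - failed for failed, detected, _ in incidents)
mttr = mean(restored - detected for _, detected, restored in incidents)

print(mttd)  # 0:07:00
print(mttr)  # 0:40:00
```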
NIS2 (Network and Information Security Directive)
EU directive requiring essential and important entities to implement cybersecurity and incident reporting measures.
Ping Monitoring
Checking whether a host is reachable on the network using ICMP echo requests.
RPO (Recovery Point Objective)
The maximum acceptable amount of data loss after a disruption, expressed as a span of time. For example, an RPO of one hour means that at most the last hour of data may be lost.
RTO (Recovery Time Objective)
The maximum acceptable time to restore a service after a disruption.
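RPO and RTO are easy to confuse, so here is a small sketch that checks one hypothetical outage against both. The function name `recovery_report` and all timestamps are illustrative assumptions: the data-loss window is taken as the time between the last good backup and the start of the outage, and the downtime as the time from the start of the outage to restoration.

```python
from datetime import datetime, timedelta

def recovery_report(last_backup, outage_start, service_restored, rpo, rto):
    """Compare one outage against the RPO and RTO targets."""
    data_loss_window = outage_start - last_backup   # compared against the RPO
    downtime = service_restored - outage_start      # compared against the RTO
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }

report = recovery_report(
    last_backup=datetime(2024, 5, 2, 2, 0),
    outage_start=datetime(2024, 5, 2, 2, 45),
    service_restored=datetime(2024, 5, 2, 4, 15),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=1),
)
print(report["rpo_met"])  # True: 45 minutes of data lost, within the 1-hour RPO
print(report["rto_met"])  # False: 90 minutes of downtime, over the 1-hour RTO
```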
SLA (Service Level Agreement)
A formal agreement defining the expected level of service, typically including uptime guarantees.
SLI (Service Level Indicator)
A quantitative measure of service behavior, such as availability, latency, or error rate, used to evaluate whether a service meets its SLO or SLA.
SLO (Service Level Objective)
An internal target for service reliability, typically stricter than the external SLA.
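The relationship between the three terms can be sketched with a simple availability SLI computed from request counts. The numbers and the function name `evaluate` are hypothetical; the point is that a service can miss its stricter internal SLO while still honoring the external SLA.

```python
def evaluate(good_requests, total_requests, slo, sla):
    """Compute an availability SLI and compare it with the SLO and SLA targets."""
    sli = good_requests / total_requests
    return {"sli": sli, "meets_slo": sli >= slo, "meets_sla": sli >= sla}

# Hypothetical month: 999,300 of 1,000,000 requests succeeded (SLI = 99.93%).
result = evaluate(999_300, 1_000_000, slo=0.9995, sla=0.999)

print(result["meets_slo"])  # False: 99.93% is below the 99.95% internal target
print(result["meets_sla"])  # True:  99.93% is above the 99.9% external commitment
```

Keeping the SLO stricter than the SLA gives the team an early warning before a contractual commitment is at risk.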
SSL Monitoring
Tracking SSL/TLS certificate validity and expiration to prevent certificate-related outages.
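One way to approximate this check with Python's standard library is to read the certificate's `notAfter` field and convert it to days remaining. The helper names `fetch_not_after` and `days_left` are assumptions for this sketch, and any alerting threshold (such as warning at 14 days) would be a policy choice rather than anything fixed.

```python
import socket
import ssl
import time

def days_left(not_after, now=None):
    """Days until a certificate's notAfter timestamp (negative if already expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400

def fetch_not_after(host, port=443, timeout=5.0):
    """Connect over TLS and return the server certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]

# Example usage (requires network access):
# print(days_left(fetch_not_after("example.com")))
```

Because the default context validates the certificate, an already expired or otherwise invalid certificate makes the handshake raise `ssl.SSLCertVerificationError` instead of returning a date, which a monitor would also want to report as a failure.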
Status Page
A public web page that communicates the current operational status of a service to users.
TCP Monitoring
Checking whether a service is accepting connections on a specific network port.
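In its simplest form, this is just an attempt to open a TCP connection within a timeout. The sketch below uses Python's standard `socket` module; the function name `tcp_port_open` and the 3-second default timeout are illustrative assumptions.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

# Example usage: is a database port reachable?
# print(tcp_port_open("db.internal.example", 5432))
```

A successful connection only shows that something is listening on the port; it does not confirm that the application behind it is answering requests correctly.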
Uptime
The percentage of time a system or service is operational and accessible.
Uptime Percentage (The Nines)
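A way of expressing service availability, where each additional '9' cuts the allowed downtime by a factor of ten. For example, 99.9% allows about 8.8 hours of downtime per year, while 99.99% allows about 53 minutes.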
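The arithmetic behind the nines is a single multiplication: allowed downtime equals the length of the period times the fraction of time the service may be unavailable. The sketch below assumes a 365-day year by default; the function name `allowed_downtime` is illustrative.

```python
from datetime import timedelta

def allowed_downtime(availability_pct, period=timedelta(days=365)):
    """Downtime permitted over a period at a given availability percentage."""
    return period * (1 - availability_pct / 100)

# Print the yearly downtime allowance for two to five nines.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime(pct)}")

# The same target over a 30-day month:
print(allowed_downtime(99.9, period=timedelta(days=30)))
```

Over a year, 99.9% works out to roughly 8.76 hours and 99.99% to roughly 52.6 minutes; over a 30-day month, 99.9% allows about 43 minutes.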
WebSocket Monitoring
Checking whether a WebSocket endpoint accepts connections by verifying that the WS/WSS handshake succeeds.
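For readers curious what "the handshake succeeds" means in practice, here is a hedged sketch using only Python's standard library. The client sends an HTTP Upgrade request with a random `Sec-WebSocket-Key`, and the server must answer with status 101 and a `Sec-WebSocket-Accept` header derived from that key, as described in RFC 6455. The function names, defaults, and minimal header parsing are assumptions for illustration; a production monitor would more likely rely on an established WebSocket client library.

```python
import base64
import hashlib
import os
import socket
import ssl

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed value defined by RFC 6455

def expected_accept(key):
    """Compute the Sec-WebSocket-Accept value a server must return for a key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def websocket_handshake_ok(host, port=80, path="/", use_tls=False, timeout=5.0):
    """Return True if the endpoint completes a valid WS (or WSS) opening handshake."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n\r\n"
    ).encode("ascii")
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        if use_tls:  # WSS is the same handshake carried over TLS
            sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
        with sock:
            sock.sendall(request)
            response = b""
            # Read until the end of the HTTP response headers.
            while b"\r\n\r\n" not in response:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                response += chunk
    except OSError:
        return False
    lines = response.split(b"\r\n\r\n", 1)[0].decode("latin-1").split("\r\n")
    status = lines[0].split()
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    status_ok = len(status) >= 2 and status[1] == "101"
    return status_ok and headers.get("sec-websocket-accept") == expected_accept(key)
```

Verifying the accept header, and not only the 101 status, guards against proxies or misconfigured servers that answer the upgrade request without actually speaking the WebSocket protocol.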