Glossary

Monitoring & reliability terms explained

Plain-language definitions for the concepts behind uptime monitoring, incident management, and status pages.

DORA (Digital Operational Resilience Act)

EU regulation requiring financial entities to manage ICT (information and communication technology) risk, including incident management and reporting.

Downtime

Any period when a system or service is unavailable or not functioning correctly.

GDPR (General Data Protection Regulation)

EU regulation governing the processing and protection of personal data.

Heartbeat monitoring

A push-based monitoring approach where a service reports its own health by sending periodic signals to a monitoring endpoint.
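
On the monitoring side, a heartbeat check comes down to asking whether the latest signal is overdue. The Python sketch below assumes each monitor has an expected interval plus a grace period; the function name, the parameters and the "pending" state are illustrative assumptions, not any product's API. The service side (a periodic request to the monitoring endpoint) is not shown.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def heartbeat_status(last_seen: Optional[datetime], expected_interval: timedelta,
                     grace: timedelta, now: Optional[datetime] = None) -> str:
    """Classify a push-based monitor from the time of its last heartbeat.

    Hypothetical rule: "up" while the latest heartbeat is no older than
    expected_interval + grace, "down" after that, and "pending" if the
    service has never reported at all.
    """
    if last_seen is None:
        return "pending"
    now = now or datetime.now(timezone.utc)
    deadline = last_seen + expected_interval + grace
    return "up" if now <= deadline else "down"


# A job expected every 5 minutes, with 1 minute of grace.
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(heartbeat_status(t0, timedelta(minutes=5), timedelta(minutes=1),
                       now=t0 + timedelta(minutes=4)))  # up
print(heartbeat_status(t0, timedelta(minutes=5), timedelta(minutes=1),
                       now=t0 + timedelta(minutes=7)))  # down
```

The grace period absorbs normal jitter (a job that runs a few seconds late) so that only a genuinely missing signal raises an alert.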

HTTP monitoring

Checking the availability and response of web endpoints by sending HTTP requests.
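
A single HTTP check can be sketched with Python's standard library. This assumes "healthy" means the final response has an expected status code (200 here); the function name and the result fields are illustrative, and real monitors usually add keyword, header, or latency assertions on top.

```python
import time
import urllib.error
import urllib.request


def check_http(url: str, timeout: float = 10.0, expected_status: int = 200) -> dict:
    """Send one GET request and report status, latency and up/down."""
    start = time.monotonic()
    try:
        # urlopen follows redirects, so this is the status of the final response.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        status = exc.code
    except OSError as exc:
        # URLError (DNS failure, refused connection) and timeouts land here.
        return {"up": False, "status": None, "error": str(exc),
                "latency_ms": (time.monotonic() - start) * 1000}
    return {"up": status == expected_status, "status": status, "error": None,
            "latency_ms": (time.monotonic() - start) * 1000}


# Example (requires network access):
# print(check_http("https://example.com/"))
```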

Incident lifecycle

The end-to-end lifecycle of handling service incidents from detection to resolution.

Incident management

The process of detecting, responding to, and resolving service disruptions.
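
One way to make the lifecycle explicit in tooling is a small state machine that only allows valid transitions. The stage names below are hypothetical; real tools name and split these stages differently.

```python
from enum import Enum


class IncidentState(Enum):
    # Hypothetical stages, from detection to resolution.
    DETECTED = "detected"
    ACKNOWLEDGED = "acknowledged"
    INVESTIGATING = "investigating"
    MITIGATED = "mitigated"
    RESOLVED = "resolved"


# Allowed transitions; a mitigated incident can return to investigation if it recurs.
TRANSITIONS = {
    IncidentState.DETECTED: {IncidentState.ACKNOWLEDGED},
    IncidentState.ACKNOWLEDGED: {IncidentState.INVESTIGATING},
    IncidentState.INVESTIGATING: {IncidentState.MITIGATED, IncidentState.RESOLVED},
    IncidentState.MITIGATED: {IncidentState.INVESTIGATING, IncidentState.RESOLVED},
    IncidentState.RESOLVED: set(),
}


def advance(current: IncidentState, target: IncidentState) -> IncidentState:
    """Move an incident to a new stage, rejecting transitions that skip steps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = IncidentState.DETECTED
for step in (IncidentState.ACKNOWLEDGED, IncidentState.INVESTIGATING,
             IncidentState.RESOLVED):
    state = advance(state, step)
print(state.value)  # resolved
```

Timestamping each transition is what later makes metrics such as time to detect and time to restore measurable.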

MTTD (Mean Time to Detect)

The average time between when a failure occurs and when it is detected.

MTTR (Mean Time to Recovery)

The average time it takes to restore a service after a failure is detected.
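
Both averages can be computed from per-incident timestamps. This sketch follows the two definitions given here (MTTD from failure to detection, MTTR from detection to restore); the record field names are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mttd_mttr(incidents: list) -> tuple:
    """Return (MTTD, MTTR) as timedeltas.

    Each incident is a dict with hypothetical keys 'failed_at',
    'detected_at' and 'restored_at' holding datetimes.
    """
    detect = [(i["detected_at"] - i["failed_at"]).total_seconds() for i in incidents]
    restore = [(i["restored_at"] - i["detected_at"]).total_seconds() for i in incidents]
    return timedelta(seconds=mean(detect)), timedelta(seconds=mean(restore))


t = datetime(2024, 3, 1, 9, 0)
incidents = [
    {"failed_at": t, "detected_at": t + timedelta(minutes=2),
     "restored_at": t + timedelta(minutes=32)},
    {"failed_at": t, "detected_at": t + timedelta(minutes=4),
     "restored_at": t + timedelta(minutes=54)},
]
mttd, mttr = mttd_mttr(incidents)
print(mttd, mttr)  # 0:03:00 0:40:00
```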

NIS2 Directive

EU directive requiring essential and important entities to implement cybersecurity and incident reporting measures.

Ping monitoring

Checking whether a host is reachable on the network using ICMP echo requests.
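
An ICMP echo request is a small packet: type 8, code 0, a checksum, an identifier, a sequence number and an optional payload. The sketch below only builds the packet and its RFC 1071 checksum; actually sending it needs a raw socket, which usually requires elevated privileges, so that part is omitted. The function names are illustrative.

```python
import struct


def icmp_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones' complement of the ones'-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum = 0 first
    checksum = icmp_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1)
# A correct packet checksums to zero when the checksum field is included.
print(icmp_checksum(packet))  # 0
```

The host is considered reachable if a matching echo reply (type 0, same identifier and sequence) arrives before a timeout.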

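RPO (Recovery Point Objective)

The maximum acceptable amount of data loss after a disruption, expressed as time: how far back before the disruption the most recent recoverable copy of the data may be.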

RTO (Recovery Time Objective)

The maximum acceptable time to restore a service after a disruption.
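
A quick way to see how RPO and RTO differ is to check a backup setup against both. This sketch assumes periodic backups, so the worst-case data loss is one full backup interval (replication lag and backup duration are ignored), and it assumes the restore time comes from a measured restore drill. The function name is hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta, restore_duration: timedelta,
                     rpo: timedelta, rto: timedelta) -> dict:
    """Compare a backup/restore setup with RPO and RTO targets."""
    # Worst case: the disruption hits just before the next backup runs,
    # so everything written since the previous backup is lost.
    worst_case_loss = backup_interval
    return {
        "rpo_met": worst_case_loss <= rpo,
        "rto_met": restore_duration <= rto,
    }


# Hourly backups and a 3-hour restore, against a 15-minute RPO and a 4-hour RTO.
print(meets_objectives(timedelta(hours=1), timedelta(hours=3),
                       rpo=timedelta(minutes=15), rto=timedelta(hours=4)))
# {'rpo_met': False, 'rto_met': True}
```

Meeting the RTO says nothing about the RPO: a fast restore of an hour-old backup still loses up to an hour of data.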

SLA (Service Level Agreement)

A formal agreement defining the expected level of service, typically including uptime guarantees.

SLI (Service Level Indicator)

A quantitative measurement of service behavior, such as availability or latency, used to evaluate whether a service meets its SLO or SLA.

SLO (Service Level Objective)

An internal target for service reliability, typically stricter than the external SLA.
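
The three terms fit together as one measurement checked against two thresholds. In this sketch the SLI is the fraction of successful requests in a window, and the SLO (99.9%) and SLA (99.5%) values are illustrative defaults, not recommendations.

```python
def evaluate_availability(good: int, total: int,
                          slo: float = 0.999, sla: float = 0.995) -> dict:
    """Compute an availability SLI and compare it with an SLO and an SLA."""
    if total <= 0:
        raise ValueError("need at least one request in the window")
    sli = good / total  # the indicator: share of requests that succeeded
    return {
        "sli": sli,
        "slo_met": sli >= slo,  # internal target, stricter
        "sla_met": sli >= sla,  # contractual promise, looser
    }


# 998,000 successful requests out of 1,000,000 in the window:
print(evaluate_availability(998_000, 1_000_000))
# {'sli': 0.998, 'slo_met': False, 'sla_met': True}
```

Because the SLO is stricter, it is breached first, which gives a team warning before the SLA itself is at risk.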

SSL certificate monitoring

Tracking SSL/TLS certificate validity and expiration to prevent certificate-related outages.
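
Expiry tracking can be done with Python's `ssl` module: fetch the server certificate over a verified connection and compare its `notAfter` date with the current time. The helper names are illustrative, and any alert threshold (for example, warning when fewer than 14 days remain) is an assumption to tune per service.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date (negative once expired).

    not_after uses the format returned by getpeercert(),
    e.g. "Jan 15 00:00:00 2030 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate served by host:port.

    The default context verifies the chain and hostname, so an expired or
    otherwise invalid certificate raises ssl.SSLCertVerificationError,
    which a monitor would also treat as a failure.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example (requires network access):
# print(days_until_expiry(fetch_not_after("example.com")))
```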

Status page

A public web page that communicates the current operational status of a service to users.

TCP port monitoring

Checking whether a service is accepting connections on a specific network port.
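
A TCP port check amounts to attempting a connection and treating a completed handshake as "up". A minimal Python sketch; the function name and the 5-second default timeout are assumptions.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, timeout, or DNS failure.
        return False


# Example:
# print(check_tcp("example.com", 443))
```

This only shows that something is listening on the port; it does not check whether the application behind it responds correctly, which is what protocol-level checks such as HTTP monitoring add.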

Uptime

The percentage of time a system or service is operational and accessible.
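
As a formula, uptime is the share of the measurement period not spent in downtime. This sketch assumes the total downtime for the period is already known, in seconds.

```python
def uptime_percent(period_seconds: float, downtime_seconds: float) -> float:
    """Uptime as a percentage of the measurement period."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    return 100 * (period_seconds - downtime_seconds) / period_seconds


# A 30-day month with 43 minutes of downtime:
month = 30 * 24 * 3600
print(round(uptime_percent(month, 43 * 60), 3))  # 99.9
```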

Uptime nines

A way of expressing service availability, where each additional "9" cuts the allowed downtime by a factor of ten (99.9% allows ten times less downtime than 99%).
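
The tenfold step per nine is easy to see by computing the downtime allowance directly. This sketch assumes a 365-day year as the default period; pass a different period (in minutes) for monthly budgets.

```python
def allowed_downtime_minutes(nines: int, period_minutes: float = 365 * 24 * 60) -> float:
    """Maximum downtime permitted by an availability of `nines` nines.

    3 nines = 99.9% availability, so 0.1% (10 ** -3) of the period may be down.
    """
    return period_minutes * 10 ** -nines


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} min/year")
# 2 nines: 5256.0 min/year
# 3 nines: 525.6 min/year
# 4 nines: 52.6 min/year
# 5 nines: 5.3 min/year
```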

WebSocket monitoring

Checking whether a WebSocket endpoint accepts connections by verifying that the WS/WSS handshake succeeds.