Instability for Customer Production Environments

Incident Report for Appfarm AS

Resolved

On Monday, Jan 8, at 20:04, some Appfarm customer solutions running on a specific Kubernetes node became unstable. At 21:05, the node was back in a healthy state.

The root cause of the downtime was a Kubernetes pod that consumed excessive system memory, which rendered the entire node unhealthy.

Changes we have made to prevent this from happening again:

- Identified and addressed the specific pods causing memory exhaustion.
- Implemented measures to prevent pods from overwhelming a node.

We sincerely apologize for any inconvenience this may have caused.

Posted Jan 08, 2024 - 20:00 CET