GitHub Availability Report: May 2024


In May, GitHub experienced an incident that caused significant performance degradation across its services for more than 7 hours, triggered by a configuration change in an upstream cloud provider. The effects included latency in GitHub Copilot Chat, delayed workflow run updates in GitHub Actions, and longer migration run times for GitHub Enterprise Importer customers. Billing metrics were also delayed, but no data was lost, and normal service was restored once the incident was mitigated. GitHub traced the problem to a scheduled operating system upgrade that led to an uneven distribution of traffic. To mitigate the incident, GitHub increased the number of network routes between its data centres and the cloud provider, and it is improving its load threshold monitoring and alerting to prevent a recurrence.
