2020-November-04 Resolved Service Incident
Postmortem

Dates:

Saturday, November 7 12:00 PST - Monday, November 9 11:10 AM PST 

What happened:

Sauce Labs Real Device customers running tests on the Unified Platform in the EU-Central data center experienced intermittent test failures with Sauce Connect. 

Why it happened:

A planned controller failover during a maintenance window changed a traffic pattern, so that some traffic hit an Access Control List that did not allow communication back to Sauce Connect tunnels. Although tests were in place to monitor the affected services, they did not trigger any production alerts.

How we fixed it:

The Access Control List was updated to allow this communication flow.

What we are doing to prevent it from happening again:

We are adding alerts to our systems and ensuring that they trigger in case of an outage.
We are reviewing our standards for implementing monitoring on new and existing systems, and adapting our incident training.

Posted Nov 16, 2020 - 22:54 CET

Resolved
Between 12:26pm CET on Nov 3rd and 12:52pm CET on Nov 4th, Sauce Connect tunnels were failing to start from Node.js-based test frameworks using the node-saucelabs package. We have identified and fixed the problem. All services are fully operational.
Posted Nov 04, 2020 - 13:05 CET