[EU-WEST-2] Incident network degradation

Incident Report for 3DSOutscale Status page

Postmortem

From 14:05 to 19:50 UTC on April 7th, 2025, the eu-west-2 region experienced a high level of packet loss that made virtual machines unreachable. During the incident, several investigations were carried out before the machine at the origin of the incident was isolated.

A virtual machine using a specific broadcast protocol initiated a flood that was amplified by the VXLAN multicast mechanism, which propagated the traffic to all data centers in the region over the network dedicated to virtual machines. This combination of factors exceeded the network's design capacity for this broadcast protocol, saturating specific internal physical interfaces on many hypervisors and causing widespread packet drops (approximately 50%).
As a result, all virtual machines attached to impacted hypervisors were unable to communicate, rendering approximately half of the virtual machines in the three (3) European Availability Zones unreachable.

After securing our Zones, we undertook two projects: the integration of multicast limitation on virtual switches in a future version of our TINA orchestrator (planned for June 2025), and the tightening of "storm control".
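
As an illustration only, storm control can be thought of as a per-port cap on broadcast, unknown-unicast and multicast (BUM) frames per time window. The sketch below is a simplified model of that idea and not the mechanism deployed on our platform or in TINA; the class name `StormControl`, the helper `simulate`, and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StormControl:
    """Hypothetical per-port limiter for broadcast/multicast (BUM) frames."""
    max_frames_per_interval: int  # assumed threshold, not a value from this report
    interval_s: float = 1.0       # length of the counting window, in seconds
    _window_start: float = 0.0
    _count: int = 0

    def allow(self, is_bum: bool, now: float) -> bool:
        """Return True if a frame seen at time `now` (seconds) may be forwarded."""
        if not is_bum:
            return True  # known-unicast traffic is not limited in this model
        if now - self._window_start >= self.interval_s:
            # A new counting window starts: reset the BUM frame counter.
            self._window_start = now
            self._count = 0
        if self._count < self.max_frames_per_interval:
            self._count += 1
            return True
        return False  # over the threshold: the frame is dropped


def simulate(frames: int, limit: int) -> int:
    """Offer `frames` broadcast frames within one window; return how many pass."""
    port = StormControl(max_frames_per_interval=limit)
    return sum(port.allow(is_bum=True, now=0.5) for _ in range(frames))
```

In this model, a flood of 10,000 broadcast frames offered to a port limited to 100 frames per window results in only 100 frames being forwarded, so the excess is dropped at the source port instead of being replicated across the region.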

Posted May 27, 2025 - 11:15 CEST

Resolved

The incident was resolved on April 7 at 9:50 pm Paris time.

We have continued to monitor the incident for 48 hours.

We will keep you informed of the results of our analysis in a future communication.
Posted Apr 09, 2025 - 12:12 CEST

Update

We are continuing to monitor for any further issues.
Posted Apr 08, 2025 - 09:36 CEST

Monitoring

We have identified a component that could be related to the ongoing incident.
Our teams are closely monitoring it to determine whether it is indeed the root cause.
We will keep you informed as soon as we have more information.
Posted Apr 07, 2025 - 21:58 CEST

Update

The service is currently experiencing partial outages affecting virtual machine connectivity. All new deployments in the currently impacted region will encounter the same issue, but our other regions remain unaffected. Our network engineers are actively investigating the root cause and will provide updates as soon as possible.
Posted Apr 07, 2025 - 20:47 CEST

Update

The degradation is still in progress and some cloud resources are not reachable. All of our teams are mobilized to support you during this incident, and we will continue posting updates as we actively work on resolving this issue. Our support team may contact you if specific actions are required from your teams.
Posted Apr 07, 2025 - 19:31 CEST

Update

The service is degraded, which impacts access to most cloud resources. Our teams are investigating several potential sources of network degradation and will keep you updated regularly.
Posted Apr 07, 2025 - 18:38 CEST

Update

We are continuing to investigate this issue.
Posted Apr 07, 2025 - 17:34 CEST

Update

The service is degraded, which impacts access to most cloud resources.
Our teams are mobilized to restore the service as soon as possible and will keep you updated regularly.
Impacted services: Public Network, OOS, Cockpit
Posted Apr 07, 2025 - 17:33 CEST

Update

Our teams are continuing to investigate the network issue.
Analysis is ongoing to identify the root cause and restore the situation as quickly as possible.
We will keep you informed as soon as we have updates or a resolution.
Posted Apr 07, 2025 - 17:06 CEST

Investigating

We are currently investigating this issue.
Posted Apr 07, 2025 - 16:24 CEST
This incident affected: eu-west-2 (Outscale Object storage (OOS), Public network, Cockpit).