From 14:05 to 19:50 UTC on April 7, 2025, the eu-west-2 region experienced heavy packet loss that made virtual machines unreachable. During the incident, several lines of investigation were pursued before the machine at the origin of the incident was isolated.
A virtual machine using a specific broadcast protocol initiated a flood that was amplified by the VXLAN multicast mechanism, which propagated the traffic to all data centers in the region over the network dedicated to virtual machines. This combination exceeded the network's design capacity for this broadcast protocol, saturating specific internal physical interfaces on many hypervisors and causing widespread packet drops (approximately 50%).
As a result, all virtual machines hosted on the affected hypervisors were unable to communicate, leaving approximately half of the virtual machines in the three (3) European Availability Zones unreachable.
After securing our Zones, we launched two projects: the integration of multicast limiting on virtual switches in a future version of our TINA orchestrator (planned for June 2025), and the tightening of "storm control".
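For readers unfamiliar with the term, storm control generally means capping the rate of broadcast or multicast frames a port will forward. The sketch below is a minimal illustration of that idea using a token bucket. It is not the TINA implementation or our switch configuration; the class name `StormControl`, the helper `simulate`, and the numeric thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class StormControl:
    """Token-bucket limiter for broadcast/multicast frames on one port (illustrative only)."""
    rate_pps: float       # sustained frames per second allowed (hypothetical threshold)
    burst: float          # maximum bucket size, in frames
    tokens: float = 0.0   # tokens currently available
    last_ts: float = 0.0  # timestamp of the previous frame, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Return True if a frame arriving at time `now` may be forwarded."""
        elapsed = max(0.0, now - self.last_ts)
        self.last_ts = now
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate(limiter: StormControl, offered_pps: float, duration_s: float) -> tuple[int, int]:
    """Offer evenly spaced frames to the limiter; return (forwarded, dropped)."""
    forwarded = dropped = 0
    for i in range(int(offered_pps * duration_s)):
        if limiter.allow(i / offered_pps):
            forwarded += 1
        else:
            dropped += 1
    return forwarded, dropped


if __name__ == "__main__":
    # Assumed numbers: a 1000 frames/s flood against a 100 frames/s threshold.
    print(simulate(StormControl(rate_pps=100, burst=10), offered_pps=1000, duration_s=1.0))
    # Traffic below the threshold should pass without drops.
    print(simulate(StormControl(rate_pps=100, burst=10), offered_pps=50, duration_s=1.0))
```

Under these assumed numbers, the flood is cut down to roughly the configured rate plus the initial burst, while traffic below the threshold is forwarded untouched. Tightening storm control corresponds to lowering `rate_pps` and `burst` in this model.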