Google has now commented on yesterday's incident, in which massive disruptions caused Gmail to deliver mail only with delays for many users and prevented attachments from loading.
According to Google, two networks failed at the same time, and the two failures together caused the problems users experienced. Because all networks and components are monitored automatically, Google's engineers were able to begin repairs within minutes of the outage and restore the first parts of the affected networks.
Google states that 71% of all messages were delivered normally. For the remaining 29%, the average delay was 2.6 seconds. About 1.5% of all messages were delayed by more than two hours.
Google also says it intends to prevent such outages in the future and is already working to ensure that all mail can still be delivered even when failures occur.
On September 24th, many Gmail users received an unwelcome surprise: some of their messages were arriving slowly, and some of their attachments were unavailable. We’d like to start by apologizing—we realize that our users rely on Gmail to be always available and always fast, and for several hours we didn’t deliver. We have analyzed what happened, and we’ll tell you about it below. In addition, we’re taking several steps to prevent a recurrence.
The message delivery delays were triggered by a dual network failure. This is a very rare event in which two separate, redundant network paths both stop working at the same time. The two network failures were unrelated, but in combination they reduced Gmail’s capacity to deliver messages to users, and, beginning at 5:54 a.m. PST, messages started piling up. Google’s automated monitoring alerted the Gmail engineering team within minutes, and they began investigating immediately. Together with the networking team, the Gmail team restored some of the network capacity that was lost and worked to repurpose additional capacity, clearing much of the accumulated message backlog by 1:00 p.m. PST and the remainder shortly before 4:00 p.m. PST.
The impact on users’ Gmail experience varied widely. Most messages were unaffected: 71% of messages had no delay, and of the remaining 29%, the average delivery delay was just 2.6 seconds. However, about 1.5% of messages were delayed more than two hours. Users who attempted to download large attachments on affected messages encountered errors. Throughout the event, Gmail otherwise remained available: users could log in, read messages that had already been delivered, send mail, and access other features.
What’s next? Our top priority is ensuring that Gmail users get the experience they expect: fast, highly available email, anytime they want it. We’re taking steps to ensure that there is sufficient network capacity, including backup capacity for Gmail, even in the event of a rare dual network failure. We also plan to make Gmail message delivery more resilient to a network capacity shortfall, in the unlikely event that one occurs in the future. Finally, we’re updating our internal practices so that we can respond to network issues more quickly and effectively. We’ll be working on all of these improvements and more over the next few weeks. Even including this event, Gmail remains well above 99.9% available, and we intend to keep it that way!