It’s getting pretty scary out there for service providers and users alike, with the number of attacks on electronic devices and services seemingly increasing exponentially.
Microsoft’s cloud services appear to be a particular target, with the most recent two-hour outage blamed on an “anomalous surge” of DNS queries from all over the world targeting certain domains hosted on Azure.
The outage prevented users from accessing or signing into numerous services, including Xbox Live, Microsoft Office, SharePoint Online, Microsoft Intune, Dynamics 365, Microsoft Teams, Skype, Exchange Online, OneDrive, Yammer, Power BI, Power Apps, OneNote, Microsoft Managed Desktop, and Microsoft Stream.
Microsoft did not reveal who was responsible for the attack. Normally, even a concerted DDoS attack would not be able to bring down a massive cloud service such as Azure.
Unfortunately, the attack revealed a flaw in how the company implemented its DNS Edge caches.
“Azure DNS servers experienced an anomalous surge in DNS queries from across the globe targeting a set of domains hosted on Azure. Normally, Azure’s layers of caches and traffic shaping would mitigate this surge. In this incident, one specific sequence of events exposed a code defect in our DNS service that reduced the efficiency of our DNS Edge caches,” Microsoft explained in the root cause analysis for the outage.
“As our DNS service became overloaded, DNS clients began frequent retries of their requests which added workload to the DNS service. Since client retries are considered legitimate DNS traffic, this traffic was not dropped by our volumetric spike mitigation systems. This increase in traffic led to decreased availability of our DNS service.”
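The retry effect Microsoft describes can be illustrated with a toy model. The sketch below is not based on Azure's actual systems: the function name `offered_load`, its parameters, and the numbers used are all hypothetical, and it assumes failed queries are simply retried on the next tick until a retry limit is reached.

```python
def offered_load(new_per_tick, capacity, max_retries, ticks):
    """Toy model (hypothetical, not Azure's design): return the total
    number of queries offered to a DNS service on each tick when
    clients retry every query that went unanswered."""
    # pending[i] holds the queries about to make their (i + 1)-th attempt.
    pending = [0.0] * (max_retries + 1)
    history = []
    for _ in range(ticks):
        pending[0] = new_per_tick          # fresh queries arrive every tick
        total = sum(pending)               # fresh queries plus retries
        history.append(total)
        # Fraction of offered queries the service cannot answer this tick.
        fail_frac = max(0.0, (total - capacity) / total) if total else 0.0
        # Failed queries move on to their next attempt; queries that have
        # used up all their retries are dropped from the model.
        pending = [0.0] + [q * fail_frac for q in pending[:-1]]
    return history


if __name__ == "__main__":
    # Assumed numbers: 120 new queries per tick against a capacity of 100.
    print(offered_load(new_per_tick=120, capacity=100, max_retries=2, ticks=5))
```

In this model, load stays flat while the service is under capacity, but once new queries exceed capacity the retries stack on top of the fresh traffic and the offered load climbs from tick to tick, which is the amplification the statement attributes to legitimate client retries.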
Microsoft has since fixed the defect and the DNS caches should now be able to handle spikes in traffic better. Microsoft also plans to improve the monitoring and mitigations of anomalous traffic.
The process of finding and fixing flaws should harden the service over time, but that is little compensation for clients whose productivity and data security are impacted. Those who feel tempted to host their own services, however, should be chastened by the recent Hafnium exploits, which targeted exactly such self-hosted services, which overall tend to have a worse security record.
The full RCA can be read here.