
A Single Point of Failure Triggered the Amazon Outage Affecting Millions

3 months 3 weeks ago
An anonymous reader quotes a report from Ars Technica: The outage that hit Amazon Web Services and took out vital services worldwide was the result of a single failure that cascaded from system to system within Amazon's sprawling network, according to a post-mortem from company engineers. [...] Amazon said the root cause of the outage was a bug in software running the DynamoDB DNS management system. The system monitors the stability of load balancers by, among other things, periodically creating new DNS configurations for endpoints within the AWS network. The bug was a race condition: an error that makes a process dependent on the timing or sequence of events that are variable and outside the developers' control. The result can be unexpected behavior and potentially harmful failures. In this case, the race condition resided in the DNS Enactor, a DynamoDB component that constantly updates domain lookup tables in individual AWS endpoints to optimize load balancing as conditions change. As the enactor operated, it "experienced unusually high delays needing to retry its update on several of the DNS endpoints." While the enactor was playing catch-up, a second DynamoDB component, the DNS Planner, continued to generate new plans. Then, a separate DNS Enactor began to implement them. The timing of these two enactors triggered the race condition, which ended up knocking DynamoDB offline. [...] The failure caused systems that relied on DynamoDB's US-East-1 regional endpoint to experience errors that prevented them from connecting. Both customer traffic and internal AWS services were affected. The damage resulting from the DynamoDB failure then put a strain on Amazon's EC2 services located in the US-East-1 region. The strain persisted even after DynamoDB was restored, as EC2 in this region worked through a "significant backlog of network state propagations needed to be processed."
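The race described above is a classic lost update: a delayed writer applies an older plan after a newer one has already landed. The sketch below is a hypothetical, heavily simplified model written for illustration, not Amazon's code; the `Plan` and `Endpoint` types, the generation numbers, the endpoint name, and the IP addresses are all invented. The interleaving is written out by hand so the outcome is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """A hypothetical DNS plan emitted by a planner."""
    generation: int      # assumed monotonically increasing plan number
    records: list[str]   # addresses the plan points the endpoint at


@dataclass
class Endpoint:
    """A hypothetical DNS endpoint holding whichever plan was written last."""
    name: str
    applied_generation: int = 0
    records: list[str] = field(default_factory=list)


def naive_apply(endpoint: Endpoint, plan: Plan) -> None:
    # Unconditional write: the last writer wins, however old its plan is.
    endpoint.applied_generation = plan.generation
    endpoint.records = list(plan.records)


def simulate_race() -> Endpoint:
    ep = Endpoint("hypothetical-regional-endpoint")
    old_plan = Plan(1, ["10.0.0.1"])
    new_plan = Plan(2, ["10.0.0.2"])
    # Enactor A picks up plan 1 but is stalled by retries before writing.
    # Meanwhile the planner emits plan 2, and Enactor B applies it promptly.
    naive_apply(ep, new_plan)
    # Enactor A finally finishes and overwrites the newer plan with the stale one.
    naive_apply(ep, old_plan)
    return ep


result = simulate_race()
print(result.applied_generation, result.records)  # prints: 1 ['10.0.0.1']
```

Nothing in `naive_apply` is wrong when only one enactor runs; the failure appears only from the order in which two enactors happen to finish, which is what makes a race condition hard to spot in testing.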
The engineers went on to say: "While new EC2 instances could be launched successfully, they would not have the necessary network connectivity due to the delays in network state propagation." In turn, the delay in network state propagation spilled over to a network load balancer that AWS services rely on for stability. As a result, AWS customers experienced connection errors from the US-East-1 region. Affected AWS functions included creating and modifying Redshift clusters, Lambda invocations, Fargate task launches, Managed Workflows for Apache Airflow, Outposts lifecycle operations, and the AWS Support Center. Amazon has temporarily disabled its DynamoDB DNS Planner and DNS Enactor automation globally while it fixes the race condition and adds safeguards against incorrect DNS plans. Engineers are also updating EC2 and its network load balancer. Further reading: Amazon's AWS Shows Signs of Weakness as Competitors Charge Ahead
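One common safeguard against this kind of race is to make each write conditional: an enactor applies a plan only if it is newer than the one already in place, and the check and the write happen atomically. The sketch below assumes plans carry a monotonically increasing generation number and adds an illustrative check that rejects empty plans; all names are hypothetical, and it shows the general technique, not the safeguards Amazon is actually adding.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Plan:
    generation: int      # assumed monotonically increasing plan number
    records: list[str]   # addresses the plan points the endpoint at


@dataclass
class Endpoint:
    name: str
    applied_generation: int = 0
    records: list[str] = field(default_factory=list)
    # One lock per endpoint makes check-then-write a single atomic step.
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)


def guarded_apply(endpoint: Endpoint, plan: Plan) -> bool:
    """Apply `plan` only if it is newer than the endpoint's current plan.

    Returns True if the plan was applied, False if it was rejected.
    """
    with endpoint.lock:
        if plan.generation <= endpoint.applied_generation:
            return False  # stale or duplicate plan: never roll the endpoint backwards
        if not plan.records:
            return False  # illustrative sanity check: refuse a plan with no addresses
        endpoint.applied_generation = plan.generation
        endpoint.records = list(plan.records)
        return True


ep = Endpoint("hypothetical-regional-endpoint")
print(guarded_apply(ep, Plan(2, ["10.0.0.2"])))  # True: newer than generation 0
print(guarded_apply(ep, Plan(1, ["10.0.0.1"])))  # False: stale plan rejected
print(guarded_apply(ep, Plan(3, [])))            # False: empty plan rejected
print(ep.applied_generation, ep.records)         # 2 ['10.0.0.2']
```

Because a stale plan is rejected rather than applied, the order in which enactors finish no longer matters: the endpoint always ends up on the highest valid generation it has been offered.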

Read more of this story at Slashdot.

BeauHD

Sneaky Mermaid attack in Microsoft 365 Copilot steals data

3 months 3 weeks ago
Redmond says it's fixed this particular indirect prompt injection vuln

Updated: Microsoft fixed a security hole in Microsoft 365 Copilot that allowed attackers to trick the AI assistant into stealing sensitive tenant data – like emails – via indirect prompt injection attacks.…

Jessica Lyons

Study Reveals How Hard It Is To Avoid Pesticide Exposure

3 months 3 weeks ago
A study involving 641 participants across 10 European countries found pesticides in every silicone wristband worn for one week. Researchers at Radboud University tested for 193 pesticides and detected 173 substances. The average participant was exposed to 20 different pesticides through non-dietary sources. Non-organic farmers had the highest exposure, at a median of 36 pesticides. Organic farmers and people living near farms recorded lower numbers. Consumers living far from agricultural areas had a median of 17 pesticides. The wristbands captured banned substances, including breakdown products of DDT, which was prohibited decades ago, and the insecticides dieldrin and propoxur. Paul Scheepers, the molecular epidemiologist who co-authored the study, said people cannot avoid exposure to pesticides in their direct environment.

Read more of this story at Slashdot.

msmash