Skip to main content

Cloudflare Raves About Performance Gains After Rust Rewrite

3 months 1 week ago
"We've spent the last year rebuilding major components of our system," Cloudflare announced this week, "and we've just slashed the latency of traffic passing through our network for millions of our customers," (There's a 10ms cut in the median time to respond, plus a 25% performance boost as measured by CDN performance tests.) They replaced a 15-year-old system named FL (where they run security and performance features), and "At the same time, we've made our system more secure, and we've reduced the time it takes for us to build and release new products." And yes, Rust was involved: We write a lot of Rust, and we've gotten pretty good at it... We built FL2 in Rust, on Oxy [Cloudflare's Rust-based next generation proxy framework], and built a strict module framework to structure all the logic in FL2... Built in Rust, [Oxy] eliminates entire classes of bugs that plagued our Nginx/LuaJIT-based FL1, like memory safety issues and data races, while delivering C-level performance. At Cloudflare's scale, those guarantees aren't nice-to-haves, they're essential. Every microsecond saved per request translates into tangible improvements in user experience, and every crash or edge case avoided keeps the Internet running smoothly. Rust's strict compile-time guarantees also pair perfectly with FL2's modular architecture, where we enforce clear contracts between product modules and their inputs and outputs... It's a big enough distraction from shipping products to customers to rebuild product logic in Rust. Asking all our teams to maintain two versions of their product logic, and reimplement every change a second time until we finished our migration was too much. So, we implemented a layer in our old NGINX and OpenResty based FL which allowed the new modules to be run. Instead of maintaining a parallel implementation, teams could implement their logic in Rust, and replace their old Lua logic with that, without waiting for the full replacement of the old system. Over 100 engineers worked on FL2 — and there was extensive testing, plus a fallback-to-FL1 procedure. But "We started running customer traffic through FL2 early in 2025, and have been progressively increasing the amount of traffic served throughout the year...." As we described at the start of this post, FL2 is substantially faster than FL1. The biggest reason for this is simply that FL2 performs less work [thanks to filters controlling whether modules need to run]... Another huge reason for better performance is that FL2 is a single codebase, implemented in a performance focussed language. In comparison, FL1 was based on NGINX (which is written in C), combined with LuaJIT (Lua, and C interface layers), and also contained plenty of Rust modules. In FL1, we spent a lot of time and memory converting data from the representation needed by one language, to the representation needed by another. As a result, our internal measures show that FL2 uses less than half the CPU of FL1, and much less than half the memory. That's a huge bonus — we can spend the CPU on delivering more and more features for our customers! Using our own tools and independent benchmarks like CDNPerf, we measured the impact of FL2 as we rolled it out across the network. The results are clear: websites are responding 10 ms faster at the median, a 25% performance boost. FL2 is also more secure by design than FL1. No software system is perfect, but the Rust language brings us huge benefits over LuaJIT. Rust has strong compile-time memory checks and a type system that avoids large classes of errors. Combine that with our rigid module system, and we can make most changes with high confidence... We have long followed a policy that any unexplained crash of our systems needs to be investigated as a high priority. We won't be relaxing that policy, though the main cause of novel crashes in FL2 so far has been due to hardware failure. The massively reduced rates of such crashes will give us time to do a good job of such investigations. We're spending the rest of 2025 completing the migration from FL1 to FL2, and will turn off FL1 in early 2026. We're already seeing the benefits in terms of customer performance and speed of development, and we're looking forward to giving these to all our customers. After that, when everything is modular, in Rust and tested and scaled, we can really start to optimize...! Thanks to long-time Slashdot reader Beeftopia for sharing the article.

Read more of this story at Slashdot.

EditorDavid

Researchers Consider The Advantages of 'Swarm Robotics'

3 months 1 week ago
The Wall Street Journal looks at swarm robotics, where no single robot is in charge, robots interact only with nearby robots — and the swarm accomplishes complex tasks through simple interactions. "Researchers say this approach could excel where traditional robots fail, like situations where central control is impractical or impossible due to distance, scale or communication barriers." For instance, a swarm of drones might one day monitor vast areas to detect early-stage wildfires that current monitoring systems sometimes miss... A human operator might set parameters like where to search, but the drones would independently share information like which areas have been searched, adjust search patterns based on wind and other weather data from other drones in the swarm, and converge for more complete coverage of a particular area when one detects smoke. In another potential application, a swarm of robots could make deliveries across wide areas more efficient by alerting each other to changing traffic conditions or redistributing packages among themselves if one breaks down. Robot swarms could also manage agricultural operations in places without reliable internet service. And disaster-response teams see potential for swarms in hurricane and tsunami zones where communication infrastructure has been destroyed. At the microscopic scale, researchers are developing tiny robots that could work together to navigate the human body to deliver medication or clear blockages without surgery... In recent demonstrations, teams of tiny magnetic robots — each about the size of a grain of sand — cleared blockages in artificial blood vessels by forming chains to push through the obstructions. The robots navigate individually through blood vessels to reach a clog, guided by doctors or technicians using magnetic fields to steer them, says researcher J.J. Wie, a professor of organic and nano engineering at Hanyang University in South Korea. When they reach an obstruction, the robots coordinate with each other to team up and break through. Wie's group is developing versions of these robots that biodegrade after use, eliminating the need for surgical removal, and coatings that make the robots compatible with human tissue. And while robots the size of sand grains work for some applications, Wie says that they will need to be shrunk to nano scale to cross biological barriers, such as cell membranes, or bind to specific molecular targets, like surface proteins or receptors on cancer cells. Some researchers are even exploring emergent intelligence — "when simple machines, following only a few local cues, begin to organize and act as if they share a mind...beyond human-designed coordination." Thanks to long-time Slashdot reader fjo3 for sharing the article.

Read more of this story at Slashdot.

EditorDavid