Skip to main content

Meta's Llama 3.1 Can Recall 42% of the First Harry Potter Book

3 months ago
Timothy B. Lee has written for the Washington Post, Vox.com, and Ars Technica — and now writes a Substack blog called "Understanding AI." This week he visits recent research by computer scientists and legal scholars from Stanford, Cornell, and West Virginia University that found that Llama 3.1 70BÂ(released in July 2024) has memorized 42% of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time... The paper was published last month by a team of computer scientists and legal scholars from Stanford, Cornell, and West Virginia University. They studied whether five popular open-weight models — three from Meta and one each from Microsoft and EleutherAI — were able to reproduce text from Books3, a collection of books that is widely used to train LLMs. Many of the books are still under copyright... Llama 3.1 70B — a mid-sized model Meta released in July 2024 — is far more likely to reproduce Harry Potter text than any of the other four models.... Interestingly, Llama 1 65B, a similar-sized model released in February 2023, had memorized only 4.4 percent of Harry Potter and the Sorcerer's Stone. This suggests that despite the potential legal liability, Meta did not do much to prevent memorization as it trained Llama 3. At least for this book, the problem got much worse between Llama 1 and Llama 3. Harry Potter and the Sorcerer's Stone was one of dozens of books tested by the researchers. They found that Llama 3.1 70B was far more likely to reproduce popular books — such as The Hobbit and George Orwell's 1984 — than obscure ones. And for most books, Llama 3.1 70B memorized more than any of the other models... For AI industry critics, the big takeaway is that — at least for some models and some books — memorization is not a fringe phenomenon. On the other hand, the study only found significant memorization of a few popular books. For example, the researchers found that Llama 3.1 70B only memorized 0.13 percent of Sandman Slim, a 2009 novel by author Richard Kadrey. That's a tiny fraction of the 42 percent figure for Harry Potter... To certify a class of plaintiffs, a court must find that the plaintiffs are in largely similar legal and factual situations. Divergent results like these could cast doubt on whether it makes sense to lump J.K. Rowling, Richard Kadrey, and thousands of other authors together in a single mass lawsuit. And that could work in Meta's favor, since most authors lack the resources to file individual lawsuits. Why is it happening? "Maybe Meta had trouble finding 15 trillion distinct tokens, so it trained on the Books3 dataset multiple times. Or maybe Meta added third-party sources — such as online Harry Potter fan forums, consumer book reviews, or student book reports — that included quotes from Harry Potter and other popular books..." "Or there could be another explanation entirely. Maybe Meta made subtle changes in its training recipe that accidentally worsened the memorization problem."

Read more of this story at Slashdot.

EditorDavid

Dems demand audit of CVE program as Federal funding remains uncertain

3 months ago
PLUS: Discord invite links may not be safe; Miscreants find new way to hide malicious JavaScript; and more!

Infosec In Brief  A pair of Congressional Democrats have demanded a review of the Common Vulnerabilities and Exposures (CVE) program amid uncertainties about continued US government funding for the scheme.…

Brandon Vigliarolo

Apple Migrates Its Password Monitoring Service to Swift from Java, Gains 40% Performance Uplift

3 months ago
Meta and AWS have used Rust, and Netflix uses Go,reports the programming news site InfoQ. But using another language, Apple recently "migrated its global Password Monitoring service from Java to Swift, achieving a 40% increase in throughput, and significantly reducing memory usage." This freed up nearly 50% of their previously allocated Kubernetes capacity, according to the article, and even "improved startup time, and simplified concurrency." In a recent post, Apple engineers detailed how the rewrite helped the service scale to billions of requests per day while improving responsiveness and maintainability... "Swift allowed us to write smaller, less verbose, and more expressive codebases (close to 85% reduction in lines of code) that are highly readable while prioritizing safety and efficiency." Apple's Password Monitoring service, part of the broader Password app's ecosystem, is responsible for securely checking whether a user's saved credentials have appeared in known data breaches, without revealing any private information to Apple. It handles billions of requests daily, performing cryptographic comparisons using privacy-preserving protocols. This workload demands high computational throughput, tight latency bounds, and elastic scaling across regions... Apple's previous Java implementation struggled to meet the service's growing performance and scalability needs. Garbage collection caused unpredictable pause times under load, degrading latency consistency. Startup overhead — from JVM initialization, class loading, and just-in-time compilation, slowed the system's ability to scale in real time. Additionally, the service's memory footprint, often reaching tens of gigabytes per instance, reduced infrastructure efficiency and raised operational costs. Originally developed as a client-side language for Apple platforms, Swift has since expanded into server-side use cases.... Swift's deterministic memory management, based on reference counting rather than garbage collection (GC), eliminated latency spikes caused by GC pauses. This consistency proved critical for a low-latency system at scale. After tuning, Apple reported sub-millisecond 99.9th percentile latencies and a dramatic drop in memory usage: Swift instances consumed hundreds of megabytes, compared to tens of gigabytes with Java. "While this isn't a sign that Java and similar languages are in decline," concludes InfoQ's article, "there is growing evidence that at the uppermost end of performance requirements, some are finding that general-purpose runtimes no longer suffice."

Read more of this story at Slashdot.

EditorDavid