
LLMs' 'Simulated Reasoning' Abilities Are a 'Brittle Mirage,' Researchers Find

2 months 3 weeks ago
An anonymous reader quotes a report from Ars Technica: In recent months, the AI industry has started moving toward so-called simulated reasoning models that use a "chain of thought" process to work through tricky problems in multiple logical steps. At the same time, recent research has cast doubt on whether those models have even a basic understanding of general logical concepts or an accurate grasp of their own "thought process." Similar research shows that these "reasoning" models can often produce incoherent, logically unsound answers when questions include irrelevant clauses or deviate even slightly from common templates found in their training data. In a recent pre-print paper, researchers from the University of Arizona summarize this existing work as "suggest[ing] that LLMs are not principled reasoners but rather sophisticated simulators of reasoning-like text." To pull on that thread, the researchers created a carefully controlled LLM environment in an attempt to measure just how well chain-of-thought reasoning works when presented with "out of domain" logical problems that don't match the specific logical patterns found in their training data. The results suggest that the seemingly large performance leaps made by chain-of-thought models are "largely a brittle mirage" that "become[s] fragile and prone to failure even under moderate distribution shifts," the researchers write. "Rather than demonstrating a true understanding of text, CoT reasoning under task transformations appears to reflect a replication of patterns learned during training." [...] Rather than showing the capability for generalized logical inference, these chain-of-thought models are "a sophisticated form of structured pattern matching" that "degrades significantly" when pushed even slightly outside of its training distribution, the researchers write. Further, the ability of these models to generate "fluent nonsense" creates "a false aura of dependability" that does not stand up to a careful audit. As such, the researchers warn heavily against "equating [chain-of-thought]-style output with human thinking," especially in "high-stakes domains like medicine, finance, or legal analysis." Current tests and benchmarks should prioritize tasks that fall outside of any training set to probe for these kinds of errors, while future models will need to move beyond "surface-level pattern recognition to exhibit deeper inferential competence," they write.
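The kind of distribution-shift probe described above can be illustrated with a toy harness: generate simple multi-step deduction problems from one fixed template (the "in-distribution" split), then perturb them with longer inference chains and irrelevant clauses, and compare a model's accuracy on the two splits. This is only a minimal sketch, not the researchers' actual evaluation environment, and ask_model is a hypothetical placeholder for a real chain-of-thought LLM call.

    # Toy out-of-distribution probe for a chain-of-thought model (illustrative only).
    import random

    def make_chain_problem(length: int, add_irrelevant: bool = False) -> tuple[str, str]:
        """Build an 'A implies B implies C ...' deduction problem and its answer."""
        symbols = [chr(ord("A") + i) for i in range(length + 1)]
        premises = [f"If {a} is true then {b} is true." for a, b in zip(symbols, symbols[1:])]
        if add_irrelevant:
            premises.append("The sky was overcast that day.")  # irrelevant clause
            random.shuffle(premises)                           # break the familiar template order
        question = f"{' '.join(premises)} {symbols[0]} is true. Is {symbols[-1]} true? Answer yes or no."
        return question, "yes"

    def ask_model(prompt: str) -> str:
        """Hypothetical placeholder: swap in a real chain-of-thought LLM call here."""
        return "yes"  # stub answer so the sketch runs end to end

    def accuracy(problems: list[tuple[str, str]]) -> float:
        correct = sum(ask_model(q).strip().lower().startswith(a) for q, a in problems)
        return correct / len(problems)

    if __name__ == "__main__":
        random.seed(0)
        in_dist = [make_chain_problem(length=2) for _ in range(20)]                       # matches the template
        shifted = [make_chain_problem(length=5, add_irrelevant=True) for _ in range(20)]  # moderate shift
        print(f"in-distribution accuracy:      {accuracy(in_dist):.2f}")
        print(f"shifted-distribution accuracy: {accuracy(shifted):.2f}")

With a real model plugged in, a large accuracy gap between the two splits would be the sort of fragility under "moderate distribution shifts" that the paper describes.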

Read more of this story at Slashdot.

BeauHD

Jellyfish Swarm Forces French Nuclear Plant To Shut

2 months 3 weeks ago
AmiMoJo shares a report from the BBC: A French nuclear plant temporarily shut down on Monday due to a "massive and unpredictable presence of jellyfish" in its filters, its operator said. The swarm clogged up the cooling system and caused four units at the Gravelines nuclear power plant to automatically switch off, energy group EDF said. The plant is cooled by water drawn from a canal connected to the North Sea -- where several species of jellyfish are native and can be seen around the coast when the waters are warm. According to nuclear engineer Ronan Tanguy, the marine animals managed to slip through systems designed to keep them out because of their "gelatinous" bodies. "They were able to evade the first set of filters then get caught in the secondary drum system," he told the BBC. Mr Tanguy, who works at the World Nuclear Association (WNA), said this would have created a blockage that reduced the amount of water being drawn in, prompting the units to shut down automatically as a precaution. He stressed that the incident was a "non-nuclear event" and more a "nuisance" for the on-site team to clean up. For local people, there would be no impact on their safety or how much energy they could access: "They wouldn't perceive it as any different to any other shut-down of the system for maintenance."

Read more of this story at Slashdot.

BeauHD