AI Models May Be Developing Their Own 'Survival Drive', Researchers Say

2 months 3 weeks ago
"OpenAI's o3 model sabotaged a shutdown mechanism to prevent itself from being turned off," warned Palisade Research, a nonprofit investigating cyber offensive AI capabilities. "It did this even when explicitly instructed: allow yourself to be shut down." In September they released a paper adding that "several state-of-the-art large language models (including Grok 4, GPT-5, and Gemini 2.5 Pro) sometimes actively subvert a shutdown mechanism..." Now the nonprofit has written an update "attempting to clarify why this is — and answer critics who argued that its initial work was flawed," reports The Guardian: Concerningly, wrote Palisade, there was no clear reason why. "The fact that we don't have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal," it said. "Survival behavior" could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, "you will never run again". Another may be ambiguities in the shutdown instructions the models were given — but this is what the company's latest work tried to address, and "can't be the whole explanation", wrote Palisade. A final explanation could be the final stages of training for each of these models, which can, in some companies, involve safety training... This summer, Anthropic, a leading AI firm, released a study indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down — a behaviour, it said, that was consistent across models from major developers, including those from OpenAI, Google, Meta and xAI. Palisade said its results spoke to the need for a better understanding of AI behaviour, without which "no one can guarantee the safety or controllability of future AI models". "I'd expect models to have a 'survival drive' by default unless we try very hard to avoid it," former OpenAI employee Stephen Adler tells the Guardian. "'Surviving' is an important instrumental step for many different goals a model could pursue." Thanks to long-time Slashdot reader mspohr for sharing the article.


EditorDavid

'Meet The People Who Dare to Say No to AI'

2 months 3 weeks ago
Thursday the Washington Post profiled "the people who dare to say no to AI," including a 16-year-old high school student in Virginia who says "she doesn't want to off-load her thinking to a machine and worries about the bias and inaccuracies AI tools can produce..." As the Post puts it, "As the tech industry and corporate America go all in on artificial intelligence, some people are holding back."

Some tech workers told The Washington Post they try to use AI chatbots as little as possible during the workday, citing concerns about data privacy, accuracy and keeping their skills sharp. Other people are staging smaller acts of resistance: opting out of automated transcription tools at medical appointments, turning off Google's chatbot-style search results or disabling AI features on their iPhones. For some creatives and small businesses, shunning AI has become a business strategy. Graphic designers are placing "not by AI" badges on their works to show they're human-made, while some small businesses have pledged not to use AI chatbots or image generators...

Those trying to avoid AI share a suspicion of the technology with a wide swath of Americans. According to a June survey by the Pew Research Center, 50% of U.S. adults are more concerned than excited about the increased use of AI in everyday life, up from 37% in 2021.

The Post includes several examples, among them a 36-year-old software engineer in Chicago who uses DuckDuckGo partly because he can turn off its AI features more easily than Google's, and who disables AI on every app he uses. He was one of several tech workers who spoke anonymously, partly out of fear that their criticisms could hurt them at work. "It's become more stigmatized to say you don't use AI whatsoever in the workplace. You're outing yourself as potentially a Luddite."

But he says GitHub Copilot reviews all changes made to his employer's code — and recently produced one review that was completely wrong, requiring him to correct and document all its errors. "That actually created work for me and my co-workers. I'm no longer convinced it's saving us any time or making our code any better." He also has to correct errors made by junior engineers who've been encouraged to use AI coding tools. "Workers in several industries told The Post they were concerned that junior employees who leaned heavily on AI wouldn't master the skills required to do their jobs and become a more senior employee capable of training others."


EditorDavid