
ChatGPT Became So Obsessed With Goblins That OpenAI Had to Intervene

1 month ago
The Wall Street Journal reports that OpenAI "recently gave its popular ChatGPT strict instructions. Stop talking about goblins." Recent models of the artificial-intelligence chatbot have been bringing up the creatures in conversations with users seemingly out of the blue, as well as gremlins, trolls and ogres. The goblin-speak caught the attention of programmers, who are often heavy users of the bot. Barron Roth, a 32-year-old product manager at a tech company, said the bot referred to a flaw in his code as a "classic little goblin." He said he counted more than 20 times it mentioned goblins, without any prompting... Several users speculated that goblin terminology was how the model characterized itself, in lieu of identifying as a person with a soul. Then OpenAI decided enough was enough. "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query," reads an open source line in ChatGPT's base instructions for its coding assistant. The Journal calls this "a reminder that even as AI companies tout one advance after another in their technology, they are sometimes baffled by the things their own models do...." While training a "nerdy" personality for their model's customization feature, "We unknowingly gave particularly high rewards for metaphors with creatures," OpenAI explained in a blog post. And "From there, the goblins spread." When we looked, use of "goblin" in ChatGPT had risen by 175% after the launch of GPT-5.1, while "gremlin" had risen by 52%... With GPT-5.4, we and our users noticed an even bigger uptick in references to these creatures... Nerdy accounted for only 2.5% of all ChatGPT responses, but 66.7% of all "goblin" mentions in ChatGPT responses... The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. 
Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data. It all started because the "nerdy" personality's prompt had said "You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed..." Now OpenAI calls this "a powerful example of how reward signals can shape model behavior in unexpected ways, and how models can learn to generalize rewards in certain situations to unrelated ones." But "fans of goblins don't have to fear," notes the Wall Street Journal. "OpenAI provided a command in its blog post that would remove its creature-suppressing instructions."
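
To put the quoted figures in perspective (2.5% of responses, 66.7% of "goblin" mentions), here is a rough back-of-the-envelope sketch of how over-represented the Nerdy personality was. The function name and the per-response framing are illustrative assumptions, not anything OpenAI published; it simply compares mentions per response inside the Nerdy slice with mentions per response everywhere else.

```python
def mention_rate_ratio(share_of_responses: float, share_of_mentions: float) -> float:
    """Ratio of per-response mention rates: the slice versus everything else.

    Hypothetical helper for illustration. Both arguments are fractions in (0, 1).
    """
    # Mentions per response inside the slice, in arbitrary units.
    rate_in_slice = share_of_mentions / share_of_responses
    # Mentions per response across all remaining responses, same units.
    rate_elsewhere = (1 - share_of_mentions) / (1 - share_of_responses)
    return rate_in_slice / rate_elsewhere


# Figures quoted in the summary: 2.5% of responses, 66.7% of "goblin" mentions.
ratio = mention_rate_ratio(0.025, 0.667)
print(f"Nerdy responses mentioned goblins roughly {ratio:.0f}x as often per response")
```

Under those assumptions the ratio comes out near 78, which suggests the habit was heavily concentrated in the Nerdy condition while still appearing outside it.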

Read more of this story at Slashdot.

EditorDavid

South Africa's Draft AI Policy Withdrawn Due to 'Fictitious' AI-Generated Citations

1 month ago
An official in South Africa withdrew a draft of the country's national AI policy, reports a local newspaper, "after it was found the draft policy was compiled using AI, which cited academic articles that were 'fictitious'." Earlier this month, minister in the Presidency Khumbudzo Ntshavheni announced cabinet had approved the draft policy for public comment. [Ntshavheni] said the policy seeks to strengthen government's ability to regulate and adopt AI responsibly, while fostering innovation, job creation, and skills access. The article includes this quote from the minister of the country's communications/digital technologies department: "This unacceptable lapse proves why vigilant human oversight over the use of artificial intelligence is critical." Thanks to Slashdot reader Tokolosh for sharing the article.


EditorDavid