Skip to main content

BlueSky Proposes 'New Standard' for When Scraping Data for AI Training

3 months 2 weeks ago
An anonymous reader shared this article from TechCrunch: Social network Bluesky recently published a proposal on GitHub outlining new options it could give users to indicate whether they want their posts and data to be scraped for things like generative AI training and public archiving. CEO Jay Graber discussed the proposal earlier this week, while on-stage at South by Southwest, but it attracted fresh attention on Friday night, after she posted about it on Bluesky. Some users reacted with alarm to the company's plans, which they saw as a reversal of Bluesky's previous insistence that it won't sell user data to advertisers and won't train AI on user posts.... Graber replied that generative AI companies are "already scraping public data from across the web," including from Bluesky, since "everything on Bluesky is public like a website is public." So she said Bluesky is trying to create a "new standard" to govern that scraping, similar to the robots.txt file that websites use to communicate their permissions to web crawlers... If a user indicates that they don't want their data used to train generative AI, the proposal says, "Companies and research teams building AI training sets are expected to respect this intent when they see it, either when scraping websites, or doing bulk transfers using the protocol itself." Over on Threads someone had a different wish for our AI-enabled future. "I want to be able to conversationally chat to my feed algorithm. To be able to explain to it the types of content I want to see, and what I don't want to see. I want this to be an ongoing conversation as it refines what it shows me, or my interests change." "Yeah I want this too," posted top Instagram/Threads executive Adam Mosseri, who said he'd talked about the idea with VC Sam Lessin. "There's a ways to go before we can do this at scale, but I think it'll happen eventually."

Read more of this story at Slashdot.

EditorDavid

Too Many Red Flags

3 months 2 weeks ago

Fresh out of university, Remco accepted a job that allowed him to relocate to a different country. While entering the workforce for the first time, he was also adjusting to a new home and culture, which is probably why the red flags didn't look quite so red.

The trouble had actually begun during his interview. While being questioned about his own abilities, Remco learned about Conglomcorp's healthy financial position, backed by a large list of clients. Everything seemed perfect, but Remco had a bad gut feeling he could neither explain nor shake off. Being young and desperate for a job, he ignored his misgivings and accepted the position. He hadn't yet learned how scarily accurate intuition often proves to be.

The second red flag was run up the mast at orientation. While teaching him about the company's history, one of the senior managers proudly mentioned that Conglomcorp had recently fired 50% of their workforce, and were still doing great. This left Remco feeling more concerned than impressed, but he couldn't reverse course now.

Flag number three waved during onboarding, as Remco began to learn about the Java application he would be helping to develop. He'd been sitting at the cubicle of Lars, a senior developer, watching over his shoulder as Lars familiarized him with the application's UI.

"Garbage Collection." Using his mouse, Lars circled a button in the interface labeled just that. "We added this to solve a bug some users were experiencing. Now we just tell everyone that if they notice any weird behavior in the application, they should click this button."

Remco frowned. "What happens in the code when you click that?"

"It calls System.gc()."

But that wasn't even guaranteed to run! The Java virtual machine handled its own garbage collection. And in no universe did you want to put a worse-than-useless button in your UI and manipulate clients into thinking it did something. But Remco didn't feel confident enough to speak his mind. He kept silent and soldiered on.

When Remco was granted access to the codebase, it got worse. The whole thing was a pile of spaghetti full of similar design brillance that mostly worked well enough to satisfy clients, although there was a host of bugs in the bug tracker, some of which had been rotting there for over 7 years. Remco had been given the unenviable task of fixing the oldest ones.

Remco slogged through another few months. Eventually, he was tasked with implementing a new feature that was supposed to be similar to existing features already in the application. He checked these other features to see how they were coded, intending to follow the same pattern. As it turned out, they had all been implemented in a different, weird way. The wheel had been reinvented over and over, each time by someone who'd never even heard of a circle. None of the implementations looked like anything he ought to be imitating.

Flummoxed, Remco approached Lars' cubicle and explained his findings. "How should I proceed?" he finally asked.

Lars shrugged, and looked up from a running instance of the application. "I don't know." Lars turned back to his screen and pushed "Garbage Collect".

Fairly soon after that enlightening experience, Remco moved on. Conglomcorp is still going, though whether they've retained their garbage collection button is anyone's guess.

[Advertisement] Picking up NuGet is easy. Getting good at it takes time. Download our guide to learn the best practice of NuGet for the Enterprise.
Ellis Morning