this post was submitted on 01 Jul 2023
1002 points (96.5% liked)
Technology
I didn't read the article. But could it be that every platform is trying to limit LLMs to be trained on their data?
Definitely one reason, but this is also to push premium
Seems like the free internet as we knew it is dead. Any site with free, user-generated content to monetize is about to try and suck every last dime from it.
The internet as we knew it wasn’t free. We were the product. Here’s hoping their drive to force payment sends us on to decentralized, open source infrastructure.
Fortunately, we have the user-owned distributed internet to move to.
Yep, capitalist greed ruins everything. Which is why distributed networks run by the community are our best hope for the future.
Those fuckers would try to ruin this too, by bot attacks, by cutting deals with some of the admins, or by running their own versions of Lemmy/Mastodon.
We as a community will have to handle whatever comes next.
We now have the federated internet though, and I think it's got a way brighter future.
Honestly a paid internet is better. Just look at the Fediverse. Internet was never profitable. Now the data collection just needs to stop
They were always going to do that. The squeeze is basically required if you're planning on making a public offering and becoming beholden to investors.
Also, they haven't paid Google for using its cloud, so they are moving their data.
No, you misunderstand: they desperately want LLMs to be trained on their data. They just want the companies doing the training to pay hundreds of thousands to millions of dollars to do so. Twitter is not buckling under the weight of data scraping; Elon is just pissed that companies are scraping instead of paying his exorbitant API fees.
This is the hilarious part to me: some companies might pay these fees, but many more won’t and will instead use actual web scrapers to get their data anyway. As the number of individuals training LLMs increases in the next couple of years, this will create a much more significant traffic load than API calls would.
Yeah, he doesn’t seem to understand he’s not selling the data. The data is public; he’s selling convenience. And if the convenience isn’t worth the price you’ve set, people will just take the extra effort and avoid the expense.
Exactly. I do Selenium scripting as my main task for work, and as soon as I heard how high the API rates were, my first thought was "Jesus, it might be slower than straight API calls, and the dynamic XPaths might suck, but I could write a script that scrapes the website for cheaper." Twitter is hurting for cash right now, and I imagine raising funds is the end goal here. He instituted the API policy, learned about another side effect, and continues to go with the most extreme, nuance-free response each time.
All "in my opinion," of course.
Hmm. Sounds a lot like something /u/spez said. I wouldn’t expect Twitter to be a good LLM source with its current state anyway…Reddit would be a lot better contextually. The reality is Reddit and Twitter are bleeding cash and they’ve got brain-rotted CEOs that don’t pay their bills or have unrealistic plans and timelines for profitability.
Most of Twitter's (and soon Reddit's) data to be fed to LLMs will be porn sharing bots at this point.
Seems like a strange way to enforce it, at the user level vs. the API client level, unless they're trying to guard against screen-scraper types.
It's all fun and games till they train the AIs to make a million small accounts.