this post was submitted on 01 Jul 2023
900 points (97.3% liked)
Technology
How true is the LLM data scraping threat?
Who knows. One thing's for sure: this is just one more thing he pulled out of his ass without any backing.
It's not. He's probably lying to save face and just forgot to pay his bills.
Meta has shown that training on huge amounts of data can lead to great results with a model that's much simpler than what OpenAI uses, and it looks like they're taking a more open approach to LLMs because of that. Twitter has shitloads of possible training data, but it's Twitter, so that data isn't great.
Elon is known to be afraid of AGIs becoming hostile, so that explains the decision.
I don't think it'll slow down AI development too much. There are new Llama-based models coming out every month that are better than the previous ones.
Reddit is a much better source of data, and if they don't want to lose SEO they have to keep their pages publicly crawlable, so their data can still be gathered by scraping even after the API changes take effect.