this post was submitted on 28 Jul 2023
162 points (94.0% liked)
Technology
I don't see how that affects my point.
So at any point in time, only recent text could be "contaminated". The claim that "all text after 2023 is forever contaminated" just isn't true. Researchers would simply have to be a bit more careful when including it.
Your assertion that a future AI detector will be able to detect current LLM output is dubious. If I give you the sentence "Yesterday I went to the shop and bought some milk and eggs," there is no way for you or any detection system to tell with any significant degree of certainty whether it was AI generated. What can be done is statistical analysis of large datasets to see how they "smell", but concluding that around 30% of a dataset is likely LLM generated does not get you very far in creating a clean training set.
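To make that concrete, here is a toy sketch of the kind of dataset-level statistic such an analysis might use. Word-level Shannon entropy is just one assumed example of a signal; real detectors use far richer features, and nothing here is a working detector. The point the code illustrates is that a single short sentence gives you almost no data to estimate anything from, while the same statistic only becomes meaningful when aggregated over a large corpus.

```python
import math
from collections import Counter

def word_entropy(text: str) -> float:
    """Shannon entropy (bits per word) of the word distribution in `text`.

    A corpus-level statistic like this can be compared across large
    datasets, but on a single sentence it is dominated by noise.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A single 12-word sentence: far too little data for any per-text verdict.
short = "Yesterday I went to the shop and bought some milk and eggs."
print(f"{word_entropy(short):.2f} bits/word over {len(short.split())} words")
```

Comparing this number between two multi-million-word corpora might tell you their distributions differ; computing it for one sentence tells you essentially nothing, which is the asymmetry the comment is pointing at.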
I'm not saying that there is no solution to this problem, but blithely waving away the problem saying future AI will be able to spot old AI is not a serious take.
If you give me several paragraphs instead of a single sentence, do you still think it's impossible to tell?
"If you zoom further out you can definitely tell it's been shopped because you can see more pixels."
There is not enough entropy in text to detect even current model output. It's game over.
No, they won't. The models we have already built are the ones we have, and any current works in progress are the "future AI" you are talking about. We just can't do it: OpenAI themselves have admitted that the detectors they tried building simply didn't work. And it won't work, because language is not just the statistical correlations between words that have already been written in the past.