this post was submitted on 09 Jul 2023
505 points (97.2% liked)

Two authors sued OpenAI, accusing the company of violating copyright law. They say OpenAI used their work to train ChatGPT without their consent.

[–] [email protected] 9 points 1 year ago (1 children)

Scraping the web is legal and training AI on data is also legal.

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago) (2 children)

Reusing the content you scraped, if copyright protected, is not.

Edit: unless you get authorization from the original authors, but OpenAI didn't even ask. That's why it's a crime.

[–] [email protected] 12 points 1 year ago (1 children)
[–] [email protected] 1 points 1 year ago

That really will be the question at hand. Is the AI producing work that could be considered transformative, educational, or parody? The answer is of course yes, it is capable of doing all three of those things, but it can also be coaxed into reproducing things exactly.

I don't know if current copyright laws are capable of dealing with the AI renaissance.

[–] [email protected] 3 points 1 year ago

Yeah it is. The only protection in copyright is called derivative works, and an AI is not a derivative of a book, no more than your brain is after you've read one.

The only exception would be if you managed to overtrain the model and encode the contents of the book inside the model file. That's not what happened here, because all ChatGPT output was a summary.

The only valid claim here is that the books were not supposed to be on the public internet, and it's likely that the way OpenAI obtained the books in the first place was from some piracy website while scraping the web.

At that point you just have to hold them liable for that act of piracy, not claim that releasing the model was itself an act of copyright violation.