this post was submitted on 27 Nov 2024

Free Software

What is free software?

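Free software is software that respects the 4 essential software freedoms. The 4 freedoms are:

  1. The freedom to run the program as you wish, for any purpose.
  2. The freedom to study how the program works and change it (access to the source code is a precondition for this).
  3. The freedom to redistribute copies.
  4. The freedom to distribute copies of your modified versions.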

Please note: "free software" is not about monetary price. Free software can be sold or distributed gratis (at no cost).

Rules:

  1. Please keep on topic
  2. Follow the Lemmy.zip rules
  3. No memes
  4. No "circle jerking" or inflammatory posts
  5. No discussion of illegal content

Please report anything you believe violates the rules, and be sure to explain why you think it should be removed.

If you would like to contest a mod action, please DM me with your rationale for why you feel the relevant mod action should be reversed. Remember to argue your case and cite any relevant sources. You will only get one chance to argue your point, and continued harassment will result in a ban.

Overall, this community is pretty laid back, and none of the things listed above are normally an issue.

founded 1 year ago

How can I add a simple requirement, "do not train AI on the source code of the program", to the AGPLv3 or GPLv3 and thereby create a new license?

I don't know if this is a good place for such a question, but I'll try :).

Why did I come up with such a stupid idea? There have been reported cases where artificial intelligence tools such as GitHub Copilot have been trained on many open source and free software projects, and in some cases they can output code snippets from GPL-licensed projects without indicating their origin or license. https://www.pixelstech.net/article/1682104779-GitHub-Copilot-may-generate-code-containing-GPL-code

I am not a lawyer, and I do not know where best to insert such a requirement, or how to word it correctly.

I understand it may be complicated to verify compliance with this requirement, and that it may cause other difficulties, but I still think it could be a useful addition.

How can it be reconciled with the fundamental freedoms of the GPL, or is that impossible?

I understand that this would make the license non-free, since it puts constraints on what the code can be used for. It's a pity the two can't be combined somehow. Maybe the requirement could be narrowed to "do not train closed-source AI" (i.e., models whose code and training data are not publicly available).

And what should I name it? Is it better to leave "GPL" out of the name if the new license cannot be considered free? NoAIFL, or do you have other suggestions :)?

Is it enough to simply add a new clause?

For example, like this:

Additional Clause:
You may not use the source code of this program, or any part thereof, to train any artificial intelligence model, machine learning model, or similar system without explicit written permission from the copyright holder.

or

Section [X]: Restrictions on AI Training
You may not use the source code of this program, or any part thereof, to train any artificial intelligence model, machine learning model, or similar system without explicit written permission from the copyright holder.

What do you think? Perhaps you already know of licenses like this?

jonathan 23 points 3 days ago (last edited 3 days ago)

I doubt GitHub will make an effort to avoid violating less common licenses anytime soon. Once you have found a license that works for you, the best thing you can do is find alternative hosting for your code.

[email protected] 5 points 3 days ago (last edited 3 days ago)

But what will stop them from training Copilot on code from other publicly available hosts? Is there anything preventing them from doing so, using something like fair use as justification?

jonathan 14 points 3 days ago

What will stop them is complexity and effort. So far, legal risk has not proven to be enough.

[email protected] 7 points 3 days ago

It doesn't matter what sort of license you use or where you publicly host it... it will almost certainly get scraped.

[email protected] 16 points 3 days ago (last edited 3 days ago)

"I understand that this would make the license non-free"

You can potentially get around that by just specifying that any AI trained on it is considered a derivative work, and thus must be released under your new license.

That said, it's potentially moot: the argument AI companies use to justify training on commercial data and art is that it's fair use under various exemptions.

[email protected] 3 points 3 days ago

Their argument should theoretically fail, because the GPL doesn't just act as a copyright licence but also as a contract, IIRC.

[email protected] 3 points 3 days ago

That's a good idea. Now I have to think about how to formulate it better and what it will mean. :)

[email protected] 2 points 3 days ago

That's not up to OP to "specify;" either it already is the case (for everybody) or it isn't, according to the legal definition of "derivative work."

(I take the position that it is, BTW -- AI code generation is massive copyright infringement in general, and a way of laundering copyleft code for proprietary uses in particular.)

[email protected] 2 points 6 hours ago

I think so too & have made that point in the past.
Does anyone know of some more legally credible references that agree with us?

[email protected] 8 points 3 days ago (last edited 3 days ago)

Don't do it if you like the (A)GPL or free software licenses. It'll probably void the license. At the very least, adding clauses immediately makes it incompatible with other open source software and takes away user freedom. It's generally not recommended. (The same applies to the Commons Clause and other additions; they tend to make life worse for us as the free software community.)

And it would only do harm, without any benefit. The companies claim training AI is fair use, and they'll continue doing it anyway. At the same time, every hobby programmer who forks your project on GitHub and wants to contribute will be in breach of your license, as GitHub is known to feed that code into Copilot...

I'd recommend taking a step back and rethinking this. Why are you giving away your project in the first place? Do you really care how people use it? If yes, your interests aren't completely aligned with the idea of Free Software, which grants the user 4 essential freedoms, including the freedom to use your project for any purpose. As the author, it's your decision. But the (A)GPL isn't really the right license/vision if you don't fully agree with its premise.

[email protected] 5 points 3 days ago (last edited 3 days ago)

Yes, I understand it's a complex question. In principle, I support the freedoms declared in the GPL. But the GPL itself restricts the use of its code in closed-source proprietary programs for the sake of the freedom of all future users. So the question arises: isn't the whole point nullified if you can train an "AI" model on this code and then take that same code, as output by the "AI", and use it in closed-source proprietary programs? I wouldn't mind if these "AI" models were themselves free and open source software, but even then you could use their output to create your own closed-source proprietary programs... Maybe you are right; it is not entirely clear what is better in this case.

[email protected] 4 points 3 days ago (last edited 3 days ago)

I mean, I can also read your code, take inspiration from it and use that to write some proprietary software. I think that's at least somewhat similar to what happens with AI. AI doesn't reproduce the code verbatim, but instead learns from it, and as far as I know nobody has found it repeating large chunks of one specific piece of software. So I'd say it's like me reading copyrighted computer science books, learning programming that way, and now using my skill to code Free Software (or whatever).

I know this position is disputed. But I think there is some truth to it. At the same time it's close to the tech companies' rationale. It's morally wrong that they get to profit from other peoples' labor. And they're definitely exploiting the situation that law and licences come from a time, where AI wasn't an issue... But I'm really split on the topic. Ultimately we'd need some consensus on how to handle this. And some laws and regulation. And we don't have that yet.

And I think it's also similar to other companies profiting off FLOSS projects. Like with Redis, MongoDB(?) and all the projects that shifted from open source to source-available because Amazon et al. just took things and made a profit selling them as a cloud service without ever contributing back. It's just a sad situation. And ultimately it harms me and everyone, because I'm subject to the same license, and now I can't use, modify and share some software anymore. These non-commercial clauses are difficult, too. Even if I just run a small Fediverse instance and collect donations, that could be construed as commercial. The same goes for trying to make a living off Free Software. And I think all of this drama is an even bigger problem than AI being trained on other people's code. And it all cuts down on freedom. I mean, for a legitimate reason... and I get it... Still, the freedom gets lost.

[email protected] 2 points 3 days ago (last edited 3 days ago)

Thank you very much for your reply. I agree with you insofar as I am now inclined to think that a complete prohibition on training "AI" models on software source code is not a very good solution, and is difficult to enforce under current law. I hope someday someone smart will come up with some approaches to such problems.

[email protected] 2 points 3 days ago (last edited 3 days ago)

Indeed. I hope so. And we desperately need some clear regulations. Even the big AI companies struggle with the lack of clear rules. I can see how we need to go through quite a few legal battles to settle some questions arising with the new technology, and that's currently taking place. But it extends past that. Currently, companies are retreating from the European market due to a completely unmanageable situation. I've seen local language models (starting with Llama 3.2) being banned / not licensed within the EU. And that's going to lead to all kinds of complications, just because the EU can't get proper regulations out, and in time. That'll leave the EU behind technologically and cause trouble for companies. In effect, it also takes away my freedom to run language models on my own hardware...

I hope they get that straight. And there is some demand... So maybe it'll happen sooner rather than later. But these are very difficult questions to answer. About AI safety, copyright, effect and impact on society and freedom... And I think a lot of these questions are difficult to tackle with licensing anyway. We definitely need laws governing whether AI training is fair use. Or whether generating a voice that sounds 70% like David Attenborough is alright to do.

[email protected] 4 points 3 days ago

"Do you really care how people use it? If yes, your interests aren’t completely aligned with the idea of Free Software"

I wonder how those proposals for a new "Ethical Free Software" movement and licenses are doing, since one of the simplest rational expectations when you release software is that users won't use it to harm you.

[email protected] 5 points 3 days ago (last edited 3 days ago)

My position is that AI code generation already violates the GPL if it was trained on GPL code.

(Unless the AI output is itself GPL-licensed, in which case it merely violates the license of anything that isn't compatibly-licensed.)

[email protected] 3 points 2 days ago

I wish we had a Nightshade for codebases tbh. I absolutely will poison an LLM data pool if I'm not getting paid for the training data they're harvesting out of my uploads.

[email protected] 3 points 2 days ago (last edited 2 days ago)

Maybe it's possible to simply add functions with very absurd content that are never used anywhere. Anyone who reads them will easily see that they are nonsense. Perhaps even add a comment saying they are there to protect against "AI" training. But they should not repeat each other, so that it is hard to remove them all with a single pattern...
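A minimal, hypothetical sketch of that decoy idea in Python. Nothing here comes from the thread beyond the idea itself: the word lists, the `make_decoy`/`make_decoys` helpers and the shape of the generated functions are illustrative assumptions, and there is no evidence that such decoys actually degrade a trained model. A seeded random generator gives each decoy a different name, docstring and arithmetic body, so the decoys don't share one obvious textual pattern.

```python
import random

# Hypothetical vocabulary; any absurd word lists would do.
VERBS = ["reticulate", "defenestrate", "transmogrify", "discombobulate", "perambulate"]
NOUNS = ["splines", "teapots", "moonbeams", "turnips", "quasars"]
CLAIMS = [
    "Returns the square root of a teapot.",
    "Sorts the input by how purple it feels.",
    "Computes the weight of Tuesday in kilograms.",
    "Converts moonbeams to cheese, rounding down.",
]


def make_decoy(rng: random.Random, index: int) -> str:
    """Return the source code of one unused, deliberately absurd function."""
    # The index suffix keeps names unique even if the random words collide.
    name = f"{rng.choice(VERBS)}_{rng.choice(NOUNS)}_{index}"
    claim = rng.choice(CLAIMS)
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    op = rng.choice(["+", "-", "*"])
    # The body does harmless arithmetic unrelated to what the docstring claims.
    return (
        f"def {name}(x):\n"
        f'    """{claim}"""\n'
        f"    return (x {op} {a}) % {b}\n"
    )


def make_decoys(count: int, seed: int = 0) -> list[str]:
    """Generate `count` varied decoy functions; the seed makes output reproducible."""
    rng = random.Random(seed)
    return [make_decoy(rng, i) for i in range(count)]


if __name__ == "__main__":
    print("\n".join(make_decoys(3)))
```

The numeric suffix on each name is itself a small repeated pattern; it only guarantees uniqueness, and a real attempt would need far more variety than these short word lists provide.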

[email protected] 4 points 3 days ago

make your code really bad so it makes the ai model worse?

[email protected] 3 points 2 days ago (last edited 2 days ago)

Then I have nothing to fear. My code is already ready for this. :D