this post was submitted on 28 Oct 2024

TechTakes

With the OSI publishing its abysmal (and explicitly not open source) "Open Source AI" definition, I thought I'd post my argument for why it is bad and why "Open Source AI" currently probably does not exist.

[email protected] 23 points 1 month ago

The stretching is just so blatant. People who train neural networks do not write a bunch of tokens and weights by hand. They take a corpus of training data and run a training program to generate the weights. That's why the training program and the corpus, together, should be considered the source form of the model. If either of them can't be made available in a way that allows redistribution of verbatim and modified versions, the model can't be open source. Even if I had a powerful server farm and a list of data sources for Llama 3, I couldn't replicate the model myself without committing copyright infringement (neither could Facebook, for that matter, and that's not an entirely separate issue).
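As a rough illustration of "weights are generated, not written", here is a minimal sketch assuming a toy linear model y = w*x + b trained with plain SGD in pure Python. It has nothing to do with how Llama 3 is actually trained; `train` and its parameters are made up for the example. The point is only that the weights are the output of a program applied to a corpus (plus a seed), so shipping the weights alone is like shipping a compiled binary without its source.

```python
import random


def train(corpus, seed, epochs=50, lr=0.01):
    """Toy SGD for y = w*x + b (hypothetical example, not any real model's trainer).

    The returned weights are derived entirely from the corpus, this program,
    and the seed: nobody writes them by hand.
    """
    rng = random.Random(seed)  # isolated RNG, so results depend only on `seed`
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # random initialisation
    data = list(corpus)
    for _ in range(epochs):
        rng.shuffle(data)  # the order of training examples also depends on the seed
        for x, y in data:
            err = (w * x + b) - y  # prediction error for this example
            w -= lr * err * x      # gradient step for the weight
            b -= lr * err          # gradient step for the bias
    return w, b


corpus = [(x, 2 * x + 1) for x in range(10)]
weights_a = train(corpus, seed=0)
weights_b = train(corpus, seed=0)
weights_c = train(corpus, seed=1)
print(weights_a == weights_b)  # True: same corpus, program, and seed give the same weights
print(weights_a == weights_c)
```

Changing only the seed changes the resulting weights here, even though the corpus and program are identical. Real large-scale training on GPUs often isn't bit-for-bit deterministic even with a fixed seed, so this toy is the best case, not the typical one.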

There are large collections of freely licensed and public domain media that could theoretically be used to train a model, but that model surely wouldn't be as large as the proprietary ones. In some sense, truly open source AI does exist and has for a long time, but that's not the exciting thing OSI is lusting after, is it?

[email protected] 7 points 1 month ago

Yeah, neural network training is notoriously easy to reproduce /s.

Just a few things can affect the results: source data, data labels, network structure, training parameters, the version of the training script, library versions, the random number generator seed, hardware, and operating system.
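As a rough sketch of how much has to be pinned down just to attempt a reproduction, here is a hypothetical `training_manifest` helper (the helper and all field names are invented for illustration, not any standard format) that records several of the factors listed above: content hashes of the data and training script, the seed, hyperparameters, library versions, and interpreter/OS/machine details. It only records them; it does not make training deterministic, and it leaves out things like data labels, network structure, and GPU driver versions.

```python
import hashlib
import json
import platform
from importlib import metadata


def sha256_hex(data: bytes) -> str:
    """Content hash, so 'the same corpus' can be checked rather than assumed."""
    return hashlib.sha256(data).hexdigest()


def package_versions(names):
    """Installed versions of the given packages; None if a package is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def training_manifest(corpus: bytes, script: bytes, seed: int,
                      hyperparams: dict, packages=()) -> dict:
    """Hypothetical record of inputs a reproduction attempt would need to match."""
    return {
        "corpus_sha256": sha256_hex(corpus),
        "training_script_sha256": sha256_hex(script),
        "seed": seed,
        "hyperparameters": hyperparams,
        "package_versions": package_versions(packages),
        "python_version": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
    }


manifest = training_manifest(
    corpus=b"example corpus bytes",      # placeholder data for the example
    script=b"print('training...')",      # placeholder training script
    seed=42,
    hyperparams={"lr": 0.01, "epochs": 50},
    packages=["numpy", "some-package-that-does-not-exist"],
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Two runs whose manifests differ in any field can't be expected to produce the same weights, and two runs whose manifests match still might not, which is rather the point being made here.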

Also, deployment is another can of worms.

Also, even if you have an open source training script, data, and labels, there's no guarantee you'll have useful documentation for any of them.

[email protected] 3 points 1 month ago

Yes, that just reiterates my point, doesn't it?

[email protected] 5 points 1 month ago

It was supposed to. I'm just not that good at writing.

[email protected] 5 points 1 month ago

Fair enough. Sorry for being rude about it.
