this post was submitted on 20 Aug 2024
364 points (92.3% liked)

Technology

59081 readers
3254 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 4 points 2 months ago* (last edited 2 months ago) (2 children)

Theoretically we could slow down training and coast on fine-tuning existing models. Once the AI's trained they don't take that much energy to run.

Everyone was racing towards "bigger is better" because it worked up to GPT4, but word on the street is that raw training is giving diminishing returns so the massive spending on compute is just a waste now.

[–] [email protected] 2 points 2 months ago

It's a bit more complicated than that.

New models are sometimes targeting architecture improvements instead of pure size increases. Any truly new model still needs training time, it's just that the training time isn't going up as much as it used to. This means that open weights and open source models can start to catch up to large proprietary models like ChatGPT.

From my understanding GPT 4 is still a huge model and the best performing. The other models are starting to get close though, and can already exceed GPT 3.5 Turbo which was the previous standard to beat and is still what a lot of free chatbots are using. Some of these models are still absolutely huge though, even if not quite as big as GPT 4. For example Goliath is 120 billion parameters. Still pretty chonky and intensive to run even if it's not quite GPT 4 sized. Not that anyone actually knows how big GPT 4 is. Word on the street is it's a MoE model like Mixtral which run faster than a normal model for their size, but again no one outside Open AI actually can say with certainty.

You generally find that Open AI models are larger and slower. Wheras the other models focus more on giving the best performance at a given size as training and using huge models is much more demanding. So far the larger Open AI models have done better, but this could change as open source models see a faster improvement in the techniques they use. You could say open weights models rely on cunning architectures and fine tuning versus Open AI uses brute strength.

[–] [email protected] 2 points 2 months ago

Issue is, we're reaching the limits of what GPT technologies can do, so we have to retrain them for the new ones, and currently available data have been already poisoned by AI generated garbage, which will make the adaptation of new technologies harder.