submitted 1 week ago by [email protected] to c/[email protected]
[-] [email protected] -4 points 1 week ago* (last edited 1 week ago)

I don't think that's an impossible problem. Existing models can reliably distinguish between, for example, different languages. Most of their training data is presumably in English, but while this may make them better at generating English text, it doesn't make them randomly switch from other languages to English. A sufficiently advanced model would likewise distinguish between descriptions of reality and shit-posts, because the content of shit-posts would not be useful for predicting descriptions of reality. Some fine-tuning would teach it to produce just the descriptions of reality.

Or look at it this way: the folks developing these LLMs aren't ignorant of the fact that Reddit content is often false and meant to be funny. They're not going to make the sort of silly mistake that even a non-expert can easily predict, and they're not going to train their LLMs on that content if it makes the LLMs worse, although we're still going to see some glue on pizza while the technology continues to develop.

[-] [email protected] 6 points 1 week ago

It can cross-check a language against tons of other words and examples of that language already in its data set. There is no such data for whether or not something conforms to reality. That simply doesn't exist and really won't ever exist. They are not similar problems. One is immensely more challenging to solve than the other.

this post was submitted on 04 Jul 2024
119 points (93.4% liked)

Fuck AI

A place for all those who loathe machine-learning to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 4 months ago