Once you learn how LLMs work (there are plenty of links explaining it), it becomes fairly obvious why LLMs fail at reversal: even if they can pick up some simple logic from the tokens alone, the opposition between "parent" and "child" is semantic, and LLMs do not handle semantics; they only handle the tokens themselves.
The part of this that interests me the most is this footnote:
² Bing uses some neurosymbolic supplementation to pure LLM’s [...]
I'm often babbling about LLMs handling tokens instead of concepts, and about how you need to handle concepts to actually model language, but it seems that at least Microsoft is working its way toward that. I wouldn't be surprised if OpenAI, Alphabet/Google and Meta/Faecesbook were doing the same to "fix" ChatGPT, Bard and LLaMA.
Eventually I think that some better model will pop up, where this "neurosymbolism" (I like to call it a "conceptual" layer - basically semantics with a sprinkle of pragmatics) is the core of the model, with token handling used mostly to interface with the user. If they get this right, it'll be obvious rather early on, because:
- Even with considerably less training (think orders of magnitude less), the model will perform comparably to an LLM.
- Hallucinations will become less and less self-contradictory. They'll still appear, though, and the bot will be damn stubborn about them. (Made-up example: the bot believes that pineapples grow on trees, and it'll consistently refer to them as such.)
- The model will be able to handle those reversals as soon as it's told about the relationships.
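To make that last point a bit more concrete, here's a toy sketch in Python of why a symbolic/conceptual layer gets reversals "for free". Everything in it is hypothetical and made up for illustration (`ConceptStore`, the relation names `parent_of`/`child_of`, Alice and Bob); it's nowhere near a real neurosymbolic system, just the core idea. Facts are stored as relations between concepts instead of token sequences, so once you declare that two relations are inverses, the reversed question is answered without ever having seen it phrased that way.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptStore:
    """Hypothetical toy "conceptual layer": facts are (subject, relation, object)
    triples, and relations can be declared as inverses of each other."""
    facts: set = field(default_factory=set)
    inverses: dict = field(default_factory=dict)

    def declare_inverse(self, rel: str, inv: str) -> None:
        # Stating the relationship once covers both directions.
        self.inverses[rel] = inv
        self.inverses[inv] = rel

    def add(self, subj: str, rel: str, obj: str) -> None:
        self.facts.add((subj, rel, obj))

    def holds(self, subj: str, rel: str, obj: str) -> bool:
        # Direct lookup first, then try the inverse direction if one is known.
        if (subj, rel, obj) in self.facts:
            return True
        inv = self.inverses.get(rel)
        return inv is not None and (obj, inv, subj) in self.facts

    def who(self, rel: str, obj: str) -> set:
        # All subjects s such that holds(s, rel, obj) is True.
        found = {s for (s, r, o) in self.facts if r == rel and o == obj}
        inv = self.inverses.get(rel)
        if inv is not None:
            found |= {o for (s, r, o) in self.facts if r == inv and s == obj}
        return found


store = ConceptStore()
store.add("Alice", "parent_of", "Bob")
print(store.holds("Bob", "child_of", "Alice"))  # False: inverse not declared yet
store.declare_inverse("parent_of", "child_of")
print(store.holds("Bob", "child_of", "Alice"))  # True
print(store.who("child_of", "Alice"))  # {'Bob'}
```

The design choice that matters here is that the reversal lives in the relation ("child_of" is the inverse of "parent_of"), not in the individual facts, so one instruction fixes every parent/child pair at once instead of needing each pair to show up in both directions in the training data.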