
Comment by freehorse

12 days ago

LLMs do not know what they "know" or don't. They just autocomplete whatever seems most relevant based on their training set, and most probably they don't have enough "I don't know" in that training set in the first place. To get them to say "I don't know" you have to fine-tune them heavily. So, if anything, they hallucinate a lot more without RLHF, which in this paper they call "creativity".

In the GPT-3 days, when everyone was doing few-shot tasks (giving the LLM a couple of example question/answer pairs in the prompt), one of the big insights was that adding question/answer pairs with answers like "I don't know" and "this question doesn't make sense" got the model to actually use those answers appropriately instead of overconfidently stating nonsense.
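Roughly what that looked like, as a minimal sketch (the `complete` callable is a hypothetical stand-in for whatever completion API you call, not a specific library):

```python
# GPT-3-era few-shot trick: seed the prompt with question/answer pairs that
# include "I don't know" and "this question doesn't make sense", so the model
# treats those as legitimate answers instead of always producing something.

FEW_SHOT_PROMPT = """\
Q: What is the capital of France?
A: Paris.

Q: How many eyes does the sun have?
A: This question doesn't make sense.

Q: What did I eat for breakfast on March 3rd, 1998?
A: I don't know.

Q: {question}
A:"""

def answer(question: str, complete) -> str:
    """Fill in the few-shot template and let the model continue it."""
    prompt = FEW_SHOT_PROMPT.format(question=question)
    return complete(prompt, stop="\n\n").strip()
```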

Of course that method isn't perfect (GPT-3.0 was far from perfect in general). But both in principle and in practice the models do have a notion of what they "know": knowledge shows up as a strong activation, random noise as a weaker one, and you "just" have to get the model to override those weaker activations by admitting failure.
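One crude way to see the "strong vs. weak activation" intuition is to look at the model's next-token probability and abstain when it's low. A toy sketch (model choice and the 0.3 cutoff are arbitrary illustration, not a calibrated method):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_or_abstain(prompt: str, threshold: float = 0.3, max_new_tokens: int = 20) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    if probs.max().item() < threshold:         # weak activation -> admit failure
        return "I don't know."
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
```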

You could draw parallels to allowing LLMs to emit pause tokens to get more time to think (https://arxiv.org/abs/2310.02226 and similar). At some level of abstraction that's also just training the model to replace uncertain answers with a special token, in the hope that it eventually reaches more certainty.
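At the data level that can be as simple as registering a special token and padding training examples with it before the answer. A rough sketch loosely in that spirit, not the paper's exact recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register a <pause> token and give it an embedding row.
tokenizer.add_special_tokens({"additional_special_tokens": ["<pause>"]})
model.resize_token_embeddings(len(tokenizer))

def with_pauses(question: str, answer: str, n_pauses: int = 4) -> str:
    """Build one training example with pause tokens between question and answer."""
    return f"Q: {question}\nA: {'<pause>' * n_pauses}{answer}"

print(with_pauses("What is 17 * 23?", "391"))
```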