
Comment by SirMaster

16 days ago

So is the reason LLMs don't say when they don't know something, and instead make up something that "sounds right", that RLHF has taught them to always give an answer?

And if that's the case, why? Is that really what people want an LLM to do? I feel like I would rather it say when it doesn't know something.

LLMs do not know what they "know" and what they don't. They just autocomplete whatever sounds most relevant based on their training set. Most probably they do not have enough "I don't know" in their training set in the first place. To get them to say "I don't know" you have to fine-tune them heavily. So, if anything, they hallucinate a lot more without RLHF, which in this paper they call "creativity".

  • In the GPT3 days, when everyone was doing few-shot tasks (giving the LLM a couple of example question/answer pairs in the prompt), one of the big insights was that adding question/answer pairs with answers like "I don't know" and "this question doesn't make sense" caused the model to actually use those answers appropriately instead of overconfidently stating nonsense (see the sketch after this comment).

    Of course that method isn't perfect (GPT3.0 was far from perfect in general). But both in principle and in practice the models do have a notion of what they "know": knowledge is a strong activation, random noise is a weaker one, and you "just" have to get the model to override those weaker activations by admitting failure.

    You could draw parallels to allowing LLMs to emit pause tokens to get more time to think (https://arxiv.org/abs/2310.02226 and similar). At some level of abstraction that's also just training the model to replace uncertain answers with a special token, in the hope that it eventually reaches more certainty.
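As a concrete illustration of the few-shot trick described in the comment above, here is a minimal sketch. The `complete` callable is a placeholder for whatever text-completion call you actually use, not any particular vendor's API:

```python
# Few-shot prompt that includes "I don't know" / "this question doesn't
# make sense" exemplars, giving the model a template for declining.
FEW_SHOT_PROMPT = """\
Q: What is the capital of France?
A: Paris.

Q: How many eyes does my neighbour's cat have?
A: I don't know.

Q: What colour is the number seven?
A: This question doesn't make sense.

Q: {question}
A:"""


def answer(question: str, complete) -> str:
    """Fill the question into the few-shot template and ask the model.

    `complete` stands in for a text-completion call: it should take a
    prompt string and return the model's continuation as a string.
    """
    prompt = FEW_SHOT_PROMPT.format(question=question)
    return complete(prompt).strip()
```

With exemplars like these in context, GPT-3-era models were noticeably more willing to produce "I don't know" instead of confabulating, which is the effect the comment describes.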

It's the other way around. RLHF is needed for the model to say "I don't know".

  • Oh, well that's kind of what I mean. I mean I assume the RLHF that's being done isn't teaching it to say "I don't know".

    Which makes me wonder if it's intentional. A fairly big complaint about these systems is that they can sometimes sound confidently correct about something they don't know. So why train them to be like this, if that's an intentional training direction?

    • The point of the above commenter (and mine) is that they hallucinate even more without RLHF. RLHF reduces hallucinations, but they are still there anyway.

    • Hopefully some RLHF-using companies will realize that saying "I don't know" is important and start instructing the humans giving feedback to prefer answers that say "I don't know" over wrong answers.
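For what that labeling guideline could look like as training data, here is a hedged sketch of a preference pair plus the standard pairwise ranking loss a reward model might be trained with. The field names and the example prompt are purely illustrative, not any specific company's schema:

```python
import math

# Illustrative preference record: for the same prompt, the honest
# "I don't know" is labeled as preferred over a confident fabrication.
preference_data = [
    {
        "prompt": "What did I have for breakfast this morning?",
        # Honest refusal: the model has no way of knowing this.
        "chosen": "I don't know -- you haven't told me what you ate.",
        # Fluent but made-up answer, which labelers would mark as worse.
        "rejected": "You had scrambled eggs and a cup of black coffee.",
    },
]


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style pairwise loss: -log sigmoid(r_chosen - r_rejected).

    A reward model trained with this loss on pairs like the one above
    learns to score honest uncertainty above confident fabrication, and
    RLHF then optimizes the chat model against that reward.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))
```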

All the chat LLMs sample with a non-zero temperature, which means they can be looser with the truth, or more creative.
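To make the temperature point concrete, here is a minimal sketch of temperature-scaled sampling over next-token scores; the logit values are made up for illustration:

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float) -> str:
    """Sample a next token from raw logits after temperature scaling.

    Temperature 0 reduces to greedy decoding (always the top logit);
    higher temperatures flatten the distribution, so lower-probability
    (possibly less truthful, more "creative") tokens get picked more often.
    """
    if temperature <= 0:
        return max(logits, key=logits.get)  # greedy: highest-scoring token
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(weights.values())
    tokens = list(weights)
    probs = [weights[tok] / total for tok in tokens]
    return random.choices(tokens, weights=probs, k=1)[0]


# Made-up logits for the next token after "The capital of Australia is".
logits = {"Canberra": 5.0, "Sydney": 3.5, "Melbourne": 2.0}
print(sample_token(logits, temperature=0.0))  # always "Canberra"
print(sample_token(logits, temperature=1.5))  # occasionally "Sydney" or "Melbourne"
```

Lowering the temperature makes the output more deterministic, but it does not by itself make the underlying distribution any more truthful.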