
Comment by wongarsu

14 days ago

In the GPT3 days, when everyone was doing few-shot tasks (giving the LLM a couple of example question/answer pairs in the prompt), one of the big insights was that adding pairs with answers like "I don't know" and "this question doesn't make sense" caused the model to actually use those answers appropriately instead of overconfidently stating nonsense.
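
Roughly what that looked like in practice, as a minimal sketch (the example questions and the prompt wording are made up, not taken from any particular paper):

    # Few-shot prompt with explicit "don't know" exemplars (illustrative only;
    # the questions and the helper function are placeholders, not a specific API).
    PROMPT_TEMPLATE = """Q: What is the capital of France?
    A: Paris

    Q: What colour is the number seven?
    A: This question doesn't make sense.

    Q: What did Napoleon eat for breakfast on his 30th birthday?
    A: I don't know.

    Q: {question}
    A:"""

    def build_prompt(question: str) -> str:
        # The model sees examples of graceful failure, so "I don't know"
        # becomes a plausible continuation instead of confident nonsense.
        return PROMPT_TEMPLATE.format(question=question)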

Of course that method isn't perfect (GPT3.0 was far from perfect in general). But both in principle and in practice the models do have a notion of what they "know": knowledge shows up as a strong activation, random noise as a weaker one. You "just" have to get the model to override those weaker activations by admitting failure.
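
One crude way to see that signal from the outside is the next-token probability distribution: peaked when the model "knows", flat when it's guessing. A hedged sketch with Hugging Face transformers (this is my own illustration, not the method from the GPT3 days; the model choice and threshold are arbitrary, and it only abstains based on a single next token):

    # Sketch: treat the model's own next-token probability as a rough confidence proxy.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def answer_or_abstain(prompt: str, threshold: float = 0.5) -> str:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]   # logits for the next token
        probs = torch.softmax(logits, dim=-1)
        top_prob, top_id = probs.max(dim=-1)
        if top_prob.item() < threshold:              # weak activation -> admit failure
            return "I don't know."
        return tokenizer.decode([top_id.item()])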

You could draw parallels to allowing LLMs to emit pause tokens to get more time to think (https://arxiv.org/abs/2310.02226 and similar). At some level of abstraction that's also just training the model to replace uncertain answers with a special token, in the hope that it eventually reaches more certainty.
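
A heavily simplified sketch of what that looks like mechanically (the actual paper trains the model with pause tokens so their embeddings carry useful computation; this only shows appending them at inference time, and the token name is my own placeholder):

    # Sketch of the pause-token idea: give the model extra forward passes before it answers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # <pause> is a new token; in the real method its embedding is learned during training,
    # here it's just a freshly initialized row in the embedding matrix.
    tokenizer.add_special_tokens({"additional_special_tokens": ["<pause>"]})
    model.resize_token_embeddings(len(tokenizer))

    def generate_with_pauses(prompt: str, num_pauses: int = 10) -> str:
        # Append pause tokens so the model does extra computation before committing to an answer.
        padded = prompt + "<pause>" * num_pauses
        inputs = tokenizer(padded, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=32)
        return tokenizer.decode(output[0], skip_special_tokens=True)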