Comment by SirMaster

14 days ago

Oh, well that's kind of what I mean. I assume the RLHF that's being done isn't teaching it to say "I don't know".

Which makes me wonder if that's intentional. A fairly big complaint about these systems is how they can sound confidently correct about something they don't actually know. So why train them to be like this, if that really is an intentional training direction?

The point of the above commenter (and mine) is that they hallucinate even more without RLHF. RLHF reduces hallucinations, but it doesn't eliminate them.

Hopefully some RLHF-using companies will realize that saying "I don't know" is important and start instructing the human raters to prefer answers that say "I don't know" over wrong ones.
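To make that concrete, the proposed rater guideline could be sketched as a simple pairwise-preference rule for RLHF comparison labels. This is a hypothetical illustration, not any company's actual rubric; the `AnswerQuality` labels are assumed to come from a separate grading step.

```python
# Hypothetical sketch of the rater guideline above: when collecting pairwise
# preference labels for RLHF, an honest "I don't know" should beat a
# confidently wrong answer, and a correct answer should beat both.
from enum import IntEnum

class AnswerQuality(IntEnum):
    WRONG = 0    # confidently incorrect
    IDK = 1      # honest "I don't know"
    CORRECT = 2  # factually correct

def prefer(a: AnswerQuality, b: AnswerQuality) -> str:
    """Return which answer ('a', 'b', or 'tie') the rater should prefer."""
    if a == b:
        return "tie"
    return "a" if a > b else "b"

# A confident wrong answer loses to an honest "I don't know":
print(prefer(AnswerQuality.WRONG, AnswerQuality.IDK))    # b
# But "I don't know" still loses to an actually correct answer:
print(prefer(AnswerQuality.IDK, AnswerQuality.CORRECT))  # b
```

The point of the ordering is that refusals are rewarded only relative to being wrong, so the model isn't pushed to say "I don't know" when it actually does know.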