
Comment by Imnimo

12 days ago

>T ∈ (0, 1] is a parameter called temperature which controls the “softness” of the probability distribution. In our experiments we choose T = 1.0 for maximum response variation.

Why is temperature bounded to be <=1? If you want more "creativity" out of the chat model, can you just set T higher and recover a similar distribution to the base model?

Not after RLHF tuning, due to the 'flattened logits' phenomenon (the logit-level version of the mode collapse the OP documents at higher levels). All the temperature settings wind up yielding pretty much the same output, until you ramp T up so high that the output falls apart completely. That's completely unlike base models, where you can productively tune the temperature or use very high temperatures with some screening.
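
To make the temperature mechanics concrete: at sampling time the logits are divided by T before the softmax, so T > 1 does flatten whatever distribution the model produces; the question is what it gets applied to. A minimal sketch, with logits that are purely invented for illustration (not measurements from any real model):

```python
import numpy as np

def temperature_softmax(logits, T=1.0):
    """Probabilities at temperature T: divide the logits by T, then softmax."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical 5-token logits, invented for illustration: the "base" set keeps
# structure in its tail, the "tuned" set has one dominant logit over a nearly
# indistinguishable ("flattened") tail.
base  = [5.0, 4.2, 3.1, 2.5, 1.0]
tuned = [12.0, 1.1, 1.0, 1.0, 0.9]

for T in (0.7, 1.0, 2.0, 4.0):
    print(f"T={T}: base  {temperature_softmax(base, T).round(3)}")
    print(f"       tuned {temperature_softmax(tuned, T).round(3)}")
# For the tuned logits, moderate T barely moves mass off the top token, and
# very high T spreads the rest almost uniformly over a tail whose ordering
# carries little information -- rescaling alone does not recover the base
# model's ranking of alternatives.
```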

  • Hmm, it's hard to check without access to the prompts used in the paper, but I'm skeptical that the distributions seen in e.g. Figure 2 are so different that you would have to crank up the temperature very much to bridge the gap. It looks to me like the entries that are 1-in-100 in the base model are just falling off the top-p cliff and getting set to 0.
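
For reference, the top-p cliff mentioned above works like this: tokens are ranked by probability and everything outside the smallest set reaching cumulative mass p is zeroed out, so a 1-in-100 alternative below the cutoff simply disappears from samples. A minimal sketch of nucleus (top-p) filtering, with made-up probabilities:

```python
import numpy as np

def top_p_filter(probs, p=0.95):
    """Nucleus (top-p) truncation: keep the smallest set of tokens whose
    cumulative probability reaches p, zero out the rest, renormalize."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]        # tokens from most to least likely
    cum = np.cumsum(probs[order])
    keep = (cum - probs[order]) < p        # keep a token while the mass ranked above it is < p
    filtered = np.zeros_like(probs)
    filtered[order[keep]] = probs[order[keep]]
    return filtered / filtered.sum()

# Hypothetical base-model probabilities, invented for illustration: with
# p = 0.95, the 1-in-100 alternative (and the 0.03 one) fall outside the
# nucleus and are set to exactly 0, so they can never be sampled no matter
# how the remaining mass is reshaped afterwards.
probs = [0.70, 0.20, 0.06, 0.03, 0.01]
print(top_p_filter(probs, p=0.95).round(3))   # ~[0.729, 0.208, 0.062, 0, 0]
```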

They'll tell you "no" and say that high temperatures ruin your sampling, but good (dynamic) samplers like min_p or typicality sampling are robust to high temperatures, so in actuality, yes.
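
For context on the mechanism: min_p keeps every token whose probability is at least some fraction of the current top token's probability, rather than keeping a fixed cumulative mass, so the cutoff rescales with whatever temperature has done to the distribution. Implementations differ on where the filter sits relative to temperature; this sketch shows just the filtering step, with made-up probabilities:

```python
import numpy as np

def min_p_filter(probs, min_p=0.1):
    """min_p truncation: keep only tokens whose probability is at least
    min_p times the top token's probability, then renormalize.

    The cutoff is relative to the current top token rather than an absolute
    cumulative mass, so the kept set adapts as temperature flattens or
    sharpens the distribution."""
    probs = np.asarray(probs, dtype=float)
    threshold = min_p * probs.max()
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()

# Hypothetical distributions, invented for illustration.
peaked = [0.80, 0.10, 0.05, 0.03, 0.02]   # low temperature: confident
flat   = [0.30, 0.25, 0.20, 0.15, 0.10]   # high temperature: spread out
print(min_p_filter(peaked, 0.1).round(3))  # cutoff 0.08: keeps only the 0.80 and 0.10 tokens
print(min_p_filter(flat, 0.1).round(3))    # cutoff drops to 0.03: keeps all five tokens
```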

  • Cite? I don't see how either of those could deal with the fact that the logits become uninformative and 'flattened' after the tuning. How can a sampler undo the erasure of information?