Comment by gwern

11 days ago

Not after RLHF tuning, due to the 'flattened logits' phenomenon (which is the logit-level version of the mode collapse OP documents at higher levels). All the temperature settings wind up yielding pretty much the same output, until you ramp it up so high that it falls apart completely. Completely unlike the base models where you can productively tune the temperature or use very high temperatures with some screening.
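A rough way to picture the claim, as a toy sketch rather than a measurement: assume, purely for illustration, that tuning leaves one token far ahead of the rest at each step, while a base model keeps several live options. The logit values and the names `base_like` and `collapsed_like` below are invented for this example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits, invented for illustration (not taken from any model).
base_like = [2.0, 1.5, 1.0, 0.5, 0.0]        # several plausible options
collapsed_like = [12.0, 2.0, 1.0, 0.5, 0.0]  # one option far ahead

for t in (0.5, 1.0, 1.5, 2.0):
    p_base = softmax_with_temperature(base_like, t)[0]
    p_coll = softmax_with_temperature(collapsed_like, t)[0]
    print(f"T={t}: top-token prob base-like={p_base:.3f}, collapsed-like={p_coll:.3f}")
```

Under these made-up numbers the base-like distribution responds to temperature across the whole range, while the collapsed-like one barely moves until the temperature is pushed much higher.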


Imnimo 11 days ago

Hmm, it's hard to check without access to the prompts used in the paper, but I'm skeptical that the distributions seen in e.g. Figure 2 are so different that you would have to crank up the temperature very much to bridge the gap. It looks to me like the entries that are 1-in-100 in the base model are just falling off the top-p cliff and getting set to 0.
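To make the "top-p cliff" concrete, here is a minimal token-level sketch of nucleus (top-p) filtering. The function name `top_p_filter` and the probabilities are hypothetical, chosen only to show a 1-in-100 entry being zeroed; whether this accounts for the figure is exactly what is in question.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability entries whose total
    mass reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / mass
    return out

# Hypothetical distribution, invented for illustration.
probs = [0.60, 0.25, 0.10, 0.04, 0.01]
filtered = top_p_filter(probs, top_p=0.9)
print(filtered)  # the last two entries, including the 1-in-100 one, become 0.0
```

With `top_p=0.9` the first three entries already cover the required mass, so everything below them is dropped no matter how many times you sample.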

gwern 11 days ago

Figure 2 (https://arxiv.org/pdf/2406.05587#page=10) is not at the logit level, it's at the whole completion level (entire names classified by nationality).
So you don't know how any sampling would affect that. There could be only a few options at each token, which give rise to that, and higher temperature sampling may shift that around, but it doesn't ever restore the original base model behavior or restore all of the names erased by mode collapse. (Remember, the LLM is an agent, and when you are sampling, it is on-policy because you are letting it make choices of tokens, and it is steering the completion as a whole back to where it wants to be. With mode collapse, all roads lead to Rome, whether you like it or not.)
People do observe that increasing the temperature does not help, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10936766/ finds basically no difference going from 0 to 0.9 (!): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10936766/bin/po... Just the flattened logits (https://arxiv.org/pdf/2303.08774#page=12&org=openai) at work.