Comment by gwern

11 days ago

Not after RLHF tuning, due to the 'flattened logits' phenomenon (which is the logit-level version of the mode collapse OP documents at higher levels). All the temperature settings wind up yielding pretty much the same output, until you ramp it up so high that it falls apart completely. Completely unlike the base models where you can productively tune the temperature or use very high temperatures with some screening.

Hmm, it's hard to check without access to the prompts used in the paper, but I'm skeptical that the distributions seen in e.g. Figure 2 are so different that you would have to crank up the temperature very much to bridge the gap. It looks to me like the entries that are 1-in-100 in the base model are just falling off the top-p cliff and getting set to 0.
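To make the "top-p cliff" concrete, here is a rough toy sketch of what I mean. Everything in it is made up for illustration: the six-token distribution, the `top_p=0.95` cutoff, and the helper names (`softmax`, `top_p_filter`, `tail_survivors`) are my assumptions, not anything taken from the paper or from a real model's sampler. It only shows the mechanism: nucleus sampling zeroes out the 1-in-100 tokens at temperature 1, and a fairly modest temperature increase can pull them back inside the nucleus.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Nucleus filtering: keep the smallest set of highest-probability
    tokens whose cumulative mass reaches top_p, zero out the rest,
    and renormalize what is left."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


# Hypothetical toy distribution (an assumption, not data from the paper):
# one dominant token, one runner-up, and four 1-in-100 tail tokens.
BASE_PROBS = [0.90, 0.06, 0.01, 0.01, 0.01, 0.01]
LOGITS = [math.log(p) for p in BASE_PROBS]
TAIL = range(2, len(BASE_PROBS))


def tail_survivors(temperature, top_p=0.95):
    """Count how many of the 1-in-100 tokens keep nonzero probability
    after temperature scaling followed by top-p filtering."""
    filtered = top_p_filter(softmax(LOGITS, temperature), top_p)
    return sum(1 for i in TAIL if filtered[i] > 0.0)


if __name__ == "__main__":
    for t in (1.0, 1.5, 2.0):
        print(f"T={t}: {tail_survivors(t)} of {len(TAIL)} tail tokens survive top-p")
```

In this toy setup the tail tokens are cut entirely at temperature 1 and all come back by temperature 2. Real distributions and sampler settings will differ, so treat it as an illustration of the mechanism rather than evidence about any particular model.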