
Comment by gwern

4 years ago

You are missing the point of the paper about few-shot learning. That's the entire paper: just doing new, untrained task after task. The whole point is that you can 'reprogram' GPT-3 to do just about anything simply by stuffing its context with examples, and it'll pick up brand-new entities or words or concepts just from those examples (see the examples of defining novel gibberish words and asking GPT-3 to use them in a sentence - it does so; it "learned" new words by reading the examples, understanding them, and propagating them through the 'fast weights' of self-attention, even though its 'slow weights' are fixed). Now, if GPT-3 can already do that so well, sometimes hitting SOTA on untrained tasks purely by internal meta-learning without changing its weights, what would a 10-trillion-parameter model do? Or one with recurrence like Transformer-XL or the Compressive Transformer? How much training do you really need if the few-shot learning capabilities are so good that you can make it do countless tasks just by providing examples or descriptions in the prompt?
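
To make the 'reprogramming by context stuffing' concrete, here is a minimal sketch of the kind of prompt involved: the entire task specification lives in the prompt, and no weights are updated. The gibberish-word definitions are written in the style of the novel-word examples in the GPT-3 paper, and the API call assumes the openai Python client as it existed around 2020 (openai.Completion.create with the original "davinci" engine); those details are my assumptions for illustration, not the paper's code.

```python
import openai

# openai.api_key must be set before calling the API (assumption: pre-1.0 openai client).

# Few-shot prompt: each block defines a made-up word and shows one example
# sentence; the final block leaves the sentence for the model to complete.
FEW_SHOT_PROMPT = """\
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.

To "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
One day when I was playing tag with my little sister, she got really excited and she started farduddling.

A "yalubalu" is a type of vegetable that looks like a big pumpkin. An example of a sentence that uses the word yalubalu is:
"""

response = openai.Completion.create(
    engine="davinci",      # original GPT-3 base engine name
    prompt=FEW_SHOT_PROMPT,
    max_tokens=40,
    temperature=0.7,
    stop="\n",             # stop after one completed sentence
)
print(response["choices"][0]["text"].strip())
```

The model has never been trained on "yalubalu"; it infers the pattern (define a word, then use it in a sentence) purely from the two in-context examples, which is the 'fast weights' behavior described above.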