
Comment by kriro

4 years ago

Very interesting comment, thanks for taking the time to write it :)

I think if memory is the only problem, then optimizing training time should be more of a concern. I'm imagining a huge language model that can retrain very quickly. So it might be a decent idea to measure it not by perplexity or some human-judgement score alone, but rather by that score per unit of compute used.

Or in other words: maybe a bot that scores 90% on the fool-a-human scale but takes a day to train from scratch is actually a lot less impressive than one that only fools 70% but trains from scratch in 5 minutes.
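To make the comparison concrete, here's a minimal sketch of that score-per-compute metric. The function name, numbers, and the two hypothetical bots are all illustrative, not from any real benchmark:

```python
# Hypothetical efficiency metric: quality score normalized by compute cost.
def efficiency(fool_rate: float, train_seconds: float) -> float:
    """Score per second of training compute: higher is better."""
    return fool_rate / train_seconds

# Bot A: fools 90% of judges, takes 1 day (86,400 s) to train from scratch.
# Bot B: fools 70% of judges, takes 5 minutes (300 s).
bot_a = efficiency(0.90, 86_400)
bot_b = efficiency(0.70, 300)

assert bot_b > bot_a  # the "weaker" bot wins on score-per-compute
```

Under this metric the 70% bot comes out two orders of magnitude ahead, which is the point: raw score alone hides the cost of getting there.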

And something like "GitHub for bot-memory" would be a pretty amazing tool: roll back to some memory state and recompute with new data from there, branch for different datasets that represent different ways of interpreting the world, etc.
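As a toy sketch of what that "GitHub for bot-memory" might look like: snapshot the model's memory, branch it, and roll back. Here `state` is just a dict; a real system would store weight tensors or embeddings. The class and method names are made up for illustration:

```python
import copy

class MemoryRepo:
    """Toy version-control for a bot's memory state."""

    def __init__(self, initial_state):
        self.branches = {"main": [copy.deepcopy(initial_state)]}
        self.head = "main"

    def commit(self, state):
        # Append a snapshot to the current branch's history.
        self.branches[self.head].append(copy.deepcopy(state))

    def branch(self, name):
        # Fork the current history; new commits diverge from here.
        self.branches[name] = list(self.branches[self.head])
        self.head = name

    def rollback(self, n=1):
        # Drop the last n commits and return the restored state.
        del self.branches[self.head][-n:]
        return copy.deepcopy(self.branches[self.head][-1])

repo = MemoryRepo({"facts": []})
repo.commit({"facts": ["sky is blue"]})
repo.branch("alt-worldview")
repo.commit({"facts": ["sky is green"]})
restored = repo.rollback()  # back to the snapshot shared with "main"
```

Branching here is a cheap list copy; the expensive part in practice would be storing and diffing large weight snapshots, which is exactly why retraining speed matters.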

Conceptually I like the idea of one "base model" that represents language, with many different context models on top of it (finetunings of the core model), plus some other subsystem that identifies the current context and switches to the matching model. I suppose each conversation could also be considered a mini-dataset.
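That "base model plus context models plus a switcher" architecture could be sketched roughly like this. Everything here is an assumption for illustration: the classes, the adapter table, and the keyword-based context detector are trivial stand-ins for what would really be finetuned models and a learned classifier:

```python
class BaseLanguageModel:
    """Stand-in for the shared core model."""
    def generate(self, prompt: str) -> str:
        return f"[base reply to: {prompt}]"

# Context name -> function that specializes the base model's output.
# In the real idea these would be finetuned variants of the core model.
ADAPTERS = {
    "medical": lambda text: text + " (medical register)",
    "casual":  lambda text: text + " (casual register)",
}

def identify_context(prompt: str) -> str:
    # Trivial stand-in for the "subsystem that identifies the context".
    return "medical" if "symptom" in prompt else "casual"

def respond(base: BaseLanguageModel, prompt: str) -> str:
    context = identify_context(prompt)
    return ADAPTERS[context](base.generate(prompt))

reply = respond(BaseLanguageModel(), "I have a strange symptom")
```

The design choice worth noting: the router and the adapters are separate components, so a conversation-as-mini-dataset could finetune one adapter without touching the base model or the other contexts.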

> I like the idea of one "base model" that represents language and many different context models on top of it (finetuning the core model)

This is an entirely different conception of language modeling than the current GPT-style models. These systems don't "represent language", and cannot. The whole reason GPT is so exciting right now is that it fundamentally threw away the entire concept of "representing language". That has some upsides ... and some downsides.