Comment by nottorp

12 days ago

> uncensored and tuned exactly for that purpose

Are they tuning too, or just removing all restrictions they can get at?

Because my worry isn't that I can't generate porn, but that censorship will mess up all the answers. This study seems to say the latter.

Usually "uncensored" models have been made by instruction tuning a model from scratch (i.e. starting from a pretrained-only base model) on a dataset that doesn't contain refusals, so it's hard to compare directly against a "censored" model: it's a whole different model, not an "uncensored" version of the same one.

More recently a technique called "orthogonal activation steering", aka "abliteration", has emerged which claims to edit refusals out of a model without affecting it otherwise. But I don't know how well that works; it's only been around for a few weeks.
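For what it's worth, the core math behind abliteration is usually described as projecting a single "refusal direction" out of the model's activations. Here's a minimal numpy sketch of that projection step, under the assumption (from the abliteration write-ups) that the direction is estimated as the difference of mean activations on refusal-inducing vs. harmless prompts; the function names are made up and no real model is involved:

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Hypothetical estimate of a 'refusal direction': the
    difference of mean activations, normalized to unit length."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(acts, direction):
    """Remove the component of each activation along `direction`
    (an orthogonal projection onto the complement of the direction)."""
    return acts - np.outer(acts @ direction, direction)

# Toy demo: random vectors standing in for residual-stream activations,
# with the "harmful" batch artificially shifted along dimension 0.
rng = np.random.default_rng(0)
harmless = rng.normal(size=(16, 64))
harmful = rng.normal(size=(16, 64)) + 2.0 * np.eye(64)[0]

d = refusal_direction(harmful, harmless)
cleaned = ablate(harmful, d)
# After ablation the activations have ~zero component along d.
print(np.abs(cleaned @ d).max() < 1e-9)  # → True
```

In practice this edit is reportedly baked into the weights rather than applied at inference time, but the geometry is the same projection shown above.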

  • I've seen some of the "abliterated" models flat-out refuse to write novels; other times they just choose to skip certain plot elements. Non-commercial LLMs seem to be hit or miss... (Is that a good thing? I don't know, I just screw around with them in my spare time)

    I'll try command-r though; it wasn't on my list to try because nothing suggested what it was good at.