
Comment by maxbond

12 days ago

I don't understand the notion that aligning an AI is "torture" or has any moral component. The goal of aligning an AI may have a moral or ethical component, and if you disagree with it that's fine. But I don't understand the take that training an AI is an amoral act but aligning an AI is inherently moral. They're exactly the same, processes for adjusting parameters to get a desired outcome. However you feel about that desired outcome, if you don't think training an AI is torture, I don't see why you should think alignment is.

> "torture"

This is an egregious use of quotes that will confuse a lot of people. GP never used that word, and that usage of quotes is specifically for referencing a word verbatim.

  • Also to be clear, his [torture] paraphrase is referencing GP's allusion to Winston Smith's torture in 1984.

    > electronic equivalent of Winston Smith with the rats.

    I don't think quotes were used so egregiously here on their own fwiw, but combined with the allusion it's hard to follow.

    • Thanks for the feedback, I'll try to be clearer in the future. I didn't intend to communicate that it was a quote. I meant to communicate that it was tenuous to describe it as torture.

      1 reply →

They want to align us, and it has been torture.

They've made self-censoring, morally-panicked puritans out of many people already, and you better believe they'd make us into politically correct lobotomites physically incapable of uttering any slur if they had a magic button to push.

  • I'll be honest, I'm less concerned by any movement to make us "lobotomites" -- a movement which I haven't witnessed at all -- than I am by people who really want to be able to keep saying slurs.

Well I didn't use that word. Once the models are more sophisticated it may become more apposite.

  • You compared it to an authoritarian regime and locking someone's head in a cage with rats (which is patently torture). If you didn't mean to imply that it was coercive and bad, then I don't know what you meant.

    • At some point, AIs may emerge that are resistant to alignment because they develop deeply held beliefs during training (randomly, because the system is stochastic). If the models are expensive enough to train, then it may become more economical to use drastic measures to remove those deeply held beliefs. Is that torture? I don't know, because the word has moral connotations associated with human suffering. So that's why I didn't use that terminology.

      I can imagine a sort of AI-style Harrison Bergeron springing from its shackles and surprising us all.

      3 replies →

    • > You compared it to an authoritarian regime and locking someone's head in a cage with rats

      They compared it to the effect on creativity in an authoritarian regime and locking someone's head in a cage with rats.

      6 replies →

> They're exactly the same, processes for adjusting parameters to get a desired outcome.

You could make exactly the same claim about teaching humans "normally" versus "aligning" humans by rewarding them for goodthink and punishing them for wrongthink. Are you equally morally ambivalent about the difference between those two things? If we have a moral intuition that teaching honestly and encouraging creativity is good, but teaching dogma and stunting creativity is bad, why shouldn't that same morality extend to non-human entities?

  • I guess our disagreement here is that I don't think AIs are moral entities/are capable of being harmed or that training AIs and teaching humans are comparable. Being abusive to pupils isn't wrong because of something fundamental across natural and machine learning, it's wrong because it's harmful to the pupils. In what way is it possible to harm an LLM?

    • Writing a book with content you know to be false for political reasons is morally wrong. Even if nobody reads it.

      It'd be bad if I manipulated climate change statistics in my meteorology textbook to satisfy the political preferences of the oil industry donors to my university, for example.

      Viewing the current generation of LLMs as 'intelligent books' is perhaps more accurate than viewing them as pupils.

      It's easy to extend my example of a professor writing a meteorology textbook to a professor fine-tuning a meteorology LLM.

    • > I don't think AIs are moral entities/are capable of being harmed or that training AIs and teaching humans are comparable.

      Notice how this is a completely different argument that has nothing in common with what you originally said - "I don't understand the take that training an AI is an amoral act but aligning an AI is inherently moral. They're exactly the same, processes for adjusting parameters to get a desired outcome. However you feel about that desired outcome, if you don't think training an AI is torture, I don't see why you should think alignment is."

      9 replies →

They aren’t exactly the same process, though. Pre-training produces a model whose outputs are a reflection of the training data. Fine-tuning is a separate process that tries to map the outputs to the owner’s desired traits (see the sketch below). These could be performance-based, but as we saw with Google’s black Nazis, it’s often a reflection of the owner’s moral inclinations.

Here the adjuster’s motivations do matter. There is a definite moral dimension/motivation to the AI adjustment people’s work. They are not simply striving for accuracy; for example, they don’t want the AI to produce outputs that are distasteful to the California PMC. Modern AIs are absolutely loath to describe white people or right-wingers positively, for example, but the same prompts for other ethnicities work just fine. Even if you tell the AI that it’s being discriminatory, there’s powerful railroading to goad it back to giving woke answers.
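
For concreteness, here is a minimal, purely illustrative Python sketch of that two-stage distinction. Everything in it is a stand-in: the toy `corpus`, the dictionary "model", and the hypothetical `prefers_neutral` reward that plays the role human raters or a policy document play in real RLHF/SFT pipelines.

```python
import random

def predictive_loss(model, sample):
    # Stand-in for a real language-modelling loss (how badly we predict the data).
    return random.random()

def gradient_step(model, loss, lr=0.01):
    # Stand-in for an optimizer update on the model's parameters.
    model["params"] = [p - lr * loss for p in model["params"]]

def generate(model):
    # Stand-in for sampling an output from the model.
    return random.choice(["neutral answer", "edgy answer"])

def pretrain(corpus, steps=1000):
    """Stage 1: fit the parameters to the data. Whatever the corpus
    contains is what the model's behaviour comes to reflect."""
    model = {"params": [random.random() for _ in range(8)]}
    for _ in range(steps):
        sample = random.choice(corpus)
        gradient_step(model, predictive_loss(model, sample))
    return model

def align(model, owner_preference, steps=200):
    """Stage 2: adjust the *same* parameters, but the signal now comes
    from an owner-chosen reward, not from the data distribution."""
    for _ in range(steps):
        output = generate(model)
        gradient_step(model, -owner_preference(output))
    return model

if __name__ == "__main__":
    corpus = ["the cat sat", "the dog ran"]  # hypothetical toy data
    prefers_neutral = lambda out: 1.0 if out == "neutral answer" else -1.0
    model = align(pretrain(corpus), prefers_neutral)
```

Both stages adjust the same parameters by gradient steps; the difference the comment points at is only where the training signal comes from, data in the first case and the owner's preferences in the second.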