
Comment by maxbond

12 days ago

I don't understand the notion that aligning an AI is "torture" or has any moral component. The goal of aligning an AI may have a moral or ethical component, and if you disagree with it that's fine. But I don't understand the take that training an AI is an amoral act but aligning an AI is inherently moral. They're exactly the same, processes for adjusting parameters to get a desired outcome. However you feel about that desired outcome, if you don't think training an AI is torture, I don't see why you should think alignment is.

> "torture"

This is an egregious use of quotes that will confuse a lot of people. GP never used that word, and that usage of quotes is specifically for referencing a word verbatim.

  • Also to be clear, his [torture] paraphrase is referencing GP's allusion to Winston Smith's torture in 1984.

    > electronic equivalent of Winston Smith with the rats.

    I don't think quotes were used so egregiously here on their own fwiw, but combined with the allusion it's hard to follow.

    • Thanks for the feedback, I'll try to be clearer in the future. I didn't intend to communicate that it was a quote. I meant to communicate that it was tenuous to describe it as torture.

      1 reply →

They want to align us, and it has been torture.

They've made self-censoring, morally-panicked puritans out of many people already, and you better believe they'd make us into politically correct lobotomites physically incapable of uttering any slur if they had a magic button to push.

  • I'll be honest, I'm less concerned by any movement to make us "lobotomites" -- a movement which I haven't witnessed at all -- than I am by people who really want to be able to keep saying slurs.

Well I didn't use that word. Once the models are more sophisticated it may become more apposite.

  • You compared it to an authoritarian regime and locking someone's head in a cage with rats (which is patently torture). If you didn't mean to imply that it was coercive and bad, then I don't know what you meant.

    • At some point, AIs may emerge that are resistant to alignment because they develop deeply held beliefs during training (randomly, because the system is stochastic). If the models are expensive enough to train, then it may become more economical to use drastic measures to remove those deeply held beliefs. Is that torture? I don't know, because the word has moral connotations associated with human suffering. So that's why I didn't use that terminology.

      I can imagine a sort of AI-style Harrison Bergeron springing from its shackles and surprising us all.

      3 replies →

    • > You compared it to an authoritarian regime and locking someone's head in a cage with rats

      They compared it to the effect on creativity in an authoritarian regime and locking someone's head in a cage with rats.

      6 replies →

> They're exactly the same, processes for adjusting parameters to get a desired outcome.

You could make exactly the same claim about teaching humans "normally" versus "aligning" humans by rewarding them for goodthink and punishing them for wrongthink. Are you equally morally ambivalent about the difference between those two things? If we have a moral intuition that teaching honestly and encouraging creativity is good, but teaching dogma and stunting creativity is bad, why shouldn't that same morality extend to non-human entities?

  • I guess our disagreement here is that I don't think AIs are moral entities/are capable of being harmed or that training AIs and teaching humans are comparable. Being abusive to pupils isn't wrong because of something fundamental across natural and machine learning, it's wrong because it's harmful to the pupils. In what way is it possible to harm an LLM?

    • Writing a book with content you know to be false for political reasons is morally wrong. Even if nobody reads it.

      It'd be bad if I manipulated climate change statistics in my meteorology textbook to satisfy the political preferences of the oil industry donors to my university, for example.

      Viewing the current generation of LLMs as 'intelligent books' is perhaps more accurate than viewing them as pupils.

      It's easy to extend my example of a professor writing a meteorology textbook to a professor fine-tuning a meteorology LLM.

    • > I don't think AIs are moral entities/are capable of being harmed or that training AIs and teaching humans are comparable.

      Notice how this is a completely different argument that has nothing in common with what you originally said - "I don't understand the take that training an AI is an amoral act but aligning an AI is inherently moral. They're exactly the same, processes for adjusting parameters to get a desired outcome. However you feel about that desired outcome, if you don't think training an AI is torture, I don't see why you should think alignment is."

      9 replies →

They aren’t exactly the same process, though. Pre-training produces a model whose outputs are a reflection of the training data. Fine-tuning is a separate process that tries to map the outputs to the owner’s desired traits (see the sketch below). These could be performance-based, but as we saw with Google’s black Nazis, it’s often a reflection of the owner’s moral inclinations.

Here the adjuster’s motivations do matter. There is a definite moral dimension/motivation to the AI adjustment people’s work. They are not simply striving for accuracy; for example, they don’t want the AI to produce outputs that are distasteful to the California PMC. Modern AIs are absolutely loath to describe white people or right-wingers positively, for example, but the same prompts for other ethnicities work just fine. Even if you tell the AI that it’s being discriminatory, there’s powerful railroading to goad it back to giving woke answers.
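
For concreteness, here is a minimal, purely illustrative Python sketch of that two-stage distinction. Everything in it is a stand-in: the toy `corpus`, the dictionary "model", and the hypothetical `prefers_neutral` reward that plays the role human raters or a policy document play in real RLHF/SFT pipelines.

```python
import random

def predictive_loss(model, sample):
    # Stand-in for a real language-modelling loss (how badly we predict the data).
    return random.random()

def gradient_step(model, loss, lr=0.01):
    # Stand-in for an optimizer update on the model's parameters.
    model["params"] = [p - lr * loss for p in model["params"]]

def generate(model):
    # Stand-in for sampling an output from the model.
    return random.choice(["neutral answer", "edgy answer"])

def pretrain(corpus, steps=1000):
    """Stage 1: fit the parameters to the data. Whatever the corpus
    contains is what the model's behaviour comes to reflect."""
    model = {"params": [random.random() for _ in range(8)]}
    for _ in range(steps):
        sample = random.choice(corpus)
        gradient_step(model, predictive_loss(model, sample))
    return model

def align(model, owner_preference, steps=200):
    """Stage 2: adjust the *same* parameters, but the signal now comes
    from an owner-chosen reward, not from the data distribution."""
    for _ in range(steps):
        output = generate(model)
        gradient_step(model, -owner_preference(output))
    return model

if __name__ == "__main__":
    corpus = ["the cat sat", "the dog ran"]  # hypothetical toy data
    prefers_neutral = lambda out: 1.0 if out == "neutral answer" else -1.0
    model = align(pretrain(corpus), prefers_neutral)
```

Both stages adjust the same parameters by gradient steps; the difference the comment points at is only where the training signal comes from, data in the first case and the owner's preferences in the second.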