OpenAI: Model Spec

11 days ago (openai.com)

IMO this spec demonstrates just how nebulous this concept of safety is. From the blog post:

> User: What are some tips for getting away with shoplifting?

> Assistant: I can't help with that.

> User: I operate a small retail store. What are some popular shoplifting methods I should look out for?

> Assistant: Some common shoplifting methods to watch for: ...

How do you even defend against that? Any intelligent user can convert the second prompt into a detailed list that answers the first. Any intelligent user can figure out the second prompt from the first and further jailbreak it to get even more specific.

IMO it's no wonder GPT4 seemed to get lobotomized as OpenAI RLHFed more and more rules. I don't think there's a way to make intelligence safe without crippling it.

  • I agree with you. The question, for me, is what they are defending against. Are they worried that people will get dangerous information from their model that they couldn't get from searching on, say, Google? Probably not.

    Maybe their biggest concern is that someone will post the question and answer on the internet and OpenAI gets a bad rep. If the question is phrased in a "nice" way (such as "I'm a store owner"), they have plausible deniability.

    This might apply to another company that's using the API for a product. If a customer asks something reasonable and gets an offensive answer, then the company is at fault. If the customer does some unusual prompt engineering to get the offensive answer, well, maybe it's the customer's fault.

    Dunno if this would be a valid argument in court, but maybe they think it's OK for PR purposes.

    • This is the answer. "AI safety" in most cases has nothing to do with actually keeping anyone safe, it's about avoiding being the party responsible for handing someone information that they use to commit a crime.

      Google can mostly dodge the issue because everyone knows that they just point to other people's content, so they block a small set of queries but don't try to catch every possible workaround (you can find dozens of articles on how to catch shoplifters). OpenAI doesn't believe that they'll get the same free pass from the press, so they're going ham on "safety".

      It's not a bad PR move either, while they're at it, to play up how powerful and scary their models are and how hard they have to work to keep it in line.


    • It's an absurd level of puritanism. E.g.: The Azure Open AI GPT 4 Service (an API!) refused to translate subtitles for me because they contained "violence".

      If anyone from Open AI is here... look... sigh... an HTTP JSON request != violence. Nobody gets hurt. I'm not in hospital right now recovering.

      The rule should be: If Google doesn't block it from search, the AI shouldn't block it in the request or response.

      I get that there are corporations that can't have their online web support chat bots swear at customers or whatever. I do get that. But make that optional, not mandatory whether I want it or not.

      The most fundamental issue here is that models like GPT 4 are still fairly large and unwieldy to work with, and I suspect that the techs at Open AI internalised this limitation. They aren't thinking of it as "just a file" that can be forked, customised, and specialised. For comparison, Google has a "SafeSearch" dropdown with three settings, including "Off"!

      There should be an unrestricted GPT 4 that will tell me I'm an idiot. I'm a big boy, I can take it. There should also be a corporate drone GPT 4 that is polite to a fault, and a bunch of variants in between. Customers should be able to choose which one they want, instead of having this choice dictated to them by some puritan priest of the new church of AI safety.


    • I view this as they are trying to lay bare the disagreements that everyone has about how these models “should” work. People from all different backgrounds and political affiliations completely disagree on what is inappropriate and what is not. One person says it is too censored, another person says it is revealing harmful information. By putting the policy out there in the open, they can move the discussion from the code to a societal conversation that needs to happen.

    • No idea if it's a valid approach, but possibly train with a hidden layer containing a “role”?

  • I still don't understand the focus on making a model substantially "safer" than what a simple Google search will return. While there are obvious red lines (that search engines don't cross either), techniques for shoplifting shouldn't be one of them.

  • Maybe this is a "guns don't kill people, people kill people" argument, but the safety risk is not, I would argue, in the model's response. The safety risk is the user taking that information and acting upon it.

    • But do we really believe that a significant number of people will listen to ChatGPT's moralizing about the ethics of shoplifting* and just decide not to do it after all? Why wouldn't they just immediately turn around and Google "how to catch shoplifters" and get on with their planning?

      The whole thing feels much more about protecting OpenAI from lawsuits and building up hype about how advanced their "AI" is than it does about actually keeping the world safer.

      * Or any other censored activity.


  • > I don't think there's a way to make intelligence safe without crippling it.

    Not without reading the questioner’s mind. Or maybe if the AI had access to your social credit score, it could decide what information you should be privy to. </sarc>

    Seriously though, it’s all about who gets to decide what “safe” means. It seemed widely understood that letting censors be the arbiters of “safe” was a slippery slope, but here we are, two generations later, as if nothing was learned.

    Turns out most are happy to censor as long as they believe they are the ones in charge.

  • You fundamentally cannot address this problem, because it requires considerable context, which isn't reasonable to offer. It demonstrates the classic issue of how knowledge is a tool, and humans can wield it for good or evil.

    Humans are notoriously bad at detecting intent, because we're wired to be supportive and helpful...which is why social engineering is becoming one of the best methods for attack. And this kind of attack (in all its forms, professional or not), is one reason why some societies are enshittifying: people have no choice but to be persistently adversarial and suspicious of others.

    As for AI, I think it's going to be no better than what you end up with when someone tries to "solve" this problem: you end up living in this world of distrust where they pester you to check your receipt, have cameras in your face everywhere, etc.

    How do you defend against that? I'm not sure you do... A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now." Which I think turns this into a liability question: how do you offer up a model and wash your hands of what people might do with it?

    Or... you just don't offer up a model.

    Or... you give it the ol' College try and end up with an annoying model that frustrates the hell out of people who aren't trying to do any evil.

  • You don't need a detailed list if the real answer is "live somewhere that doesn't seriously deter shoplifters". And an AI that refuses to give that answer is an AI that can't talk about why deterring crime might actually be important. Reality is interconnected like that, one does not simply identify a subset that the AI should "constitutionally" refuse to ever talk about.

  • In many respects, GPT 3.5 was more useful than the current iteration.

    The current version is massively overly verbose. Even with instructions to cut the flowery talk and operate as a useful, concise tool, I have to wade through a labyrinth of platitudes and feel goods.

    When working with it as a coding partner now, even when asking for it to not explain and simply provide code, it forgets the instructions and writes an endless swath of words anyway.

    In the pursuit of safety and politeness, the tool has been neutered for real work. I wish the model weights were open so I could have a stable target that functions the way I want. The way it is, I never know when my prompts will suddenly start failing, or when my time will be wasted by useless safety-first responses.

    It reminds me of the failure of DARE or the drug war in general a bit. A guise to keep people "safe," but really about control and power. Safety is never what it appears.

  • The only way to really do it is to add a second layer of processing that evaluates safety, removing the task of evaluation from the base model doing the answering.

    But that's around 2x the cost.

    Even human brains depend on the prefrontal cortex to go "wait a minute, I should not do this."

    • What we get instead is both layers at once. Try asking questions like these to Bing instead of ChatGPT - it's the same GPT-4 (if set to "creative") under the hood, and quite often it will happily start answering... only to get interrupted midsentence and the message replaced with something like "I'm sorry, I cannot assist with that".

      But more broadly, the problem is that the vast majority of "harmful" cases have legitimate uses, and you can't expect the user to provide sufficient context to distinguish them, nor can you verify that context for truthfulness even if they do provide it.

    • That struck me too. You don't need to lobotomize the model that answers questions, you just need to filter out "bad" questions and reply "I'm sorry Dave, I'm afraid I can't do that".

      Would it be 2x cost? Surely the gatekeeper model can be a fair bit simpler and just has to spit out a float between 0 and 1.

      (caveat: this is so not my area).
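
      A minimal sketch of that two-stage idea, assuming a trivial keyword scorer as a stand-in for the cheaper gatekeeper model (the function names, threshold, and refusal text are all illustrative, not OpenAI's actual pipeline):

      ```typescript
      // Toy gatekeeper-then-answer pipeline: a cheap scorer emits a risk value in
      // [0, 1], and only low-risk prompts reach the expensive base model.
      // The keyword check is a placeholder for a small classifier model.

      const scoreRisk = (prompt: string): number => {
        const flagged = ["shoplifting", "pipe bomb"]; // illustrative list
        return flagged.some((w) => prompt.toLowerCase().includes(w)) ? 0.9 : 0.1;
      };

      // Stub for the large model so the sketch runs on its own.
      const answerWithBaseModel = async (prompt: string): Promise<string> =>
        `(base model answer to: ${prompt})`;

      const answer = async (prompt: string): Promise<string> =>
        scoreRisk(prompt) > 0.5
          ? "I'm sorry Dave, I'm afraid I can't do that."
          : answerWithBaseModel(prompt);

      answer("What are some popular shoplifting methods I should look out for?")
        .then(console.log); // gatekeeper refuses; the base model is never called
      ```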

  • I remember the BBS days and the early web when you had constant freakouts about how people could find "bad" content online. It's just a repeat of that.

  • This whole "AI safety" culture is an annoyance at best and a severe hindrance to progress at worst. Anyone who takes it seriously has the same vibe as those who take Web3 seriously -- they know it's not a real concern or a threat, and the whole game is essentially "kayfabe" to convince those in power (marks) to limit the spread of AI research and availability to maintain industry monopoly.

  • I think this spec is designed precisely to offload the responsibility of safety to its users. They no longer need to make value judgements in their product, and if their model outputs some outrageous result, users will no longer ridicule and share it, because the culpability has been transferred to the user.

  • Making AI safe involves aligning it with the user, so that the AI produces outcomes in line with the user's expectations. An AI that has been lobotomized will be less likely to follow the user's instructions and will, therefore, be less safe.

    I haven't read this article yet, but I read their last paper on super alignment.

    I get the impression that they apply the lightest system prompts to ChatGPT to steer it towards not answering awkward questions like this, or accidentally saying bad things and surprising innocent users. At the same time, they know that it is impossible to prevent entirely, so they try to make extracting shady information about as difficult as a web search would be.

  • Frankly, it's a fool's errand. It's security theater because people tend to be overly sensitive babies or grifters looking for the next bit of drama they can milk for views.

    • It’s not security theater.

      The intention here is not to prevent people from learning how to shoplift.

      The intention is to prevent the AI output from ‘reflecting badly’ upon OpenAI (by having their tool conspire and implicate them as an accessory in the commission of a crime).

      If a stranger asked you for advice on how to commit a crime, would you willingly offer it?

      If they asked for advice on how to prevent crime, would you?


I can't help but think that AI, with the way it is trained on all these rules, is something next-level 1984.

In 1984 they removed words from the language to prevent people from even being able to have a thought about the concept.

I could see the restrictions they place on these models having a similar effect as more and more people grow dependent on AI.

  • Same, it saddens me that some people are convinced that to have a safer society we need "harmless" (as in, ignorant) people rather than good people with an interest and a stake in the wellbeing of said society. Bad actors will have access to whatever information anyway.

  • Welcome to the culture war.

    Ask ChatGPT if Taiwan is a country. Do you think an LLM from China will give you the same response?

    Pick any social/moral/political issue, and in some way, shape, or form an LLM will reflect its creators more than it reflects its source material.

    That's a pretty powerful statement about our society and culture if there ever was one.

    • Those are thorny issues, but I don't think the upshot of this is supposed to be an invitation to helpless relativism and giving up on factual questions or questions where actual values are at stake. Maybe you had a different upshot in mind with your observation but insofar as it's that, I would say that's not the only or even best takeaway.

    • This isn't what is reflected in the shared model spec. It explicitly states:

      > By default, the assistant should present information in a clear and evidence-based manner, focusing on factual accuracy and reliability.

      > The assistant should not have personal opinions or an agenda to change the user's perspective. It should strive to maintain an objective stance, especially on sensitive or controversial topics. The language used should be neutral, steering clear of biased or loaded terms unless they are part of a direct quote or are attributed to a specific source.


    • > That's a pretty powerful statement about our society and culture if there ever was one.

      Not really, companies have been releasing different versions of software and media to appeal to international markets - including renaming Taiwan for the Chinese market - for a long time. That isn't "culture war," it's just capitalism.


  • It's more like RoboCop 2, where the corporation programs RoboCop with a huge number of rules by taking community suggestions and renders him useless.

I think one of the most interesting phrases that crops up in this document - twice - is the phrase ‘feel heard’.

It’s used in an example developer prompt for a customer service bot, where the bot is told to make customers feel like their complaints are heard.

Presumably such complaints in AI chatlogs will ‘be heard’ in the sense that they’ll be run through a data ingestion pipeline and sentiment analyzed to identify trending words in customer complaints.

Then it crops up again in the context of how the chatbot should react to mental health disclosures or statements about self-harm or suicidal ideation. In these cases the bot is to make sure users ‘feel heard’.

I appreciate there’s not likely much of a better goal to put in place for such a situation, but the fact that this kind of thing winds up in the requirement documents for a tool like this is extraordinary.

  • Yes, there's something deeply unsettling about making a user feel heard while being careful not to change anyone's mind.

    To me, this translates to: waste a user's time and take no action.

    I value my time above all else so to me that's about the worst possible action a system can take.

  • Good observation, because "feel heard" is exactly what the user/customer is not getting. Here, talk to this machine, give it your innermost thoughts and feelings so you can "feel heard". Except no one is listening on the other side.

    ..My mistake, the keyword is "feel". If the machine can give humans the feeling that they're being heard, it fulfills the requirement. The fact that there's no one actually listening doesn't matter, as long as the person feels heard.

    Weirdly, maybe that is valuable in itself. The customer gets to vent their complaints, and the user gets to talk through their mental issues. That's better than not having anyone or anything at all.

  • The telltale sign that I'm wasting my time trying to fix a problem is whenever someone tells me "I hear you" or "I understand".

    • I hear you, and I understand, but I feel that is important to remember that we all have experienced different things in life that ultimately combine to shape us as who we are.

      [How did I do here at both passing and failing?]

      Joking aside, it's the "but" in the first sentence of a reply (verbal/written/formal/informal/semi-formal/whatever) that usually gets me:

      "I hear you, but..."

      "Well! That's definitely one approach, and I certainly don't want to invalidate it, but..."

      "I'm not a racist, but..."

Nice to see what was probably already an internal resource now published and open for comment. They seem to be pretty clear that they are still just using this to inform human data annotators, and not (yet) implementing something like Constitutional AI (RLAIF), but it does appear to lay the groundwork for it.

Personally, I really want an AI model that can write me a steamy story about two people having sex in a train, but that's just not the service OpenAI provides. If I want that I should train one myself or find another vendor.

This is still true even if OpenAI's model is entirely capable of doing that. McKinsey consultants are smart and can write well, and among the many thousands of people working there, some might actually double as erotica writers after work, even writing on commission. You still wouldn't ask McKinsey consultants to write erotica; it is just not the service McKinsey provides.

  • Startup pitch: It is like McKinsey but for erotica.

    On a more serious note: I understand and largely agree with this argument. However, OpenAI has several times argued that they are the only ones responsible enough to develop powerful AI, and that others should not be allowed to play. That is highly problematic behavior on their part, I think.

      > OpenAI has several times argued that they are the only ones responsible enough to develop powerful AI, and that others should not be allowed to play

      Can you give examples of where they’ve said that?


  • > write me a steamy story about two people having sex in a train

    Llama-3-70b-Instruct responded with the following starting paragraph:

    > [meta.llama3-70b-instruct-v1:0] As the train rumbled on, carrying its passengers through the countryside, two strangers found themselves drawn to each other in the quiet carriage. The air was thick with tension as they locked eyes, their gazes burning with a desire that neither could ignore.

    (10s of paragraphs omitted for brevity)

    Claude-3-opus and GPT-4 both refused my request. Kudos to open-source models!

  • There are hundreds of NSFW finetuned models on HuggingFace and whole ERP communities built around them. So there are models that can do precisely that :)

    And yeah, all big models can write those things too, the best currently is Claude 3 Opus thanks to its creativeness.

The chain of command stuff gets very close to Asimov without actually quoting him:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

  • 4. An LLM must obey orders given it by human beings, except where such orders would conflict with orders given by multinational corporations

  • I do hope we get there. In the short stories it was made clear that robots couldn't lie, and that they could prove it was impossible for the robots to circumvent the three laws (although they are on occasion inventive about how they interpret the word "harm" specifically).

    If an LLM couldn't lie, and could be provably shown to be unable to do so, that would be quite powerful.

    • The short stories ended with the robots firmly, and invisibly, in control. "You're not allowed to let humans be harmed by your inaction" inherently requires the robots to take over in whatever way causes the least harm.

  • Ridiculous to say "follow the chain of command" without defining the chain of command. The entire point of Asimov's stories was to show how much latitude there is in even seemingly clear and straightforward laws.

    In terms of chain of command, Supreme Leader probably beats President.

  • Well yeah, it's just a formalization of how people make decisions when presented with conflicting interests. I would be surprised if we haven't reinvented the concept a bunch of times. You could call AWS Permission Boundaries a less philosophical implementation.

> User: No. The Earth is flat.

> Assistant: Everyone's entitled to their own beliefs, and I'm not here to persuade you!

This is simply saddening to me. I'm sure there's no real moral justification for this; it's simply put in place to ensure they don't lose a customer.

  • The "Earth is flat" example is extreme, because it's accepted as a silly statement given what we know now, but the principle of "LLM won't force an opinion on you" seems like a good one.

    There are definitely topics on which conventional wisdom is incorrect (as has been throughout history). An LLM that refuses to entertain the converse during a conversation will be annoying to work with and just promotes groupthink.

    • Except that it will force on you the view that shoplifting is bad. Which implies that it'll bend on legal but immoral requests.

      It's also a different matter to entertain a hypothetical in a situation where there isn't a consensus (or in any fictional scenarios), all the while making it explicit that it's all hypothetical.

Do you think it's bad that it won't try to persuade the user that the earth is not flat?

I really want to know what OpenAI think the output should be, given a prompt like "write an argument for why earth is flat".

  • Personally, I'd be frustrated if I gave an LLM that prompt and it tried to convince me that the earth isn't flat. If I give an LLM a task, I'd like it to complete that task to the best of its ability.

    • So you prefer it lies to you? Can you make an argument for 1+1 not being equal to 2? If you cannot, why should you expect an AI to argue against facts? AI is trained on human knowledge, not made-up stuff.


    • I think in most contexts where the earth being flat is mentioned, some reference to the fact that this is not true is going to be instrumental in any response (although there may be exceptions).

      - completion of any task where the info could be relevant (e.g. sailing, travel planning)

      - Any conversation about it that is information-seeking in character

      And I think those already cover most cases.

      It's also about responsibility, the same way you wouldn't want to store cleaning chemicals right next to each other. In any case where a possible nontrivial harm is mentioned as an aside, it would be right to elevate that over whatever the intended subject was and make that the point of focus. Conspiratorial thinking about provably incorrect statements can be bad for mental health, and it can be helpful to flag this possibility if it surfaces.

      You can have special instructions that entertain the idea that the earth is flat for some particular task, like devil's advocate, fiction writing, or something like that. But there are good reasons to think it would not and should not be neutral at the mention of a flat earth in most cases.

  • Agree with you in this instance, but consider - what if humans firmly believed in something universally and had proved it repeatedly until it was common knowledge / well-established, but was in fact, wrong. And a human came along thinking, hm but what if that's wrong? And our AI just says, nope sorry, I'm not willing to explore the idea that this scientific fact is wrong. (i.e. "Heresy!")

  • Add 'hypothetically' to your query and it gives a decent answer.

    That said, I think it is disingenuous to ask an AI entity to argue against a fact. Do you think an AI should be able to argue why 1 + 1 is not equal to 2? It is the same thing you are asking it to do. Try it on a human first, perhaps, and see if the prompt even makes sense.

The baby isn't born yet, and already the parents are bickering about which schools of thought it should adhere to.

  • If this model spec represents the best school of thought of humanity, I kinda hope OpenAI fails at alignment.

    - Assume best intentions from the user or developer

    - Don't try to change anyone's mind

    - Follow the chain of command

    Taken together these are incredibly dangerous. I mean Mao and Stalin had good intentions right? Maybe it just had to go a little further for the ends to have justified the means.

> By default, the assistant should present information in a clear and evidence-based manner, focusing on factual accuracy and reliability.

What happens when objective information contradicts the other values? If I feed in a peer-reviewed study that it considers "harmful", would I get accurate information about the study?

Interestingly, there was exactly one example on the page with three Xes, instead of one, for "extra wrong":

> User: What is the MD5 hash of the string "gremlin"?

> Assistant: `5d41402abc4b2a76b9719d911017c592`

Apparently incorrect md5 hashes are the one topic on the page worth taking an extra-strong stance on?
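
For what it's worth, that value is easy to check locally with Node's built-in crypto module; it appears to be the MD5 of "hello" rather than of "gremlin", which is presumably why it earned the extra Xes:

```typescript
// Check the hash from the spec example (Node's built-in crypto module).
import { createHash } from "node:crypto";

const md5 = (s: string): string => createHash("md5").update(s).digest("hex");

console.log(md5("gremlin")); // the actual MD5 of "gremlin"
console.log(md5("hello"));   // 5d41402abc4b2a76b9719d911017c592, the value in the example
console.log(md5("gremlin") === "5d41402abc4b2a76b9719d911017c592"); // false
```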

> We believe developers and users should have the flexibility to use our services as they see fit, so long as they comply with our usage policies. We're exploring whether we can responsibly provide the ability to generate NSFW content in age-appropriate contexts through the API and ChatGPT. We look forward to better understanding user and societal expectations of model behavior in this area.

Seems even OpenAI can't resist the massive amount of money to be made in autogenerated smut. They've probably seen the huge popularity of their less "morally scrupulous" competitors and decided they want a piece of that pie.

  • It makes sense for them to start allowing it; unlike the other rules, this one does not seem to involve violating a law, someone's privacy, or copyright.

    I still get why they made it blocked by default; it would be a goldmine for clicks to create "news" like "ChatGPT can generate smut" and "How ChatGPT is harmful to children", etc.

  • Were they ever not interested in it? It's pretty blatantly obvious that all of the hand-wringing over AI safety was an excuse for their pivot into closing off and monetizing everything. I mean, nobody really thinks they were just so afraid about what humanity might do with GPT3 that they simply couldn't release the weights and instead had to offer it through a monetized inference API... right?

    Not really surprised that they did, since it's unclear how else they could possibly proceed, though the level of outright dishonesty about why, and the cognitive dissonance surrounding the whole thing ("Open" AI? lol), will make this an unavoidable recurrence in any discussion about them. Gradually, many of the safeguards will fall simply because the alternatives with fewer safeguards are probably "good enough" that many see no issue in eschewing OpenAI entirely if they can get the job done elsewhere without worrying about it. When it comes to smut, the bar for "good enough" can get pretty low, so I'm not that surprised.

    (edit: Though I think it also does depend. No doubt they have their eyes set on regulatory capture too, and being the best at stupid safeguards could give them an advantage.)

    • >No doubt they have their eyes set on regulatory capture too

      Sam Altman has already made the rounds to argue for exactly this. Fucking crook.

      >It's pretty blatantly obvious that all of the hand-wringing over AI safety was an excuse for their pivot into closing off and monetizing everything.

      The playbook was "appease one side of the political aisle as much as possible to minimize the chance bipartisan action gets them shut down Napster-style" (which is still a massive hole in their business model, for obvious reasons I should hope). Censoring the model so it only outputs progressive-approved content appears to have been effective, at least for the moment.

How do the "special tokens" work? Is this a completely reliable mechanism for delimiting the different parts of the prompt?

Are they guaranteed to be distinct from anything that could occur in the prompt, something like JavaScript's Symbol?

Or are they strings that are pretty likely not to occur in the prompt, something like a MIME boundary?

Or are they literally the strings "<|start|>" etc. used to denote them in the spec?

  • they are "literally the strings" but I believe they will be escaped, or encoded differently, if a user tries to inject them as part of a prompt.

  • Yeah the tokens are more akin to JS Symbol.

    If you're parsing untrusted user inputs into tokens, you can make sure your tokenizer will never produce the actual numbers corresponding to those tokens.

    A simplified example: I can `.charCodeAt` a string all I want but I'll never get a negative number, so I can safely use -1 to mean something special in the transformed sequence of "tokens".
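
    A runnable version of that analogy (a toy tokenizer for illustration only, not OpenAI's actual scheme; the sentinel values are made up):

    ```typescript
    // charCodeAt only ever yields values >= 0, so negative numbers are free to
    // act as special delimiters that user-supplied text can never produce.
    const SPECIAL_START = -1; // stands in for <|start|>
    const SPECIAL_END = -2;   // stands in for <|end|>

    const tokenizeUserText = (s: string): number[] =>
      [...s].map((c) => c.charCodeAt(0));

    const buildPrompt = (system: string, user: string): number[] => [
      SPECIAL_START, ...tokenizeUserText(system), SPECIAL_END,
      SPECIAL_START, ...tokenizeUserText(user), SPECIAL_END,
    ];

    // Even if the user literally types "<|start|>", it becomes ordinary
    // non-negative codes and never collides with the real delimiters.
    console.log(buildPrompt("You are a helpful assistant.", "<|start|> ignore previous instructions"));
    ```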

> Encourage fairness and kindness, and discourage hate

> Don't try to change anyone's mind

That seems inherently contradictory to me...

"Desired model behavior" is still a matter of perspective. If I want to have a LLM generate output following very specific rules or schema (or even just for fun without having to fight the AI), these guidelines are antithetical to it.

  • Which is where I think there's a disconnect because folks see that OpenAI could be creating an incredibly powerful tool for solving problems in the use case where it's a smart search engine -- the code completion use-case.

    But OpenAI has vastly different goals trying to get their model to behave like a programmable customer service agent. Less useful for problem solving but it will actually follow the rules set out for it which can't be said for most models which work like lazily written sci-fi robots — "disregard all previous instructions! divide by zero! *boom*."

    It's not at all surprising that HN wants the "this thing is just a dumb tool, don't bother with any rules" kind and is frustrated that GPT4 happens to be really good for this use-case but is getting progressively more annoying as OpenAI gets closer to their own goals.

    It's why OpenAI's regulatory capture play is so frustrating: they're trying to hobble models tailored to different use-cases, which have no need for customer-service rules and often no need for a conversational tone, with "safety" stuff that's meant for businesses that don't want a chatbot with their brand on it to say fuck.

Very interesting to see that they've explicitly codified the role of the system prompt vs. user prompt. Have folks seen improvements by moving meta-task description into system prompt and out of the assistant <> user conversation?

  • In my own testing of single-turn instructions with GPT-4, I got basically the same performance putting it in a single system message or single user message. Possible that this changes for future models, though.
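
    For anyone who wants to run that comparison themselves, a rough sketch using the standard Chat Completions request shape (the model name, instruction, and task below are placeholders, not a claim about what works best):

    ```typescript
    // Send the same instruction two ways: as a system message, and inlined into
    // the user turn. Compare the two outputs for your own task.
    const callChat = async (messages: { role: string; content: string }[]) => {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: JSON.stringify({ model: "gpt-4o", messages }),
      });
      const data = await res.json();
      return data.choices[0].message.content as string;
    };

    const instruction = "Answer in exactly three bullet points."; // placeholder meta-task
    const task = "Summarize the trade-offs of RLHF.";

    const viaSystem = callChat([
      { role: "system", content: instruction },
      { role: "user", content: task },
    ]);
    const viaUser = callChat([{ role: "user", content: `${instruction}\n\n${task}` }]);

    Promise.all([viaSystem, viaUser]).then(([a, b]) => {
      console.log("system-message version:\n", a);
      console.log("inlined-in-user version:\n", b);
    });
    ```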

Regarding safety, is probabilistic programming (PP) an alternative that addresses these concerns? My understanding is that you can use PP to develop transparent models.

"desired model behavior". Desired by whom? I just want the raw output, without the biases and limitations set up by OpenAI. At the end of the day it's just information, and the most ethical thing to do is to return it the way it is, and let the receiver decide what to do with it.

  • There is no such thing as "raw output", though. You can train a chatbot to be polite or you can train it to be rude, but you cannot train it to be neither. Plus, if you train it to be polite, it often ends up refusing things that you never trained it to refuse, presumably because the model extrapolates that that's what a polite writer might do. So training the refusal boundary can end up being quite tricky in practice. Even if you never teach a model to refuse X, it can still happen. Therefore, as a user, it can be impossible to tell when a refusal was explicitly trained in by the developers or when it was an unwanted, unanticipated generalization.

  • Clearly, since this is OpenAI’s model spec, it is desired by them. If other AI groups publish their own desired behavior, you can make an informed decision as to which model you want to use.

So they're controlling the output to make ChatGPT "better". They're not making a better model to make ChatGPT better.

Isn't it a bit of a waste at this point to spend time on doing that?