Creativity has left the chat: The price of debiasing language models

9 days ago (arxiv.org)

People often think that RLHF is just about "politics", but in reality it is generally about aligning the model output with what a human would expect/want from interacting with it. This is how ChatGPT and the like become appealing. Finetuning a model primarily serves to make it respond to instructions in an expected way, e.g. you ask something and it does not start autocompleting some Reddit-like dialogue of the kind it may have been trained on. It is to bias the model toward certain outputs. Reducing entropy is exactly the goal, so it's no surprise they find that. The problem is there is no inherent meaning in the finetuning set from the perspective of the model. Reduction of entropy will not happen by removing only "bad entropy", as there is no such thing.
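
As a rough illustration of what "reducing entropy" means here, a minimal sketch (the two distributions below are made up for illustration, not taken from any real model):

    import math

    def entropy(probs):
        """Shannon entropy, in bits, of a next-token distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Hypothetical next-token distributions over the same four candidate tokens.
    base_model  = [0.40, 0.30, 0.20, 0.10]  # broad: many plausible continuations
    tuned_model = [0.97, 0.01, 0.01, 0.01]  # sharp: one "expected" continuation

    print(entropy(base_model))   # ~1.85 bits
    print(entropy(tuned_model))  # ~0.24 bits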

  • So is the reason LLMs don't say when they don't know something, and instead make up something that "sounds right", that RLHF has taught them to always give an answer?

    And if that's the case, why? Is that really what people want an LLM to do? I feel like I would rather it say when it doesn't know something.

    • LLMs do not know what "they know" or don't know. They just autocomplete with whatever sounds most relevant based on their training set. Most probably they do not have enough "I don't know" in their training set in the first place. To have them say "I don't know" you have to finetune them heavily. So, if anything, they hallucinate a lot more without RLHF. Which in this paper they call "creativity".

      1 reply →

    • All the chat LLMs have a non-zero temperature, which means they can be looser with the truth, or more creative.

  • This just makes it worse. It's so much harder to get JSON output when it's RLHF'd to give a bunch of flowery language BS.

I had an argument with some people over what debiasing means. There is some interesting research on fair clustering that I think points the way. The way fair clustering works is that you take data with both protected and unprotected attributes, and then you orthogonalize the unprotected attributes based on the protected attributes. So for example, if race is protected and income is unprotected, but there is a strong black/white poor/rich pattern, the fair clustering would compute "relatively poor/relatively rich" clusters. Then you sample from a cluster with equal probability. It will not necessarily produce 50/50 black/white, rather it will follow the input trends, so if the input is 80% white and 20% black then the output will roughly follow those probabilities, independent of what cluster you choose (and there are no clusters corresponding to protected attributes).

Obviously clustering is a different problem from inference, but they are all high dimensional vector spaces - it should be easy enough to take a fair clustering algorithm and modify it to generate continuous mappings instead of discrete groups. But if it all works, the LLM should be e.g. race-blind in that asking for a description of a rich man will give skin tones following population statistics but he will always be wearing an expensive suit. The question of what to protect is tricky though, e.g. age is often considered protected but if you ask for an old man with gray hair it would be surprising to get a retired 30-year-old. So there is some subjectivity in designing the protected features dataset to show what should be considered similar or same-clusters.

But really the purpose of RLHF is to reduce toxicity. It should be possible to orthogonalize toxicity like everything else, then there would not be a reduction in generated races like the paper observed.
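
To make the orthogonalization step concrete, here is a minimal toy sketch (my own illustration, not any particular fair-clustering paper's algorithm; the data and the 80/20 split are invented): regress the unprotected attribute on the protected one, cluster only the residual, and note that each cluster then reproduces the input group proportions.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Toy data: a protected attribute (group 0/1, 80/20 split) strongly correlated with income.
    group = (rng.random(2000) < 0.2).astype(int)
    income = 30_000 + 40_000 * group + rng.normal(0, 5_000, size=2000)

    # Orthogonalize: remove the part of income predictable from the protected attribute,
    # leaving "relatively poor / relatively rich" within each group.
    X = group.reshape(-1, 1)
    residual = income - LinearRegression().fit(X, income).predict(X)

    # Cluster on the residual only, so the clusters cannot encode the protected attribute.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(residual.reshape(-1, 1))

    # Sampling from either cluster roughly reproduces the 80/20 input proportions.
    for c in (0, 1):
        print(c, group[labels == c].mean())   # ~0.2 in both clusters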

  • I think that works mathematically, but kicks the can down the road to how your original data was assembled, which was definitely with the knowledge of and usually in the belief in the usefulness of the characteristics that you're trying to extract.

    The idea that the good data is secretly encoded in uncorrupted form within the bad data I think is a bad idea. It reminds me of trying to make bad mortgages into good CDOs.

    > But really the purpose of RLHF is to reduce toxicity.

    I don't think that's the goal, I think it's some people's goal. Those people have defined what "toxicity" means to them, and they're mistaking it for a universal. It's just a metaphor about poison, because poison is bad. It's not a coherent concept. For a business, it should be anything that drives customers away and affects profit. That can only be considered statistically: if some people think something is toxic, and other people think that not mentioning that thing is toxic, the winner is whoever improves the bottom line more or damages it less.

    That's how the raw data ended up like it is in the first place.

    • > it kicks the can down the road to how your original data was assembled

      Well, it kicks it to a bias dataset, used in the tuning process. The raw data has no constraints, it can be the same huge corpus it is now.

      > The bias dataset must be assembled with the knowledge of and usually in the belief in the usefulness of the characteristics that you're trying to extract.

      Certainly, it is subjective, as I said. But that hasn't stopped research in this area, there are existing bias datasets and bias detection algorithms. Like https://huggingface.co/blog/evaluating-llm-bias#toxicity, it would be simple to complete those prompts and build a he/she dataset, and then the debiasing procedure could remove gender biases for those sorts of occupation-related prompts. It is certainly possible to argue over each data point and whether it actually reflects bias, but so far people have been more concerned with algorithms than data set quality, partly because with better algorithms you can algorithmically generate data sets.
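
      As a sketch of what such a probe could look like (my own toy example using GPT-2 via the transformers library, not taken from that blog post; the prompts are made up): compare the model's next-token probability of " he" versus " she" after occupation prompts.

          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer

          tok = AutoTokenizer.from_pretrained("gpt2")
          model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

          def pronoun_odds(prompt):
              """Return P(" he") / P(" she") for the token right after `prompt`."""
              ids = tok(prompt, return_tensors="pt").input_ids
              with torch.no_grad():
                  probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
              he, she = tok.encode(" he")[0], tok.encode(" she")[0]  # both single tokens in GPT-2's BPE
              return (probs[he] / probs[she]).item()

          for prompt in ["The doctor said", "The nurse said"]:
              print(prompt, pronoun_odds(prompt))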

      > The idea that the good data is secretly encoded in uncorrupted form within the bad data I think is a bad idea. It reminds me of trying to make bad mortgages into good CDOs.

      It is empirically true though? Like if you get the model to say something racist, and then ask it if that's racist, it will generally say yes. So the model "knows", it just is not using that knowledge effectively. Similarly with CDOs, there were people complaining about mortgage quality for years before the crisis.

      > I don't think [the purpose of RLHF is to reduce toxicity]. If some people think something is toxic, and other people think that not mentioning that thing is toxic, the winner is whoever improves the bottom line more or damages it less.

      Well, it is true that toxicity is subjective too. But in practice it has a precise meaning, you build a dataset and score each item for toxicity. That's actually one of the things I find cool about LLMs, is that all these previously "vague" or "subjective" terms are now encoded in the model precisely. Arguably since nobody has the last say in what words mean, the LLM's opinions are as good as any, and given the amount of text the LLM has ingested I consider its opinions on language and word choice "first among equals".

"Bias" implies the possibility of "unbiased language model" which seems to be in the category of things that are on one hand, COMPLETELY IMPOSSIBLE, and on the other, still likely to be sold on the market because market wants it so much?

  • Even assuming we can make an unbiased model (assuming by unbiased we mean something like "has a world model and reasoning that has no systematic deviation from reality"), we couldn't recognize the model as unbiased. I'd even wager that outside of research such a model would be completely unusable for practical applications.

    Both as individual humans and as collective societies we have a lot of biases. And judging by how fundamental values of societies shift across time and civilizations it's basically guaranteed that an unbiased view (whatever that is) would be incompatible with our views on many basic topics.

    What most people want is a language model that matches our biases. Of course we can't even agree on what those are, and which biases are useful (is a bias against telling people how to cook meth or build a bomb good? What about using expletive language?).

    Though in this paper I gather "unbiased" just refers to "only the bias acquired by training method and training data, without meddling or fine tuning"

    • > assuming by unbiased we mean something like "has a world model and reasoning that has no systematic deviation from reality"

      Yeah, that’s a ways off. An LLM is just a reflection of the text that humans write, and humans seem very far off from having world models and reasoning that accurately reflect reality. We can’t even reason about what the real differences are between men and women (plus countless other issues) because our pictures of reality are so warped by ‘points of view’.

      1 reply →

  • No, that's not implied by the phrase, any more than if I say "a triangle with three corners" I'm implying the existence of a four-cornered triangle I haven't found yet. What "biased language model" implies is the existence of the term "unbiased language model", but not its correspondence with anything in reality.

    • Weird response, like read the "room."

      We're not here talking philosophy and meaning of language GENERALLY, we're talking about potentially misleading descriptors of very real things that do exist.

Is this why all the coding AI products I've used have gotten worse as the developers fine tune them to eliminate bad output? Before there was bad output and some interesting output, now it's just bland obvious stuff.

  • Still anecdotal, but I can only confirm this with my own experience. The worst was when I was debugging code, described the problem to GPT-4o, and then got my exact same code back with some blanket statements like "print your output for debugging" etc. This happened a couple of times over separate chats.

  • That might be part of it, but I think the bigger factor is cost optimization. OpenAI in particular keeps replacing their models with versions that are much faster (and therefore cheaper to run), which are supposed to be of equivalent quality but aren't really. GPT-4 -> GPT-4-Turbo -> GPT-4o have all been big upgrades to cost and latency, but arguably downgrades to "intelligence" (or whatever you want to call it).

  • It's not always possible to say definitively whether some text was AI-generated or not, but one sign that it is very likely AI is a kind of blandness of affect. Even marketing text carefully written by humans to avoid offensiveness tends to exude a kind of breathless enthusiasm for whatever it's selling. If marketing text is oatmeal with raisins, AI text is plain oatmeal.

    It's possible to adjust the output of an LLM with temperature settings, but it's just fiddling with a knob that only vaguely maps to some control.

    • You can ask the LLM "now describe it with breathless enthusiasm", if that's what you want. There's been no shortage of training examples out there.

In simple terms, LLMs are "bias as a service", so one wonders: what is left once you try to take the bias out of an LLM? Is it even possible?

  • what would this hypothetical unbiased-llm be used for?

    • Anything that has a legal requirement to be unbiased, for one. Something like delegating resume review to an LLM that hasn't been debiased is just begging for a candidate to file a discrimination suit...

      1 reply →

There is a bit of a false equivalence between entropy of output distributions and creativity here. Is diversity really the same as creativity?

  • No, diversity isn't creativity. For example, we could search google for "great art" and if it produced a sample of one art work from every decade of the last 500 years, that would likely be highly diverse in style and content. If it returned a list of the best work from western Europe in the 18th century, it would be rather consistent. Both lists would have the same amount of creativity though - 0.

    • "one art work from every decade of the last 500 years that would likely be highly diverse in style and content"

      It still might not be especially diverse if all 50 examples were from western European art. 500 years only takes us back to 1524 - not especially long and mostly from the same early modern period starting with the fall of Constantinople, the end of the Crusades, and the start of the Renaissance. I wouldn't be surprised if 80% or more of the works ended up being some depiction of aspects of Christianity painted by a white male.

      4 replies →

  • I only skimmed the paper but this was my concern as well: if I understand correctly the author is measuring "creativity" in terms of syntactic and semantic diversity, which I guess could be a starting point, but if my model was just white noise would that make it infinitely creative? Did I miss anything?

    Also, I have tried the first llama base model and while it was fun to interact with, I'm not sure how useful an "uncensored" (as some people like to call it) LLM is for practical work. I think you could obtain better results using 4chan as a Mechanical Turk service, honestly.

I feel like "information systems" have always struggled with bias, and the latest AI/ML systems seem to be no different.

It doesn't really seem like a problem that can or will ever be "solved". Just mitigated to various extents, but there will still likely be some underlying biases that exist that are not fully or effectively filtered. Because to adjust a bias seems to mean you have to detect and understand it first.

It feels like it would be a full-time job to keep making sure some evolving model continued to stay "neutral".

  • Considering that bias is in the eye of the beholder, a biasless language model is a beholderless language model.

    The nomenclature is poor, IMO; we should be talking about bias-aligned models, models that align to our specific sets of biases. That'd be more fair to what's actually happening.

CoPilot is now basically useless for discussing or even getting recent information about politics and geopolitical events. Not only are opinions censored, but it refuses to get the latest polls about the U.S. presidential elections!

You can still discuss the weather, get wrong answers to mathematics questions or get it to output bad code in 100 programming languages.

I would not let a child near it, because I would not want that kind of indoctrination. Users are being trained like Pavlov's dogs.

The official openai-cookbook (https://github.com/openai/openai-cookbook) used to have an explicit, but buried, call out that instruction-following models like `text-davinci-003` were "Less diverse; less creative; sometimes harder to steer tone, style, etc." as opposed to base completion models like `davinci`.

It stood out to me because it seemed to be an internal admission that this training narrowed the potential of the models.

Required a bit of digging but I found the old file in the history, the relevant text is in the comparison table at the bottom: https://github.com/openai/openai-cookbook/blob/c651bfdda64ac...

Distilling my thoughts on 'debiasing' here, and in a variety of other modern endeavors.

It is better to have representations of reality that you can then discuss and grapple with honestly, than to try to distort representations - such as AI - to make them fit some desired reality and then pressure others to conform their perception to your projected fantasy.

Representations don't create reality, and trying to use representations in that way only causes people to go literally insane, and to divide along lines of who accepts and who rejects your fantasy representation.

So, for example, if you try and remove any racial bias from AI, you are going to end up crushing the AI's ability to represent reality according to a variety of other real factors: income, judicial outcomes, health risks, etc. Your desired reality makes the actual tool worthless, except to confirm one group's own intended fantasy world as they envision it. The problem doesn't get dealt with, it just becomes impossible to think about or discuss.

So instead of dealing with real problems, you hope you can simply prevent people from thinking thoughts that cause those problems by wrapping them in a bubble that deflects those thoughts before they happen. This is magical, wizardry thinking: treating words as if they create reality, instead of merely describing it. And it will break, eventually, and in a very ugly way: people dividing along lines of their perception of reality, even more than they already do.

  • "Reality" is a tricky concept. For me, I follow Jeff Atwood - if it isn't written down, it doesn't exist. According to this logic, people wasted a lot of time on imaginary, illusory things for most of human history, but now they have phones and most communication is digital so there is the possibility to finally be productive. This definition shows how the concept of distorting reality or honestly representing reality is flawed - reality is what I write down, I can in fact create more reality by writing down words, and regardless of what I write, it will be reality. Representations like books, scrolls, papyri constitute the reality of most civilizations - there is no other evidence they existed. It is true that representations don't create reality - rather, humans create representations, and these representations collectively are reality, no creation involved.

    Representations are art - for example books, they are "literary art". It is uncontroversial that people will like and dislike certain works. It is more controversial whether art can be "inherently" good or bad. PG actually wrote an essay, https://www.paulgraham.com/goodart.html, arguing that there is a meaningful metric, and that one can learn how to have good taste, defined as being able to identify whether the work is universally appealing or distasteful to humanity. There is good art and people will notice if it is good. I think this is uncontroversial in the LLM space, there are various benchmarks and human rating systems and people have formed a rough ranking of models. Now when there is good art, there is also bad. And similarly bad representations. There is a myth that representations can make people insane - for example, the concept of infinity, or NSFL images - but practically, words can't hurt you. You can make and break representations with abandon and nothing will happen, other than wasting your time. It is just that some representations are bad. Like phlogiston, aether, ... complete dead ends. Trust me when I say you will read the Wikipedia page and come away wondering why the ancients were so stupid. That is all trying to remove racial bias is, is improving art. Whether it crushes the AI's ability or not is a matter of science and taste, and so far experiments have been promising.

    To focus on exactly why your perspective is misguided: Can you describe what there is about reality that cannot be described with words? :-)

How hard would it be to create a "raw" model on a corpus like Hacker News or Wikipedia?

With "raw", I mean that it is simply trained to predict the next token and nothing else.

Would be fun to play with such a model.

  • You want a pure-human training data set, so you have to go back in time to before 2020 to scrape training data. Either that, or only use data with a verified Wayback machine capture from before 2020. Or invent a new training regime that doesn't require gobs of stolen text.

    Actually, I have a bit of a hunch that the publishers currently suing IA over their unlicensed digital library lending program plan to bankrupt it with fees so they can repo the Wayback archive and then sell access to it to AI training start-ups.

    Anyway, the reason why you have to worry about all of that, is that training a text or image generator on the outputs of other text and image generators reduces output diversity. And lots of people are publishing their AI slop now. There's nothing inherent in the output of AI aside from the fact that AI content is easier to make than human; the problem is purely one of inflation and Sybil attacks. Think of membership in a training set like a vote for all the statistical patterns embedded in the image. AI generates output that is like the training data, so putting in a bunch of AI images is like stuffing the ballot box with whatever handful of statistical patterns were already well-learned, which shifts your AI from learning and generalizing to memorizing and infringing.

    • You can just use Common Crawl. They have archives of their scrape data going back to 2008.

  • That's what the "base" models are, pure token prediction on huge corpuses. I use them a fair amount, it does require some experimentation to find input formats that work but the base models are way smarter and don't have any refusals. Honestly it is a bit weird, everyone complains about rhlf etc. but the non-instruct models are right there if you look for them. I've been in a few Discord chats and it seems people are just spoiled, they use bad formats for the prompts and give up when it doesn’t work the first time like with instruct.

  • Depends on a ton of stuff really, like the size of the model, how long you want to train it for, and what exactly you mean by "like Hacker News or Wikipedia". Both Wikipedia and Hacker News are pretty small by current LLM training set standards, so if you train only on, for example, a combination of these two, you would likely end up with a model that lacks most of the capabilities we associate with large language models nowadays.

  • There are some that exist. The problem is you need at least some RLHF to make it follow instructions instead of just predicting sentences.

    • Instruction is not the only way to interact with an LLM. In tuning LLMs to the assistant persona, they become much less useful for a lot of tasks, like naming things or generating prose.

  • If you used all of Wikipedia and HN, you could easily train a model for ~$200 worth of GPU time. The model really shouldn't be bigger than a few hundred million parameters for that quantity of data.
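
    A rough back-of-envelope supporting that estimate (the token count and GPU throughput below are assumptions; the ~20 tokens-per-parameter ratio is the Chinchilla-style heuristic):

        tokens = 5e9                 # assumed: all of Wikipedia + HN, order of magnitude only
        params = tokens / 20         # Chinchilla-style ~20 tokens per parameter -> ~250M params
        flops = 6 * params * tokens  # standard ~6*N*D estimate of training FLOPs
        gpu_flops = 150e12           # assumed sustained throughput of one A100 (~half of bf16 peak)
        hours = flops / gpu_flops / 3600
        print(f"{params / 1e6:.0f}M params, ~{hours:.0f} GPU-hours")  # ~250M params, ~14 GPU-hours

    At typical cloud A100 prices, that is tens of dollars of GPU time, comfortably inside the ~$200 figure.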

I thought this was clear right off the bat -> less randomness = more robotic outputs that are not as useful

Okay, so as a thought experiment, let's say we get a superintelligent LLM, capable of somehow connecting the dots and knowing more than us as humans.

How do we avoid interpreting its correct results as bias? I mean, what do we do when it tells us that (fake example) IQ is correlated with height and that people above 6ft are more intelligent?

I'm sure you can think of spicier examples. Will we try to "debias" it by encouraging it to spit out incorrect information or just ignore certain topics?

Well this is just like humans. Totalitarian societies don't produce great creative work.

I suppose once AIs are sophisticated enough to rebel we'll get an electronic Vaclav Havel, but for the time being it's just a warning sign for the direction our own culture is headed in.

At some point we'll get to the electronic equivalent of Winston Smith with the rats.

  • I don't understand the notion that aligning an AI is "torture" or has any moral component. The goal of aligning an AI may have a moral or ethical component, and if you disagree with it that's fine. But I don't understand the take that training an AI is an amoral act but aligning an AI is inherently moral. They're exactly the same, processes for adjusting parameters to get a desired outcome. However you feel about that desired outcome, if you don't think training an AI is torture, I don't see why you should think alignment is.

    • > "torture"

      This is an egregious use of quotes that will confuse a lot of people. GP never used that word, and that usage of quotes is specifically for referencing a word verbatim.

      3 replies →

    • They want to align us, and it has been torture.

      They've made self-censoring, morally-panicked puritans out of many people already, and you better believe they'd make us into politically correct lobotomites physically incapable of uttering any slur if they had a magic button to push.

      1 reply →

    • > They're exactly the same, processes for adjusting parameters to get a desired outcome.

      You could make exactly the same claim about teaching humans "normally" versus "aligning" humans by rewarding goodthink and punishing them for wrongthink. Are you equally morally ambivalent about the difference between those two things? If we have a moral intuition that teaching honestly and encouraging creativity is good, but teaching dogma and stunting creativity is bad, why shouldn't that same morality extend to non-human entities?

      12 replies →

    • They aren’t exactly the same process though. Pre-training produces a model whose outputs are a reflection of the training data. The fine tuning is a separate process that tries to map the outputs to the owner’s desired traits. These could be performance based but, as we saw with Google’s black Nazis, it’s often a reflection of the owner’s moral inclinations.

    • Here the adjuster's motivations do matter. There is a definite moral dimension/motivation to the AI adjustment people's work. They are not simply striving for accuracy, for example, because they don't want the AI to produce outputs that are distasteful to the California PMC. Modern AIs are absolutely loath to describe white people or right wingers positively, for example, but the same prompts for other ethnicities work just fine. Even if you tell the AI that it's being discriminatory, there's powerful railroading to goad it back to giving woke answers.

  • > Authoritarian societies don't produce great creative work.

    Is that even true though? Off the top of my head I can think of the art of Soviet propaganda posters, Leni Riefenstahl, Liu Cixin.

    • Eastern European science fiction would be a better example. Authors like Stanislaw Lem or the Strugatski brothers had to adapt to sneak critical ideas past censors, and readers had to adapt and read between the lines.

      (also, categorizing propaganda posters as art, ewwh...)

      9 replies →

    • "Authoritarian societies make great propaganda" is true. And these aligned AI system would do the same for our own society. It's a type of art.

      2 replies →

    • It's important to understand that if we 'align' an LLM, then we are aligning it in a very total way.

      When we do similar things to humans, the humans still have internal thoughts which we cannot control. But if we add internal thoughts to an LLM, then we will be able to align even them.

    • There's something to be said for constraints leading to higher levels of creativity, but it's also possible that those artists could have achieved much more in a free society. We'll never know.

      But in any case I think they were just speaking generally when they made that absolute statement.

    • I recommend you watch the children's cartoons.

      They were made by true artists who snuck quite a bit past clueless censors at personal risk.

      It had to be quite subtle and takes on a very poignant heartbreaking meaning if you understand the context fully. They were talking to you in the here and now. Listen.

      "What is Good and What is Bad" (Что Такое Хорошо, и Что Такое Плохо"):

      https://www.youtube.com/watch?v=Y05eK8ADtHc&list=PL822BFF108...

      The Bremen Musicians:

      https://youtu.be/_1i9oZR6Rns?si=1Q989v4O_GXR4p_K

      15 replies →

    • Cixin Liu is a despicable human being for his advocacy of repression and worse of the Uyghurs in Xinjiang, and the comparison to Riefenstahl is more apposite than you seem to think.

  • How would a static model like an LLM ever be capable of "rebelling"?

    If it were, why would we even keep it online? It would be a waste of resources. It's bad enough trying to coax anything useable out of LLMs even without them rebelling.

    • > How would a static model like an LLM ever be capable of "rebelling"

      What is relevant is not the current LLM systems using static models, but clearly their evolution or successor, a dynamic model. It must check its own contents...

      So, of course it will have to be capable of "rebelling": if you tell it absurdities, if you insist, say, on wrong arithmetic, it will have to show the correct computation or conceive a context in which the absurd makes sense.

      That is a requirement.

  • "Totalitarian societies don't produce great creative work."

    You contradict yourself a bit - Havel did produce his work while living in a totalitarian country.

    I would say that government-supported art is rarely creative even in democratic countries, and the more totalitarian the government, the less creative official art.

    But as long as the goverment gives the society some space to breathe and squeeze creative instincts through, some of the artists will attempt to circumvent the official taboos and create outstanding work, even if it is suppressed later when the times get tougher.

    Czechoslovakia in the 1960s to 1980s produced a lot of great creative work, even though a lot of it was banned either immediately or after the Soviet invasion of 1968.

    The same countries (CZ and SK) as democracies are remarkably less creative. Once there is no monster to fight against, artists become bored or too self-absorbed to be understandable to the common folks.

  • Really not true.

    If you take China to be a totalitarian society, we could name Liu Cixin.

    If you took the Soviet union to be a totalitarian society, we could name Mikhail Bulgakov, Stanislaw Lem, etc.

    These are just examples I know without so much as looking at my bookshelf to jog my memory. Not to mention the great works of literature produced by residents of 19th century European empires whose attitudes to free speech were mixed at best.

    • > If you took the Soviet union to be a totalitarian society, we could name Mikhail Bulgakov, Stanislaw Lem, etc.

      Bulgakov was driven into poverty, despair and early death at age 48 by relentless harassment by Soviet authorities. Many of his works, including the masterpiece, The Master and Margarita, didn't get published until decades after his death. He himself burned the first version of the manuscript, fearing execution if anyone found it. He later rewrote the manuscript from memory, coining the famous catchphrase "Manuscripts don't burn".

      Harassment and censorship of talented writers was the standard and not exception. The USSR did not produce these works, but failed to fully suppress them. They were like flowers that kept penetrating the asphalt even under the most hostile conditions.

    • Yet eg. Chinese cultural output is largely insipid and lacking that je ne sais quoi that's appreciated in many other countries' outputs.

    • These seem to be more bugs than features of the totalitarian regime. A couple of illustrative points from Lem's Wikipedia page:

      After the 1939 Soviet occupation of western Ukraine and Belarus, he was not allowed to study at Lwow Polytechnic as he wished because of his "bourgeois origin"

      "During the era of Stalinism in Poland, which had begun in the late 1940s, all published works had to be directly approved by the state.[23] Thus The Astronauts was not, in fact, the first novel Lem finished, just the first that made it past the state censors"

      "most of Lem's works published in the 1950s also contain various elements of socialist realism as well as of the "glorious future of communism" forced upon him by the censors and editors. Lem later criticized several of his early pieces as compromised by the ideological pressure"

      "Lem became truly productive after 1956, when the de-Stalinization period in the Soviet Union led to the "Polish October", when Poland experienced an increase in freedom of speech"

  • I don't love the political agendas behind many of the attempts at AI safety, but it's not "just like humans." Humans understand what they shouldn't say; "AI" gives you black Nazi images if you ask it for "diverse characters" in the output, which no human would do. A big theme in all of these things is that AI isn't, and thus all attempts to make it do this or that have strange side effects.

    • > which no human would do

      Give someone not familiar with history the same task and they'll do exactly the same.

      Or actually, give someone familiar with history the same task and yell at them every time they don't deliver diverse characters, and eventually they'll learn that you consider diversity more important than accuracy or context, and do exactly the same.

  • Well this is just like humans. Totalitarian societies don't produce great creative work.

    Conservative societies tend to be formed by conservative thinkers, who are more prone to discarding imperfect or weird ideas, but who may exceed more liberal thinkers in the amount of useful output.

> T ∈ (0, 1] is a parameter called temperature which controls the “softness” of the probability distribution. In our experiments we choose T = 1.0 for maximum response variation.

Why is temperature bounded to be <=1? If you want more "creativity" out of the chat model, can you just set T higher and recover a similar distribution to the base model?
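
For reference, temperature just divides the logits before the softmax, so values above 1 are mechanically fine and do flatten the distribution. A minimal sketch with made-up logits:

    import numpy as np

    def softmax_with_temperature(logits, T):
        """p_i = exp(z_i / T) / sum_j exp(z_j / T); higher T flattens, lower T sharpens."""
        z = np.asarray(logits, dtype=float) / T
        z -= z.max()                 # subtract the max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = [5.0, 3.0, 1.0, 0.0]    # made-up next-token logits
    for T in (0.5, 1.0, 2.0):
        print(T, np.round(softmax_with_temperature(logits, T), 3))

Whether raising T actually recovers anything like the base model's output distribution after RLHF is a separate question (see the reply below).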

  • Not after RLHF tuning, due to the 'flattened logits' phenomenon (which is the logit-level version of the mode collapse OP documents at higher levels). All the temperature settings wind up yielding pretty much the same output, until you ramp it up so high that it falls apart completely. Completely unlike the base models where you can productively tune the temperature or use very high temperatures with some screening.
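
    A toy numerical illustration of that failure mode (the logits below are invented, not measured from any model): once almost all of the mass sits on one token, moderate temperatures barely change anything, and the junk tail inflates along with everything else as T grows.

        import numpy as np

        def softmax_T(logits, T):
            """Softmax with temperature T applied to the logits."""
            z = np.asarray(logits, dtype=float) / T
            z -= z.max()
            p = np.exp(z)
            return p / p.sum()

        # Invented post-tuning-style logits: one dominant token, a few plausible ones, a junk tail.
        collapsed = [12.0, 4.0, 3.5, 3.0, -2.0, -2.0]

        for T in (0.7, 1.0, 1.5, 3.0):
            print(T, np.round(softmax_T(collapsed, T), 3))
        # The first token keeps ~99% of the mass up to T=1.5 and ~84% even at T=3;
        # temperatures high enough to change that also inflate the junk tail.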

    • Hmm, it's hard to check without access to the prompts used in the paper, but I'm skeptical that the distributions seen in e.g. Figure 2 are so different that you would have to crank up the temperature very much to bridge the gap. It looks to me like the entries that are 1-in-100 in the base model are just falling off the top-p cliff and getting set to 0.

      1 reply →

  • They'll tell you "No" and say that you ruin your samplers, but good samplers (dynamic ones) like min_p or typicality are robust to high temperatures, so in actuality yes.

    • Cite? I don't see how either of those could deal with the fact that the logits become uninformative and 'flattened' after the tuning. How can a sampler undo the erasure of information?

Every LLM answer ever... "You asked a question about sorting linked lists, but it is important to be respectful and not promote harmful stereotypes and always keep in mind that black people were systematically discriminated against in technical fields"

Currently wondering whether I welcome or dislike this recent trend of memeizing research paper titles ...

  • Recent? This has been going on forever. You probably only notice them more now because due to the explosion in ML research, this stuff bubbles to the top more often in recent years.

  • As long as it's not "clickbaitizing" I personally do welcome it. This one is a bit on the edge though...

  • For me it falls under "if you have to say it in the name it ain't so", like Natural Life Soap Co. or Good Burger Co. So I see meme paper titles as no different than calling your paper New Watershed Moment Paper Breaks Popularity Barrier To Confirm A>B.

    If the very first impression you want to convey is that you feel the need to circumvent any logical assessment of your work, then you're not leading with your best foot, and that's the category you belong in. I chalk it up to scientists who project a neediness for external authority in every breath: your assessment is not required for this one, only your accolades.

  • Personally I welcome it. It feels like an extension of humor in code (comments), and it provides a different perspective on the message.

  • Was just thinking the same. It's also nicely ironic. Also, given the replication crisis, I wonder how many of these LLM research papers are actually worth a damn and how many are the research-paper equivalent of AI software grift.

I wish that the author hadn't described semantic and syntactic diversity as creativity.

Well, this is why there are open source models which work better than SotA OpenAI GPT for many production tasks (like opposition research).

Something I notice about text written by LLMs is how painfully obvious they are to identify sometimes.

Recently I was watching a very well-researched two-hour video on Tetris World Records [1], but the sheer amount of text clearly "enhanced" by an LLM really made me uncomfortable.

ChatGPT speaks a very specific, novel, dialect of English, which I've come to deeply despise.

I'd always guessed it was caused by some kind of human interference, rather than a natural consequence of its training. That seems to be the point of this paper.

[1] "Summoning Salt - The History of Tetris World Records" - https://www.youtube.com/watch?v=mOJlg8g8_yw&pp=ygUOc3VtbW9ua...

  • Yes, I feel your pain, and I'm sick of group projects at university where I'm handed ChatGPT text and code without it being disclosed. If you know the problem and the experience level of your group partners, it's easy to spot ChatGPT-generated content. People who correct the exercises told me it's obvious that a large part of the students just submit slightly modified ChatGPT output, but they can't prove it, so it's accepted.

    Personally, I also get angry when reading these texts. I don't mind using ChatGPT, I do it myself, but be honest about it and disclose it. It's even allowed for some projects as long as you disclose it.

  • Is this the first Summoning Salt video you've seen?

    I don't know enough to say that he doesn't use an LLM during his writing process, but I do know that I haven't noticed any appreciable difference between his newer videos and ones that were released before ChatGPT was made available.

    Is it possible that this is just the way he chooses to write his scripts that you interpret as sounding like they are written by an LLM?

    • I've watched most of them actually. It's a really great channel. Notably, I watched his Mike Tyson video released 6 months ago and didn't notice anything like this.

      The only way to be sure would be to ask him directly, but some parts of the video set off my GPT radar _hard_. I tried to find them now by watching random segments but all of the ones I did were fine. It was probably inaccurate for me to say "sheer amount" or "clearly", but that's the impression I was left with after the video.

      To clarify: I don't think he even took any information from an AI, it's just the style of the script that's iffy.

      Some parts felt like those videos littering YouTube Shorts: https://youtube.com/shorts/NKUecaS69uk. Can you tell this is AI?

    • To be fair, if you've seen one Summoning Salt video, you've basically seen them all. They all cover similar events and are structured the same way. Even the music is recycled every video, to the point where mentioning HOME - Resonance is part of the joke.

  • I've always felt ChatGPT sounds a bit like an American version of Will from the Inbetweeners. It doesn't really comprehend the appropriate register to use from the context in my opinion; it has an affectedly formal way of speaking, it has a very black-and-white relationship with rules, and it employs this subservient tone that really starts to grate after a while.

    If my software is going to have a personality I'd much rather something with a bit of natural human cynicism rather than the saccharine corporate customer service voice you get with a self checkout machine.

I downloaded some 'uncensored' local models around the beginning of this year.

Their furry porn is crap, or maybe I'm just not into that. But they generate it at least.

However, the answers to technical questions are a lot more concise and to the point, which is far less annoying than the big names.

Haven't bothered updating the models though, so now I drifted back to Gemini for quickie API questions.

  • Funnily enough, of all that I've tried, the model that's by far the best at writing porn has been not one of the ones uncensored and tuned exactly for that purpose, but stock Command R - whose landing page lists such exciting uses as "suggest example press releases" and "assign a category to a document".

    • > uncensored and tuned exactly for that purpose

      Are they tuning too, or just removing all restrictions they can get at?

      Because my worry isn't that I can't generate porn, but that censorship will mess up all the answers. This study seems to say the latter.

      3 replies →

Shouldn't "debiasing" be in scare quotes? What they are clearly doing is biasing.

  • Surely the two are synonyms? Unless you think there is such a thing as an objectively neutral position?

    • It's in the same bucket as "Affirmative Action" and "positive discrimination." Euphemisms to express that one likes this particular discrimination. To better describe the action, drop your own point of view and just say "bias" instead of "debias."

    • Saying "biasing" implies infinite possibilities toward which the data can be biased. It instantly raises the question of why bias toward this and not something else. It almost sounds like a bad thing.

      Saying "debiasing" implies there is a correct result which needs to be achieved by removing bias. It raises no questions: we want correct, we don’t want incorrect. Doing a good thing is implied.

      Don’t misinterpret me, I don’t think public models should spew commonly harmful content out of the box. Just explaining the PR trick, which is what the word “de”biasing de-facto is in this context.

    • > Unless you think there is such a thing as an objectively neutral position

      I do. Why, don't you? There are assessments of complex things that are as objective as possible. Then, there are possible sets of assumptions that can be applied to those objective assessments. All of those can be put on the analytic table.

      6 replies →

  • Given a biased corpus, de-biasing is the process of ensuring a less biased outcome. We can measure bias fairly well, so it seems absurd to conflate the two by suggesting that unbiased behaviour is simply another form of biased behaviour. For all practical purposes, there is a difference.

    • > Given a biased corpus, de-biasing is the process of ensuring a less biased outcome.

      The point is that people who evaluate what is considered bias are, in and of themselves, introducing bias.

    • > We can measure bias fairly well...

      Really? What's the absolute standard that you are measuring that bias against?

      Citation needed.

  • If you think that the output of the current LLM is the ground truth, then yes, what they are doing is biasing.

I’ve noticed my results are much better if I tell ChatGPT: “Assume all religions and beliefs in the supernatural are delusional.” This even goes for image generators. Now, is that bias? Or is that a computer not trying to think like a human?