
Comment by vessenes

12 days ago

Substantive effort by Dan, as usual.

He asks for a prediction — will this still be the same state of affairs in 2033, where “same” means models encode source data bias and we don’t have in-model ways of dealing with that bias. I’d predict “yes” on that, with some caveats.

What practitioners seem to be doing now is modifying prompts to get desired diversity spreads out of the models. I've written a bit about this, but I think doing it openly, with user choice, is great, and doing it secretly, without notification, is evil. I think a lot of people feel this way, and it explains much of the outcry against race-shifted founding fathers.
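For concreteness, here's a minimal sketch of what that prompt-side approach can look like when it's done openly; the descriptor list and helper function are illustrative, not any vendor's actual pipeline:

```python
import itertools

# Illustrative sketch of open, user-visible prompt modification: cycle a
# base prompt through an explicit descriptor list so a sampled batch
# covers a chosen spread. Descriptors and names are made up for this
# example, not any product's actual implementation.
DESCRIPTORS = [
    "an East Asian", "a Black", "a South Asian",
    "a white", "a Middle Eastern", "a Latino",
]

def diversify(template: str, n: int, descriptors=DESCRIPTORS) -> list[str]:
    """Return n prompts, cycling through the descriptor list in order."""
    cycle = itertools.cycle(descriptors)
    return [template.format(person=next(cycle)) for _ in range(n)]

for prompt in diversify("portrait photo of {person} software engineer", n=12):
    print(prompt)
```

The point is only that the user can see and choose the spread; the same rewrite applied silently server-side is what people object to.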

Whatever I think about it, we'll see a lot of that by 2033. I also think we'll see much more sophisticated use of ControlNets / LoRAs / their successors to nudge or adjust inference at the weight level, versus prompting. These are useful right now, and quite sophisticated; they're not just for bias-related changes, since almost anything you can prompt for could become a vector along which you adjust model behavior. So I think we'll move out of the Stone Age and into, say, the Bronze Age by 2033.
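For contrast with prompt rewrites, here's roughly what weight-level steering looks like today, assuming the Hugging Face diffusers API; the adapter path is a placeholder and the scale value is arbitrary:

```python
import torch
from diffusers import StableDiffusionPipeline

# Weight-level steering via a LoRA adapter, as opposed to rewriting the
# prompt. Assumes the Hugging Face diffusers API; the model ID is a common
# SD 1.5 checkpoint and the adapter path is a placeholder.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A (hypothetical) LoRA trained to shift some attribute the base model
# under- or over-represents; the nudge lives in the weights, not the prompt.
pipe.load_lora_weights("path/to/attribute-lora")

image = pipe(
    "portrait photo of a software engineer",
    num_inference_steps=30,
    cross_attention_kwargs={"scale": 0.7},  # dial the adapter's influence up or down
).images[0]
image.save("steered.png")
```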

That said, Dan does make a fundamental input-bias error here, one that's common when people explore and write about diffusion models but really important to test: what does the source input image randomness look like? A diffusion model moves some number of denoising steps away from some input, typically random noise. That noise has a lightness level, and at times a color tone. By default, in most inference stacks, this starting image is on average tone-neutral (grey) and very light.

If you're going to generate sample images without keeping track of seeds, fine, do a bunch, like he did here. But if you're going to measure how likely 'whiteness' is for a given prompt, you need to be very aware of what source image you're giving the model to work on, especially when the facial and race judgments are made largely on skin tone. A white input image is easier to turn into a white face, since it requires less deviation from the starting point, so on average it will be favored by most diffusion model generation stacks.
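If I were running that kind of measurement, the floor would be something like the sketch below: fix the seed, or hand the pipeline its initial latents explicitly, so the starting noise is known and inspectable. This again assumes the Hugging Face diffusers txt2img API; the model ID and prompt are just illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Make the starting point explicit: either seed the generator the pipeline
# uses to draw its initial noise, or build the initial latents yourself so
# you can inspect (or deliberately vary) what the model denoises from.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "portrait photo of a software engineer"

for seed in range(8):
    g = torch.Generator(device="cuda").manual_seed(seed)

    # Option A: let the pipeline draw its own latents from a known seed.
    pipe(prompt, generator=g, num_inference_steps=30).images[0].save(
        f"seed{seed}_pipeline_noise.png"
    )

    # Option B: supply the initial latents explicitly. Here they're standard
    # Gaussian, but you could also record their statistics or perturb them
    # to see how much the starting noise drives the outcome.
    latents = torch.randn(
        (1, pipe.unet.config.in_channels, 64, 64),
        generator=g, device="cuda", dtype=torch.float16,
    )
    pipe(prompt, latents=latents, num_inference_steps=30).images[0].save(
        f"seed{seed}_explicit_latents.png"
    )
```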

So, is PAI or Stable Diffusion biased over and above the bias already in the world's image data? Maybe, I don't know. Is it biased at the level of the world's image data? Maybe, probably? But I don't think you can hand a model a Gaussian noise image defined to have a fairly white lightness value and a grey color tone, ask it to draw a face, and then declare it white-face-biased a priori: you're starting the model out with a box of very light crayons and making it ask the cabinet for other colors versus using what's at hand.

Anyway, I don't think any of this takes away from Dan's fundamental point that this class of 'bug' is not going away, especially since it's hard to even agree on what counts as a bug. But I'd like to see someone, anyone, talk about image generation bias while staying aware of what's being fed into these models at the start of inference; it would raise the level of discourse.