
Comment by darkteflon

7 hours ago

Funny wrinkle here: unless I’ve misread the OpenAI API docs[1], the recently added prompt caching feature can’t be explicitly disabled and is applied automatically to any input prompt over 1,024 tokens, with the cached prefix persisting for a few minutes.
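
If I’m reading the docs right, you can see when it kicks in from the usage block on the response. A minimal sketch with the openai Python SDK (the model name and placeholder prompt are mine, not from the docs):

```python
from openai import OpenAI

client = OpenAI()

# Placeholder; in practice this needs to exceed 1,024 tokens for
# caching to apply at all.
LONG_PROMPT = "your very long prompt here ..."

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any caching-eligible model
    messages=[{"role": "user", "content": LONG_PROMPT}],
)

# On a repeat call within the cache window, this should report how many
# prompt tokens were served from the cache (0 on a cold call).
details = resp.usage.prompt_tokens_details
print(details.cached_tokens if details else 0)
```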

It seems possible to work around it by varying the very start of the prompt (e.g., with an iteration number), since caching matches on the prompt prefix, but it’s messed up some of our workflows that rely on running the same prompt several times and aggregating the responses into a consensus output.
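
The workaround ends up looking something like the sketch below; the [run N] marker and model name are arbitrary placeholders, and anything that changes the first tokens of the prompt should have the same effect:

```python
from openai import OpenAI

client = OpenAI()

def gather_consensus_runs(prompt: str, runs: int = 5) -> list[str]:
    """Run the same prompt several times, prefixing each call with an
    iteration marker so the opening tokens differ and the cached prefix
    from earlier calls isn't reused."""
    outputs = []
    for i in range(runs):
        # Changing the very start of the prompt changes the cacheable prefix.
        busted_prompt = f"[run {i}]\n{prompt}"
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder; any caching-eligible model
            messages=[{"role": "user", "content": busted_prompt}],
        )
        outputs.append(resp.choices[0].message.content)
    return outputs
```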

Would be great if they let us disable it.

[1]: https://platform.openai.com/docs/guides/prompt-caching