Comment by wenbin

11 days ago

In the near future, a significant portion of YouTube videos and podcasts will likely be AI-generated (e.g., through tools like NotebookLM).

However, I'm uncertain whether audiences will truly enjoy this AI-generated content. Personally, I prefer content created by humans—it feels more authentic and engaging to me.

It’s crucial for AI tools to include robust detection mechanisms, such as reliable watermarks, to help other platforms identify AI-generated content. Unfortunately, current detection tools for AI-generated audio are still lacking - https://www.npr.org/2024/04/05/1241446778/deepfake-audio-det...

[Edit] We just put together a list of NotebookLM-generated "podcasts": https://github.com/ListenNotes/notebooklm-generated-fake-pod...

Consider whether you'd enjoy listening to AI-generated podcasts. I believe people might be okay with shows they create themselves, but are less likely to appreciate AI-generated 'podcasts' made by others.

>Personally, I prefer content created by humans—it feels more authentic and engaging to me.

I'd like to think that too, but I wonder how long - if at all - this will be true. I "want" to like human-generated content more, but I suspect AI may be able to optimize for human engagement better, especially for simple dopamine-inducing content (like TikTok videos). After all, we're less complicated than we like to think.

>It’s crucial for AI tools to include robust detection mechanisms, such as reliable watermarks, to help other platforms identify AI-generated content.

This will never work, unfortunately. There's no way to exclude rogue actors, and there's plenty of profit in AIs pretending to be human. If anything, we will have to watermark/sign human-generated content.

> In the near future, a significant portion of YouTube videos and podcasts will likely be AI-generated

It's not helpful that you're making a binary distinction here.

As an example, as far back as 10 years ago, I would find YouTube videos where the narration was entirely TTS. The creators didn't want to use their own voice, so they wrote the script and fed it into a TTS system. As you would expect from the state of the art at the time, it sounded terrible. Yet people enjoyed the videos and they had high view counts.

Are we calling this AI-generated?

We now have better TTS (without generative AI). Way better. I presume those types of videos are now better for me to watch. You may still be able to tell it's not a human because the tone doesn't have much variance. You'd probably have to listen for a minute or longer to discern that, though.

Are we calling this AI-generated?

Now with generative AI, we have voices that perhaps you won't be able to identify as AI. But it's all good as long as a human wrote the script, right?

Are we calling this AI-generated?

Finally, take the same video. The creator writes the script, but feels he's not a good writer (or English is not his native tongue, and he likely has lots of grammatical errors). So he passes his script to GPT and asks it to rewrite it - and not just fix grammatical errors but improve it, with some end goal in mind ("This will be the script for a popular video..."). He then checks that the essence he was trying to capture was conveyed, and goes ahead with the voice generation.

Is this AI-generated?

To me, all of these are fine, and not in any way inferior to one with a completely human workflow, as long as the creator is a human and feels it conveys what he needed to convey.

I would love to take a first draft of a blog post, send it to GPT, and have it write it for me. The reason I don't is that so far, whatever it produces doesn't have my "voice". It may capture what I meant to say, but the writing style is completely different from mine. If I could get GPT/Claude to mimic my style more, I'd absolutely run with it. Almost no one likes endless editing - especially writers!

The question is how long until you can't tell the difference.

  • My spouse, who works at a FAANG company, thinks that AIs and robocallers should be mandated to identify themselves. She thinks an audible "Beep-boop" at the end of a sentence for calls and video would be appropriate.

    • I support that idea. Along with properly implemented authentication so you can't just spoof your way to someone's phone, and painfully stiff fines for violators.

  • It's almost impossible now. NotebookLM really impressed me. I knew voice synthesis had gotten better than Stephen Hawking's "voice", but I really wasn't expecting two realistic voices with emotions that even banter with each other. There is a bit of banality to them - they like to call something "a game changer" in practically every "podcast", and the insights into the material are pretty shallow, but they are probably better than the average podcaster already.

    • It's impressive at first until you realise they're practically ad libbing a script. They're filled with all the same annoying American clichés ("you know me, I like x", your aforementioned "a game changer", plenty of "wow"). It would be impossible to listen to two in a row without realising how repetitive it is.

At Listen Notes, we removed over 500 fake podcasts generated by NotebookLM over just the past weekend.

It's disappointing to see scammers and black-hat SEOs already leveraging NotebookLM to mass-produce fake podcasts and distribute them across various platforms.