
Comment by wenbin

10 hours ago

NotebookLM is contributing to fake podcasts across the internet, with over 1,300 and counting:

https://news.ycombinator.com/item?id=41767648

It won't be long before we see a similar trend with low-quality, AI-generated fake podcasts flooding the internet.

Where do you get the "low-quality" part from? My experience with NotebookLM is that it creates much higher-quality, more informative, more fact-based, and more concise podcasts than 99% of the stuff I listen to. I've switched almost entirely over to NotebookLM for my podcast listening. From my perspective, it generally offers a far higher-quality experience.

Maybe you have the problem backwards, and the real risk is that we accidentally end up listening to non-NotebookLM podcasts?

  • A coworker fed an EU trade regulation page and its official FAQ to NotebookLM, and I was quite impressed with the results.

    It was factually accurate, and presented the topic in a manner that was easy to digest and kept it interesting.

    I didn't plan to, but I ended up listening to the whole thing, and I normally don't enjoy the podcast format.

    For someone new to the topic, it'd be a pretty great intro compared to reading the official pages.

  • It's an interesting assumption that, by virtue of being AI-generated, it's considered bad/fake. 20 years ago, people hated how Photoshop changed the photo design industry; NotebookLM is knocking on the door now.

    • I'm excited by AI, but I've also tried using this specific one to generate a podcast based on one of my own blog posts, and I will only try again because of this product announcement, not because I think the state of the art is already "there".

      On the plus side, the speech is almost perfect, so good that I sincerely hope the voices themselves are never fully under user control.

      As for the actual summary of the content I gave them, I would grade them B: only mostly correct. They still invent things I didn't say and miss things I did say.

      That's not to say humans don't make mistakes. I still consider this objectively impressive; that it can reach even this level was sci-fi when I was a kid. But why waste time on a grade-B podcast when the AAA-tier one costs you, as a consumer, a 30-second advert?

  • Personally, I hate even the idea of an AI made podcast, because to me podcasts are personal and emotional. They're about the individual humans who make them. They're not just a source of "information".

    • I'm glad there are different kinds of podcasts for different people now.

      I've always absolutely hated the focus on the individual humans and their personalities behind the podcast, and wished they'd be a better source of well-structured "information".

      I never listened to a podcast I didn't get frustrated with, even at 2x speed. These NotebookLM podcasts have been exactly what I've always wished podcasts were.

    • Have you listened to any audio overviews in NotebookLM? They can be surprisingly good.

  • Okay, I will bite.

    It's trained on too many shallow podcasts. Go compare any NotebookLM podcast with an episode of Hardcore History. The latter goes into much more depth (even when you account for it being much longer).

  • Interesting. Are there any podcasts in particular that you recommend? Everything I’ve heard from it just seems like the most banal, cookie-cutter stereotype of a podcast, with nothing but extremely surface-level summarization of a given article, peppered with random clichés and fake-sounding reactions: “Wow! OK, so let’s hear more about that. I’m intrigued!” “OK, let’s dive deep.” Etc.

Assuming Google retain the current two voices for the audio overviews, it will rapidly become obvious to most people where the podcast came from. I've seen "creators" on YouTube running NotebookLM-generated audio through (e.g.) ElevenLabs to change the voices, but this invariably degrades the quality.

This doesn't strike me as being as much of a problem as it appears to be for you. What are the biggest issues you foresee?

I'm an avid podcast listener, but I already ignore 99.9% of podcasts out there. I'm not concerned that this is going to become 99.99%.

If these AI generated podcasts are all bad, I will just continue to ignore them. If some turn out to be good, it seems like a win to me.

If you're worried about an existential "what happens to the world if all media is machine generated", I guess I'm willing to hop on the ride and see what we find out.

  • 99.9%? There are roughly 3 million podcasts out there right now. I listen regularly to about 10 over a year (in any given week, maybe 3-4). I'm therefore ignoring 2,999,990, or 99.9997%, of podcasts. I definitely agree with you that this isn't a problem.

    (Also, ironically, one of the 10 podcasts I listen to regularly is the Deep Dive on AI. A NotebookLM production!)

    • It could poison the well: make it hard for people to find good new podcasts, and reduce discovery and revenue. It could also fragment our society even more and disconnect people from people. Doesn't seem worth the risk.

      If people want to listen to AI-generated podcasts, they can just make them themselves. They don't need to be published on a platform alongside human-made podcasts. If I were Apple, who ultimately controls curation of podcasts, I'd prevent them. After all, Apple Intelligence will soon do as good a job of making your custom podcast if that's what you want.

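The percentage in the reply above checks out. A quick sketch of the arithmetic, taking the commenter's rough figures (about 3 million podcasts, about 10 listened to regularly) as given assumptions rather than measured data:

```python
# Rough figures from the comment, not measured data.
total_podcasts = 3_000_000
listened = 10

ignored = total_podcasts - listened
ignored_pct = 100 * ignored / total_podcasts

print(f"{ignored:,} ignored ({ignored_pct:.4f}%)")
# prints: 2,999,990 ignored (99.9997%)
```

By the same measure, the "99.9%" upthread is off by more than two orders of magnitude in the share of podcasts actually listened to.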

This is like saying: “Text based LLMs should do more to stop people from publishing the results of what they produce”

NotebookLM seems wonderful for digesting various content in an alternative way. It’s not a “fake podcast” either.

Nobody is saying that the audio output should or should not be published somewhere. That’s a user decision for both publishing and subscribing.

Indexes and discovery on the internet are where you should advocate policing, instead of nitpicking a useful tool.

It sounds more like saying we should ban email, and that all email providers should answer for the problem of spam, which traditional mail didn't have because no one could afford that many envelopes and stamps.

Or like saying we should go back to carts because cars are noisy, and not only that, but might collide with pedestrians, and not only that, might even collide with each other.

Instead of containing the tools and curtailing progress (email and cars), we should probably try to contain and curtail the abusers. Very hard to do, I know, but it's the right thing to do.

> it also opens the door for spammers to mass-produce content that isn't meant for human consumption.

What's new? Every novel class of genAI product has brought a tidal wave of slop, spam and/or scams to the medium it generates. If anyone working on a product like this doesn't anticipate it being used to mass produce vapid white-noise "content" on an industrial scale then they haven't been paying attention.

  • This is definitely not a new issue.

    What I’m aiming for is to ensure that the NotebookLM team is aware of the impact and actively considering it. Hopefully, they are already working on tools or mechanisms to address the problem, ideally before their colleagues at YouTube and Google Search come asking for help fighting NotebookLM-generated spam :)

    It's certainly easier for the creators of genAI to build detection tools than for outsiders to do so. AI audio detection is a hard problem - https://www.npr.org/2024/04/05/1241446778/deepfake-audio-det...

    • > What I’m aiming for is to ensure that the NotebookLM team is aware of the impact and actively considering it.

      What is the impact? Have any of them attracted an audience of any meaningful size? If a month from now there are 1.3 million generated podcasts, what do you anticipate the fallout to be?


Podcasts, episodic radio shows hosted on Apple Music and Spotify, haven't been around for very long: not long enough for kids to be tutored in making podcasts and then grow into adults with that sentimental hobby, as with playing the violin or oil painting. If you believe a "Human Authenticity Badge" is meaningful for podcasts, it's complicated. Tradition plays the biggest role in the outrage you are trying to spin, not an appeal against slop and spam; there is, of course, already a ton of low-quality podcasts, music, and art made by real people for no nefarious purpose whatsoever. As with many of these posts, which are really common on HN, no sensible remedy is suggested besides pointing fingers at some giant corporation and asking it to do something impossible.

If you care a lot about podcast quality, go and make your own podcast service with better discovery. Once you realize the antagonist was collaborative filtering, made possible by non-negative matrix factorization dating from the year 2000, and not AI, you will at least have learned something from this comment instead of just feeling better. And then, how do you propose to curate by hand, and why would someone choose your curation over the New Yorker's? And as for those very purists, trying to make everything sentimental and accusing everyone of slop and spam: why do so many creators thrive while ignoring the New Yorker's opinion of them entirely? Perhaps curation is not only unscalable but also wrong. Difficult questions for listeners and podcast authors alike.
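To make the collaborative-filtering point concrete, here is a minimal sketch of NMF-based recommendation using Lee and Seung's multiplicative updates with a 0/1 mask, so only the cells a listener actually played are fitted. The play counts and all function names (`masked_nmf`, `observed_error`, `masked`) are hypothetical, and this is not a description of how Apple, Spotify, or any real platform ranks shows.

```python
import random


def matmul(a, b):
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def masked(mask, a):
    """Zero out cells that were never observed (mask == 0)."""
    return [[mk * x for mk, x in zip(mrow, arow)] for mrow, arow in zip(mask, a)]


def masked_nmf(v, mask, rank=2, iters=200, eps=1e-9, seed=0):
    """Factor v ~= w @ h with w, h >= 0, fitting observed cells only."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    mv = masked(mask, v)
    for _ in range(iters):
        # h *= (w^T (M*v)) / (w^T (M*(w h)))
        wt = transpose(w)
        num = matmul(wt, mv)
        den = matmul(wt, masked(mask, matmul(w, h)))
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)] for r in range(rank)]
        # w *= ((M*v) h^T) / ((M*(w h)) h^T)
        ht = transpose(h)
        num = matmul(mv, ht)
        den = matmul(masked(mask, matmul(w, h)), ht)
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(n)]
    return w, h


def observed_error(v, mask, w, h):
    """Sum of squared errors over observed cells only."""
    wh = matmul(w, h)
    return sum(mask[i][j] * (v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


# Hypothetical play counts: rows are listeners, columns are podcasts; 0 = never played.
plays = [
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 1],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
]
observed = [[1 if x > 0 else 0 for x in row] for row in plays]

w, h = masked_nmf(plays, observed, rank=2)
scores = matmul(w, h)
# Suggest to listener 0 the unplayed podcast with the highest predicted score.
unplayed = [j for j in range(len(plays[0])) if not observed[0][j]]
best = max(unplayed, key=lambda j: scores[0][j])
print("recommend podcast", best, "to listener 0")
```

The mask is the key design choice: treating 0 as "unknown" rather than "disliked" keeps unplayed shows from being fitted toward zero, which is what lets the factorization surface them as recommendations at all.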

Only 1,300? I'd imagine it would be so many more.

  • It’s definitely more than that.

    The 1,300+ shows are just the ones recently removed from Listen Notes.

    Give it a few days, and I’m sure the number will double, quadruple, and continue to grow. :(

Counterpoint: Most podcasts were utterly worthless before AI too. The world will do fine losing a few mattress ad vehicles.

Like other data, provenance suddenly matters a lot. From my POV, that's good. Not all data sources are created equal, and this is putting that into stark enough relief that it might actually change the landscape. (In case it isn't obvious, I strongly believe most of the Internet was garbage well before LLMs. We just called it "SEO". Still garbage.)

  • I generally agree, but when AI generated content is actively trying to avoid being labelled as “AI generated” it kinda gets depressing. Because in the end, it will just make the entire industry “seem” worthless, akin to AI generated pictures.

    I’d rather let the end user know whether it was made by humans or not, and let the market decide. If people love listening to such content, let it be. But hiding how it was made feels a bit disingenuous.

So what do you propose Google do to prevent this from happening?

  • The comment's default remedy is tribal: "The only moral content is my content." We sort of used to live in that world under the studio and TV network system. Most consumers would say it was not so bad, maybe even better.

    Of course, the commenter never says this, living in today's world, where the writing he likes would never be published by the New York Times the way it is on Twitter, the TV he likes would never be offered for free the way it is on YouTube, and the music he likes would never be offered for pennies the way it is on Spotify. Some meaningful creators will lose from every remedy you could think of where Google "something somethings" AI. Maybe the root problem is generalizing.

    • I created a “podcast episode” (???) of my personal blog (not trying to get traffic to it; it’s more of a journal) using NotebookLM. It sounded just as bland and overproduced as a “professional” NPR podcast like “Planet Money” or “The Indicator”.

      Whether that says how high-quality NotebookLM is or how low-quality NPR's podcasts are is an exercise for the reader.

      The only reason “Stuff you should know” is better is because of the random off topic discussions they go into and that’s not a complaint about SYSK.

> Is there a watermark or any other identifiable marker that can be used?

The problem with this is it's not feasible long-term, or even medium-term - as soon as a watermarking system is implemented, a watermark-removal system will be created.

(Happy to be proven wrong)

Well, which one is it? Are the podcasts low quality or not? If they are, what the hell are you worried about? Being worried about, idk, disinformation from podcasts of all things is absolute silliness. Won't someone think of the... podcast audiences? Fuckin what, dude?