Comment by kevin_thibedeau

11 days ago

Almost all of the "product X vs Y" results are AI ramblings now. This growth of the dead Internet is making me want to sign up for Kagi. We're going to need a certification for human generated content at some point.

Kagi is not a panacea unfortunately. I pay for it and daily drive it to support a Google alternative, but I still have real trouble with my results being full of AI garbage (both image and text search).

As mentioned, product comparisons are a big one but another worrying area is anything medical related.

This week I was trying to find research about a medicine I'm taking, and the already SEO-infested results of five years ago have become immeasurably worse, with hundreds of pages of GPT-generated spam trying to attract your click.

I ended up ditching search altogether, finding a semi-relevant paper on nih.gov instead, and going through its citations manually to try to find information.

  • That matches my experience. Kagi doesn't surface much content beyond what Google/Bing do. What it does better out of the box is guessing which content is low-quality and displaying it so that it takes up less space, letting you see a few more pages' worth of search results on the first page. It also lets you permanently filter out sites you consider low quality so you don't see them at all. That would have been awesome 10 years ago, when search spam was dominated by a few dozen sites per subject that had mastered SEO (say, Experts Exchange), but it is less useful now that millions of AI content mills are drowning out the real content.

    For content that isn't time-sensitive, the best trick I have found is to exclude the last 10-15 years from search results. I've set up Firefox keyword searches[1] for this and use them for the majority of my searches, reserving normal search for subjects where the information must be from the last few years. It does penalize "evergreen" pages, where sites continuously make minor changes to bump their SEO, which sucks for some old articles at contemporary sites, but for the most part it gives much better results.

    [1] For example: https://www.google.com/search?q=%s&source=lnt&tbs=cdr%3A1%2C...
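The footnote's URL is truncated, so the exact parameters used there are unknown. As a rough sketch, this is one way such a keyword-search template could be built in Python, assuming Google's undocumented custom-date-range parameter `tbs=cdr:1,cd_max:M/D/YYYY` (Google can change or drop it at any time). The `keyword_search_template` name is hypothetical; `%s` is the placeholder Firefox replaces with whatever you type after the keyword.

```python
from datetime import date
from urllib.parse import quote


def keyword_search_template(cutoff: date) -> str:
    """Return a Firefox keyword-search URL limiting Google results to pages
    dated on or before `cutoff`. Firefox replaces `%s` with the typed query."""
    # Assumption: Google's undocumented custom date range syntax,
    # cdr:1 (custom range on) plus cd_max:M/D/YYYY (upper bound only).
    tbs = f"cdr:1,cd_max:{cutoff.month}/{cutoff.day}/{cutoff.year}"
    # Percent-encode ':', ',' and '/' so the value survives as one query parameter.
    return "https://www.google.com/search?q=%s&tbs=" + quote(tbs, safe="")


if __name__ == "__main__":
    # Only pages dated up to the end of 2012.
    print(keyword_search_template(date(2012, 12, 31)))
```

Save the printed string as a bookmark URL with a keyword (e.g. `old`); typing `old some query` in the address bar then runs the date-restricted search.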

    • > For content that isn't time sensitive the best trick that I have found is to exclude the last 10-15 years from search results. I've setup a Firefox keyword searches[1] for this, and find myself using them for the majority of my searches...

      OMG. I'm so happy how much AI is improving our lives right now. It really is the future, and that future is bright.

      Thanks guys!

  • I use Kagi personally every day and my results are definitely not full of AI garbage, so I would like to better understand your context.

    Have you reported any of those issues to Kagi (support/discord/user forum)? We are pretty good at dealing with search quality issues.

  • The UK NHS website is usually pretty good for this so sticking "NHS" in the search terms might help, although I imagine they may not cover non-UK brand names.

  • > I ended up ditching search alltogether and ended up finding a semi-relevant paper on the nih.gov and going through the citations manually to trying and find information.

    I've been doing this for years now. The normienet, as I call it, is nigh worthless, and I don't even bother trying to find information on it.

  • I also use it daily. One of my favorite features is being able to boost certain domains and block or downgrade results from others. I boost results from domains I trust, which significantly improves my results. They have a page with commonly boosted/blocked/downgraded sites, which serves as a good starting point.
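Boosting, downgrading and blocking by domain can be thought of as a per-domain multiplier applied on top of the engine's own ranking. The sketch below only illustrates that idea; the `Result` type, the weight values and the scoring are hypothetical and say nothing about how Kagi actually implements it.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical multipliers for a user's per-domain preferences.
BOOST, DOWNGRADE, BLOCK = 2.0, 0.5, 0.0


@dataclass
class Result:
    url: str
    score: float  # relevance score from the underlying engine (assumed)


def weight_for(host: str, prefs: dict[str, float]) -> float:
    """Return the weight of the most specific matching domain, so a rule for
    'nih.gov' also covers 'pubmed.ncbi.nlm.nih.gov'. Unlisted hosts get 1.0."""
    labels = host.lower().split(".")
    for i in range(len(labels) - 1):  # stop before the bare TLD
        candidate = ".".join(labels[i:])
        if candidate in prefs:
            return prefs[candidate]
    return 1.0


def rerank(results: list[Result], prefs: dict[str, float]) -> list[Result]:
    scored = []
    for r in results:
        w = weight_for(urlsplit(r.url).hostname or "", prefs)
        if w > 0:  # a weight of 0 means the domain is blocked outright
            scored.append((r.score * w, r))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored]


if __name__ == "__main__":
    prefs = {"nih.gov": BOOST, "nhs.uk": BOOST,
             "listicles.example": DOWNGRADE, "contentfarm.example": BLOCK}
    results = [Result("https://contentfarm.example/x", 0.9),
               Result("https://www.nhs.uk/medicines/", 0.6),
               Result("https://listicles.example/top10", 0.8),
               Result("https://pubmed.ncbi.nlm.nih.gov/123/", 0.5)]
    for r in rerank(results, prefs):
        print(r.url)  # nhs.uk (1.2), then pubmed (1.0), then listicles (0.4)
```

Multiplying rather than replacing the engine's score keeps relevance in play: a boosted domain with a poor match can still rank below a strong match from an unlisted site.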

It really is a weird feeling remembering the internet of my youth, and even my 20s, and knowing that it will never exist again.

  • I'm a little sad for anyone who didn't get to experience the Internet of the twentieth century. It was a unique point in time.

    I'm ready to pay for a walled garden where the incentives are aligned towards me, instead of against me. I know that puts me in a minority, but I'm tired of the advertising 'net.

    • I've said it before and I'll say it again: I firmly think AOL was just ahead of its time.

      Bring it back. Charge me 10 or 20 a month. Give me the walled off chatrooms, forums, IM, articles, keywords, search, etc. Revamp it, make it modern. And make a mobile app.

      Everyone wanted a free and open Internet, until AI and the bots ruined it all.

    • It still exists. Currently it looks like Patreons and their associated communities, long-running web forums, small chatrooms on platforms like Discord or Facebook or Instagram, and so on. Small communities, with relatively high barriers to entry.

    • Would you pay a nominal amount (like 5 cents or 25 cents) to consume one piece of good, ad-free content, assuming that there was no login, no account, no friction, etc? You click, you read, and 5 cents is magically transferred from you to the writer?

      I would. But I've asked a lot of people who say "no, I don't want to pay when I can read it for free. I don't mind the ads that much."

    • > I'm a little sad for anyone who didn't get to experience the Internet of the twentieth century.

      I'm a little sad for anyone who didn't get to experience the pre-Internet era.

      The Internet is the lead of our time.

    • > where the incentives are aligned towards me, instead of against me.

      It's great to read these words. People are starting to get it. The Internet is not for you, it's against you.

    • > I'm a little sad for anyone who didn't get to experience the Internet of the twentieth century. It was a unique point in time.

      I did, and...well, let's be careful how we look back at it.

      Punch the monkey? Ad supported 'free' internet that literally put an adbar at the top of your browser at all times? Dreadfully slow loads of someone's animated construction sign GIF? Waiting for dial up to connect after 20 tries? Tracking super pixels? Java web applets? Flash? Watching your favorite ISP implode or get bought up? To say nothing of the pre-Google search results (I miss the categories though).

      I have plenty of good memories from those days, but it still had plenty of problems. And it wasn't exactly a bastion of research material either unless you really went digging or paid for access.

    • The problem with that isn't the payment; it's that you will only be sharing it with people similarly willing to pay for a walled garden. I'm guessing most of what we're nostalgic for was created by people who wouldn't be up for that.

    • > I'm a little sad for anyone who didn't get to experience the Internet of the twentieth century. It was a unique point in time.

      Sadly, they won't know what they were missing. It'll be the new normal.

      Some asshole tech apologist is probably getting ready to post that section from Plato where Socrates complains about writing any minute now.

      Of course, that asshole is oblivious to the fact that most if not all of us probably just don't understand what Socrates was missing, so he's just showing his ignorance and stupidity.

      > I'm ready to pay for a walled garden where the incentives are aligned towards me, instead of against me. I know that puts me in a minority, but I'm tired of the advertising 'net.

      The problem is that, even if you try to do that, the incentives are probably still aligned against you, just maybe less blatantly.

      Just look at how many formerly ad-free paid services are adding ads, and how hardware that users literally own acts against their interests by pushing ads in their faces (e.g. smart TVs).

      The guy who runs the walled garden will always be tempted to get some extra cash by adding ad revenue to your subscription feed, or cut costs by replacing human curated stuff with AI slop (maybe cleaned up a bit).

  • I only just put it together, but Peter Watts's Rifters series is some epic grimdark hard sci-fi set on Earth, the first book reading practically as horror, confined deep underwater.

    But my point is, the later books have this amazing post-internet: a ravaged, chaotic wildland filled with rabid programs and wild viruses, packets staggering half-intact across the virtualscape, hit by digital storms. Our internet isn't quite so amazing, but I see a subtler relationship with where we have gone, with so, so many generated sites happy to poorly regurgitate information at you or quietly sell you a slant. Bereft of real sites, real traffic. Watts is a master writer. Maelstrom.

    First book Starfish is free. https://www.rifters.com/real/STARFISH.htm

  • > It really is a werid feeling remembering the internet of my youth and even my 20s and knowing that it will never exist again.

    User-facing ability to whitelist and blacklist websites in search results, and the ability to set weights for websites you want to see higher in search results.

    Spamlists for search results, so that even if you don't have the knowledge or experience to do it yourself, you can still protect yourself from spam.

    It's a recreation of the e-mail situation, not because that's good, but because the web is getting even worse than e-mail.

  • A mesh network on top of IP with an enforceable license agreement that prohibits all commercial use would suffice to get the old net back. Bonus points if no HTML/CSS/JS is involved, just some sane display technology instead.

    • No way. What you are describing is Gemini, but even more niche - a place which is explicitly walled-off from the "big net", which only nostalgic people with right technical skills and a desire to jump some hoops can get to.

      This is not going to work: as time progresses, there will be fewer and fewer nostalgic people who are willing to put up with that complexity. And the "non-commercial" part will ensure that there will _never_ be an option to say: "I am tired of fixing my homeserver once again, I am going to put up my site to (github|sourceforge|$1 hosting) and forget about it".

      Compare to early web. First thing that came to my mind was Bowden's Hobby Circuits site [0]. It's designed for advanced beginners - simple projects, nice explanations. And there are no hoops to jump through - I've personally sent the links to it to many people via forums, private emails, and so on. It apparently went down in 2023, but while it was still up, I remember regularly finding it from google searches and via links from other pages.

      [0] https://web.archive.org/web/20220429084959/http://www.bowden...

> This growth of the dead internet

It is quite surreal to witness. It is certainly fueled by the commercialization of the internet through ads and its centralization onto user-hostile platforms.

The old internet seems to be doing much better. But it lost most of its users in the last 15 years...

  • The old internet seems to be doing much better. But it lost most of its users in the last 15 years..

    What do you mean by this? How do you find the old internet?

    • You don't find them easily. That is the point, I guess. But I am not referring to some obscure darkweb here.

      Many of the old niche forums, for example, still exist, as do FOSS sites. GNU project sites don't seem to have aged a day in 20 years; they still party like it's 2004.

      Also, I think non English sites are better off since Reddit mainly ate English communities and sites.

      Facebook is probably what killed most of the living internet: small community sites, like the local kennel club or boat marina.

      A good example of the old internet would be Matthew's Volvo site:

      https://www.matthewsvolvosite.com/forums/search.php?search_i...

    • You're on it right now. HN is a very old site with old users and old mods that links to other old sites.

Searching with "Reddit" at the end of every query helps, but I suppose it's only a matter of time before most content on Reddit is also AI-generated.

  • Reddit is already lost. I was talking to the mods in a large political subreddit and they said after Reddit started charging for API access, all the tools they used to keep on top of the trolls and bots stopped working, and the quality of the whole subreddit declined visibly and dramatically.

    • > Reddit is already lost. I was talking to the mods in a large political subreddit and they said after Reddit started charging for API access, all the tools they used to keep on top of the trolls and bots stopped working, and the quality of the whole subreddit declined visibly and dramatically.

      The whole point of the API access change was to charge AI model-makers. It'd be ironic if the API change destroyed their product and made their data unsellable.

  • If you know anyone who works in marketing/PR, ask them how they use Reddit. It has been gamed as much as SEO since about 2020. I’m assuming anything except “why is there a fire in this street?” kind of posts are just ads at this point.

  • It's also not much use to anyone who doesn't use Google ever since Reddit started blocking all crawlers besides Googlebot. Old cached results might still show up in Bing/DDG/Kagi but they can't index any of the newer stuff.

  • Most of the Reddit content is now[0] fake.

    [0] Gradually for several years already.

    • The niche subreddits I follow seem to be OK. I stay away from the big ones. All the default ones are garbage from what I can see.

Kagi's results for "baby peacock" show almost the same set (mostly AI) as Google's.

  • It's surprising how many times you see this pattern on HN

    "Google sucks!"(50 upvotes)

    "That's why I use Kagi!"(45 upvotes)

    "Actually Kagi has the exact same problem and you have to pay for it."(2 upvotes)

  • Search “peachick”, it works fine. I assume Google would be the same.

    I guess using the correct terminology matters.

    • > I guess using the correct terminology matters.

      If people were actually searching up "peachick" that'd probably be SEO spammed to hell, too.

Unfortunately, as much as I do like Kagi overall, it goes out of its way to inject AI slop into the results with its sketchy summarization feature.

Most product reviews simply pump Amazon comments into an AI to generate a review, with a final "pros/cons" section that is basically the same summary Amazon's own AI generates.

Whether something is human generated is (mostly) beside the point. The problem is that spam is incentivized today. Any solution must directly attack the financial incentive to spam. Therefore what's needed for a start is for search engines to heavily downweight ads, trackers, and affiliate links (obviously search engines run by ad companies will not do this). Shilling (e.g. on reddit) should be handled as criminal fraud.
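One crude way a search engine could act on this is to cut a page's relevance score for every tracker embed or affiliate link it carries. A minimal sketch in Python using only the standard library; the tracker host list, the affiliate query parameters and the 0.5-per-hit penalty are illustrative assumptions, not a vetted blocklist or any real engine's algorithm.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit

# Illustrative assumptions, not a vetted blocklist.
TRACKER_HOSTS = {"doubleclick.net", "googlesyndication.com", "googletagmanager.com"}
AFFILIATE_PARAMS = {"tag", "affid", "aff_id"}  # query keys treated as affiliate markers


def host_matches(host: str, domains: set) -> bool:
    """True if host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


class SpamSignals(HTMLParser):
    """Counts tracker embeds and affiliate-tagged links in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.trackers = 0
        self.affiliate_links = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "iframe", "img") and attrs.get("src"):
            if host_matches(urlsplit(attrs["src"]).hostname or "", TRACKER_HOSTS):
                self.trackers += 1
        elif tag == "a" and attrs.get("href"):
            query = parse_qs(urlsplit(attrs["href"]).query)
            if not AFFILIATE_PARAMS.isdisjoint(query):  # any affiliate key present
                self.affiliate_links += 1


def downweighted_score(relevance: float, html: str, penalty: float = 0.5) -> float:
    """Multiply relevance by `penalty` once per tracker or affiliate link found."""
    signals = SpamSignals()
    signals.feed(html)
    signals.close()
    hits = signals.trackers + signals.affiliate_links
    return relevance * penalty ** hits
```

A page with one tag-manager script and one affiliate-tagged link keeps a quarter of its score, while a clean page keeps all of it; the point is that the penalty tracks the monetization, not whether a human wrote the text.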

> We're going to need a certification for human generated content at some point.

People keep saying this and I keep warning them to be careful what they wish for. The most likely outcome is that "certification of human generated content" arrives in the form of remote attestation where you can't get on the internet unless you're on a device with a cryptographically sealed boot chain that prevents any untrusted code from running and also has your camera on to make sure you're a human. It won't be required by law, but no real websites will let you sign in without it, and any sites that don't use it will be overrun with junk.

I hate this future but it's looking increasingly inevitable.

  • There are ways to do this without destroying anonymity. Ideally, you verify you're human by signing up for some centralized service in real life, maybe at the post office or something. Then websites can ask this service whether you're real when you provide your super-long rotating token. So, just like an existing IDP, but big.
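A toy sketch of that idea in Python: a person is verified once in person, then requests short-lived random tokens that a website can redeem with the service to learn "this is a verified human" and nothing else. The class and method names are hypothetical. Note that in this simple version the service itself could still link each token to the person it issued it to; hiding that as well would need cryptographic schemes such as blind signatures.

```python
import secrets
import time
from typing import Dict, Optional, Set


class HumanAttestationService:
    """Hypothetical service: verify a person once in real life, then hand out
    short-lived random tokens that websites can check anonymously."""

    def __init__(self, ttl_seconds: int = 300) -> None:
        self.ttl = ttl_seconds
        self._verified_users: Set[str] = set()
        self._tokens: Dict[str, float] = {}  # token -> expiry timestamp

    def verify_in_person(self, user_id: str) -> None:
        # Stands in for the post-office style real-world check.
        self._verified_users.add(user_id)

    def issue_token(self, user_id: str, now: Optional[float] = None) -> str:
        if user_id not in self._verified_users:
            raise PermissionError("user has not been verified as human")
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # 32 random bytes, carries no identity
        self._tokens[token] = now + self.ttl
        return token

    def is_human(self, token: str, now: Optional[float] = None) -> bool:
        """Called by a website. Tokens are single-use: checking consumes them."""
        now = time.time() if now is None else now
        expiry = self._tokens.pop(token, None)
        return expiry is not None and now < expiry
```

Because every token is fresh random data that expires after `ttl_seconds` and can only be redeemed once, two sites comparing tokens learn nothing that links the same person across them.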

Even Google is trying to get into the X vs Y game, with pretty funny results if you ask for a nonsensical comparison.

https://x.com/samhenrigold/status/1843040235325964549

...or a sensical comparison where it just completely misses the point.

https://i.imgur.com/FotFZ3F.jpeg

  • Couldn't reproduce. In fact, the second hit is a Threads version of the same post, but I get no AI suggestions for this query. Humorous Google queries (or AI queries more generally) are definitely a trope, so I can never really tell whether they actually happened or if it's all for karma.

    • Google also routinely removes AI suggestions for searches that produce embarrassing results (you don't get them for searches about keeping cheese on your pizza anymore, for example), so it's even harder to validate once a result goes viral.

    • I still get the second one when I search "Difference between sauce and dressing" on Google. The Oven vs Ottoman empire one I don't get an AI overview.

      Edit: Similar to the second one I just did Panda Bear vs Australia which informed me "Australians value authenticity, sincerity, and modesty. Giant pandas are solitary and peaceful, but will fight back if escape is impossible. "

I'm glad that Kagi (and others) exist as an alternative for people who don't want generative AI in their searches.

Personally, I'm excited about more generative AI being added to my search results, and I'll probably switch to whichever search engine ends up with the best version of it.

  • This peacock thing was the last straw for me. I installed Kagi just moments ago.

    And of course the first image for "baby peacock" is the same white chick thing… obviously because this story is making the rounds —_—

  • AI tools on the search page: sure, cool. I use Perplexity a lot, actually. I'm in favor of this.

    Search results that are full of content mills serving pre-generated content: no thanks. It's in the same category as those fake Stack Overflow scrape sites.

  • Not sure if you’re being sarcastic, but they’re not talking about the AI features of the search engine itself (Kagi has those too); they’re talking about nonsensical AI-generated content on the web that exists solely to get you clicking on ads. Kagi tries to make those sites stand out less in the search results.

Human-verified content is going to be the next billion-dollar company.

  • Perhaps you're thinking of the Wikimedia Foundation.

    There is plenty of space there for more volunteer editors to verify content, and likewise, WMF operates its own cloud platform where developers are automating tools that do maintenance and transformation on the human-contributed content.

    Then, there is Wikidata, a machine-readable Wiki. Many other projects draw data from here, so that it can be localized and presented appropriately. Yet, its UI and SPARQL language are accessible to ordinary users, so have fun verifying the content there, too!

      • I don't think you understood what I meant by "human verified", but I did use a very vague term. I meant proving that some input or data coming from a user was generated by a human (whatever we define that to mean) rather than being the output of an LLM or a multimodal image/video/audio model.

  • The issue, in terms of cost, is that if you want this to be truly human-verified, you're gonna have to dip into the real world.

Product X vs Y results are not really any worse now than they were pre-GPT (i.e. they were absolute crap long before GPT came onto the scene).

I'm not sure human-generated content is any better on the whole. BS-laden drivel has been pervasive for some time now, even before AI started taking over.

I'm talking about those 300-word, ad-ridden crap articles that are SEO'd right to the top, and if you're lucky you might get the 3-word answer you were looking for: "<300 words of shit>... and in conclusion, <1-step answer>.". Anyway, humans have been getting paid pennies to write those for a while.

AI just turns the throughput on that up to 11, where there's just no end in sight. I think this is like the primary failure mode of AI at this point. It's not going to kill us - we're going to use it to kill the internet. OTOH, maybe then we just go outside and play.

  • In the world of content moderation, we refer to this as constructive friction. If you make it too easy to do a thing, the quality of that thing goes down. Difficulty forces people to actually think about what they are writing, and whether it is germane and accurate. Generative AI, as you point out, removes all the friction, and you end up with bland soup.

Ironically, ChatGPT and similar LLM chatbots are great for those kinds of searches.

  • You would have to be soft in the head to rely on any LLM for researching information on a medication you're actively taking.

Before AI, product comparison sites were ramblings of interns paid by people who found out you could make money from SEO-optimized blogs.

And long before the Internet, people slapped random concoctions together and sold them as medicine, advertising them as cure-alls.

Any source of content can be controlled or manipulated in non-obvious ways. And we already have strong algorithms for manipulating human attention (resulting in the growth of non-falsifiable conspiracy theories, for one). There is no clear approach leading out of information dystopia.