Tell NYT, Atlantic, USA Today to Keep Wayback Machine

(savethearchive.com)

94 points | by doener 2 hours ago

8 comments

  • ctippett 27 minutes ago
    Am I correct that this has come about because archive.org respects robots.txt and these sites have blocked their crawler from indexing their sites?

    I'm not sure how to articulate my thoughts on this exactly, other than to say it's disappointing that doing the right thing (i.e. respecting robots.txt) is rewarded with the burden of soliciting responses to a petition while at the same time others are rewarded with profit for ignoring those same directives.

    • cmeacham98 10 minutes ago
      Correct. Example snippet from the nytimes.com robots.txt:

          User-agent: archive.org_bot
          Disallow: /
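
      As a rough illustration of what that rule means for a crawler that honors it, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed`, the second user agent, and the URL are made up for the example; this is not a description of how archive.org's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted above, held inline so nothing is fetched over the network.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines rather than one string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is shut out of every path on the site.
archive_ok = is_allowed(ROBOTS_TXT, "archive.org_bot", "https://www.nytimes.com/")

# An agent with no matching entry (name is an assumption) is allowed by default.
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://www.nytimes.com/")
```

      With only this rule in the file, `archive_ok` is `False` and `other_ok` is `True`: the block targets one named crawler, and it only works on crawlers that choose to check.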
    • Paracompact 8 minutes ago
      Don't know if it helps your musings at all, but there's a good chance that if a high-profile crawler like archive.org disrespected their robots.txt, archive.org would be faced with lawsuits (or some other form of pressure). This is not merely the most moral move; rather it is the only sensible move.

      The only reason "others are rewarded with profit" in cases like these is that pinkie-promise-style obligations don't affect players too small or shadowy to bother litigating.

  • someperson 1 hour ago
    Maybe they should have an escrow period, the way the Financial Times is available on the NewsBank service with a 30-day escrow.
  • sublinear 1 minute ago
    After many years of these media outlets circling the drain, this is likely the clearest signal of their irrelevance. It's not like anyone is committing these rags to microfiche anymore.
  • JumpCrisscross 39 minutes ago
    I know a little about this debate on the Times and Atlantic sides. I’ll get some grief for this, but I asked a senior person at the former what they thought about the paywall workarounds that are frequent on HN. I was genuinely shocked to learn they hadn’t heard about them.

    In the end, we settled on agreeing that making such stuff available after 30 days, and possibly with access restrictions (can’t be pulled more than N times a day, in case it becomes relevant in the future) struck the right balance.

    To my knowledge, the Internet Archive hasn’t done any outreach on this issue. In addition to pressuring the publications, I’d put some pressure on them to negotiate.

  • LNSY 42 minutes ago
    [flagged]
  • righthand 47 minutes ago
    Wouldn’t it be better to let these legacy news orgs (which aren’t really anything beyond advertising and data harvesting firms) block archive.org, so that no one will read their articles and they can go under? I’m struggling to think of a reason I need the NY Times. I’ve never had a subscription and never seen writing that I thought benefited me as a citizen (they’re very pro-war of any kind).
    • JumpCrisscross 42 minutes ago
      > block archive.org and thus no one will read their articles and they can go under?

      …why would they go under if the people who don’t pay for news stop reading them?

  • kr108sdh 24 minutes ago
    The petition should be to ban the AI theft. If it is on wayback, the bots could as well scrape the NYT directly.

    The NYT is of course guilty itself. It did not investigate the possible murder of its star witness Suchir Balaji and is too reserved in examining the consequences of AI in general.

    If it doesn't fulfill its journalistic and societal obligations, soon its own journalists will be replaced by AI bullet point slop like Axios.

  • xyzzy_plugh 39 minutes ago
    The title freaked me out. I thought this was about the Wayback Machine going away but no, it's just news publications blocking being archived.

    I guess I don't really care. As soon as it becomes unworkable to view these publications through archivers I'll just stop viewing them altogether. I don't see this helping their bottom line though.

    • ameliaquining 20 minutes ago
      As long as other people are reading them, they're important for understanding what's happening in the world and what information the public is getting, which is why we need an accessible archive of their content.
      • redwall_hp 8 minutes ago
        Exactly. Libraries have kept microfiche archives of newspapers forever, and they're an essential part of historical research.

        They also preserved old books. But now I guess they're becoming middlemen for access to limited ebook platforms that ensure books disappear when publishers lose interest.

        The "Information Age" is proving to be the setup for a dark age, when nonprofitable things are just thrown out and efforts to preserve them are actively fought.

        • layman51 2 minutes ago
          I think part of this is important too because online news articles might have corrections, or certain paragraphs might get deleted in some rare situations. It's good to have a way of tracking those. Sometimes, the edits made to an article are very irrelevant to the actual message. I'm thinking stuff like typos, or even embarrassing gaffes like the recent time that a headline implied that the NATO acronym had the word "American" in it.