Sci/Tech Internet Archive announces broader crawler scope

tom_mai78101

The Internet Archive has made the controversial decision to begin ignoring robots.txt, the directive file webmasters use to tell automated crawlers which parts of a site to avoid, in an effort to broaden its coverage.

Despite being entirely funded by donations and troubled by the occasional fire, the Internet Archive is making considerable inroads into its self-appointed task to create a publicly accessible archive of everything it can get its hands on. The organisation has, in the last few years, launched in-browser vintage computing emulation, playable classic arcade games, a museum of de-fanged malware, an Amiga software library, a trove of internal documents from interactive fiction pioneer Infocom, to say nothing of its archive of vintage computing magazines.

Its most popular feature, however, is the Wayback Machine, a service that lets users enter a URL and view any copies of the page that the Internet Archive's crawlers have captured over time. A fantastic resource both for research and for preserving information that would otherwise be lost to history, the Wayback Machine has until now respected robots.txt directives, which allow webmasters to keep automated crawlers away from chosen files and directories; that will no longer be the case.
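For readers unfamiliar with the mechanism being dropped, the sketch below shows how a well-behaved crawler traditionally consults robots.txt before fetching a page, using Python's standard-library `urllib.robotparser`. The file contents and URLs are illustrative placeholders, not taken from any real site.

```python
# Minimal sketch of robots.txt handling by a compliant crawler,
# using the standard-library robots.txt parser.
from urllib.robotparser import RobotFileParser

# Example robots.txt a webmaster might serve (hypothetical content):
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A crawler honouring the file skips disallowed paths...
print(parser.can_fetch("*", "https://example.com/private/page.html"))  # False
# ...and fetches everything else.
print(parser.can_fetch("*", "https://example.com/public/page.html"))   # True
```

Ignoring robots.txt simply means the Archive's crawlers will no longer perform this check before capturing a page.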


Read more here. (BitTech)