archive bookmarked URLs

Issue #1986 open
Robert Jäschke created an issue

Archive bookmarked URLs using services such as archive.today and the Internet Archive (IA). It really is no more than submitting a URI to a web form and retrieving the HTTP response header/body for the URI of the archived resource. Note that archive.today does not necessarily honor robots.txt (the IA does), so one suggestion would be to check robots.txt before proactively archiving a resource.
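The robots.txt check suggested above could be sketched roughly as follows. This is only an illustration, assuming the site's robots.txt body has already been downloaded; may_archive and EXAMPLE_ROBOTS are hypothetical names and the rules shown are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def may_archive(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: robots_txt is the already downloaded body of
    the site's /robots.txt, so no network access happens here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules, not taken from any real site.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for candidate in (
        "http://example.org/~user/",
        "http://example.org/private/notes.html",
    ):
        print(candidate, may_archive(EXAMPLE_ROBOTS, candidate))
```

A bookmark would then only be submitted to archive.today when may_archive returns True; for the IA the check is redundant, since the IA honors robots.txt itself.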

Comments (4)

  1. Robert Jäschke reporter

    archive.today builds a queue per IP address to throttle requests and we speculate that the IA has something similar in place. To us, this is sufficient to address the concern about spammers. (One could even make the case that a proactively created Memento can help identify spam.) We are using archive.today for HTML pages (they are better than the IA at making snapshots) and the IA for all other content types (mainly because archive.today does not do PDFs, for example). To give you an idea how simple this is, I created a Memento for:

    http://www.kbs.uni-hannover.de/~jaeschke/

    in archive.today by sending the POST request:

    curl -i -d url="http://www.kbs.uni-hannover.de/~jaeschke/" http://archive.today/submit/

    You can find the Memento URI:

    http://archive.today/ALnI6

    in the response header as well as in the response body. https://archive.today/http://www.kbs.uni-hannover.de/~jaeschke/ shows it is the only Memento available in that archive thus far.

    I also created a Memento in the IA by sending the HEAD request:

    curl -I http://web.archive.org/save/http://www.kbs.uni-hannover.de/~jaeschke/

    You can find the Memento URI:

    /web/20140716173305/http://www.kbs.uni-hannover.de/~jaeschke/

    in the HTTP response headers (in this case it's a relative URI). http://web.archive.org/web/*/http://www.kbs.uni-hannover.de/~jaeschke/ shows that it's the only Memento for July 2014.

    This approach would have the significant advantage of being able to convey the URI of the specific Memento with the data-versionURL attribute, which is clearly better than the best-effort approach.
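The two curl calls above, together with the HTML/non-HTML split, might be wrapped roughly like this. It is a sketch under assumptions, not a tested integration: build_archive_request and resolve_memento_uri are hypothetical helpers, the endpoints are copied from the curl examples, and the name of the response header that carries the Memento URI is deliberately left to the caller. Nothing is sent over the network here; the returned Request would still have to be passed to urllib.request.urlopen.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Endpoints as used in the curl examples of this comment.
ARCHIVE_TODAY_SUBMIT = "http://archive.today/submit/"
IA_SAVE_PREFIX = "http://web.archive.org/save/"


def build_archive_request(url: str, content_type: str) -> Request:
    """Prepare (but do not send) the archiving request for url.

    HTML pages go to archive.today as a form POST; every other
    content type goes to the IA as a HEAD request.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html":
        body = urlencode({"url": url}).encode("ascii")
        return Request(ARCHIVE_TODAY_SUBMIT, data=body)  # data implies POST
    return Request(IA_SAVE_PREFIX + url, method="HEAD")


def resolve_memento_uri(request_url: str, reported_uri: str) -> str:
    """Make the Memento URI reported by the archive absolute.

    The IA example returned a relative URI, so it is resolved against
    the URL the request was sent to; absolute URIs pass through as-is.
    """
    return urljoin(request_url, reported_uri)


if __name__ == "__main__":
    page = "http://www.kbs.uni-hannover.de/~jaeschke/"
    request = build_archive_request(page, "application/pdf")
    print(request.get_method(), request.full_url)
    print(resolve_memento_uri(request.full_url, "/web/20140716173305/" + page))
```

The resolved absolute URI is what would end up in the data-versionURL attribute.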
