Scraping abstracts

Issue #6 resolved
Robert Jäschke created an issue

Most digital libraries contain abstracts (short summaries) of the articles that we extract. Please check for each scraper whether it already extracts the abstract by

  1. looking at the web page to see whether it contains an abstract, and then
  2. scraping the page with the bookmarklet and checking whether the abstract was scraped correctly.

Please create a table with the following columns:

  1. Scraper
  2. tested URL
  3. abstract on web page
  4. abstract scraped

The goal is then to add abstract scraping wherever this is possible.
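
An example of the table layout (placeholder values only; the actual scraper names and URLs are to be filled in during testing):

  Scraper          | tested URL             | abstract on web page | abstract scraped
  <scraper class>  | <tested article URL>   | yes / no             | yes / no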

Comments (3)

  1. Robert Jäschke (reporter)

    Add another column to this table, "5. scraper type", which identifies how the scraper obtains the content (a short sketch follows the list):

    • by building a new URL (SimpleGenericUrlScraper)
    • by extracting the BibTeX/EndNote/etc. URL from the content of the given URL
    • otherwise (how?)
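
    A minimal sketch of the first two scraper types, using hypothetical class names and URL patterns rather than the actual BibSonomy scraper API:

        import java.util.Optional;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        /** Sketch of the first two scraper types (hypothetical names and URL layout). */
        public class ScraperTypeSketch {

            /** Type 1: build the export URL directly from the article URL (cf. SimpleGenericUrlScraper). */
            static String buildExportUrl(String articleUrl) {
                // Hypothetical library layout: the BibTeX export lives under /export/bibtex/...
                return articleUrl.replaceFirst("/article/", "/export/bibtex/article/");
            }

            /** Type 2: find the BibTeX/EndNote export link inside the already downloaded page content. */
            static Optional<String> extractExportUrl(String pageHtml) {
                Matcher m = Pattern
                        .compile("href=\"([^\"]*(?:bibtex|endnote)[^\"]*)\"", Pattern.CASE_INSENSITIVE)
                        .matcher(pageHtml);
                return m.find() ? Optional.of(m.group(1)) : Optional.empty();
            }

            public static void main(String[] args) {
                System.out.println(buildExportUrl("https://example.org/article/12345"));
                System.out.println(extractExportUrl("<a href=\"/export/12345?format=bibtex\">BibTeX</a>")
                        .orElse("no export link found"));
            }
        }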

    Then, for scrapers that do not yet get the abstract but already download the web page, extract the abstract from the page and add it to the resulting BibTeX.
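
    A minimal sketch of that step, assuming the page is already available as a string and the scraped entry as BibTeX text (helper names are hypothetical, not existing BibSonomy code):

        import java.util.Optional;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        /** Sketch: pull the abstract out of a downloaded page and append it to the scraped BibTeX entry. */
        public class AbstractExtractionSketch {

            // Many digital libraries expose the abstract in a meta tag such as
            // <meta name="citation_abstract" content="..."> or <meta name="description" content="...">.
            // Simplified: assumes the name attribute comes before the content attribute.
            private static final Pattern META_ABSTRACT = Pattern.compile(
                    "<meta\\s+name=\"(?:citation_abstract|dc\\.description|description)\"\\s+content=\"([^\"]+)\"",
                    Pattern.CASE_INSENSITIVE);

            static Optional<String> extractAbstract(String pageHtml) {
                Matcher m = META_ABSTRACT.matcher(pageHtml);
                return m.find() ? Optional.of(m.group(1)) : Optional.empty();
            }

            /** Insert an abstract field before the closing brace of the BibTeX entry, if an abstract was found. */
            static String addAbstractToBibtex(String bibtex, String pageHtml) {
                return extractAbstract(pageHtml)
                        .map(abs -> bibtex.replaceFirst("\\}\\s*$",
                                ",\n  abstract = {" + Matcher.quoteReplacement(abs) + "}\n}"))
                        .orElse(bibtex);
            }

            public static void main(String[] args) {
                String html = "<meta name=\"citation_abstract\" content=\"A short summary of the article.\">";
                String bibtex = "@article{key,\n  title = {Some Title}\n}";
                System.out.println(addAbstractToBibtex(bibtex, html));
            }
        }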
