Scraping BibTeX from Google Scholar via the Chrome plugin is broken

Issue #2544 closed
Martin Becker created an issue

When I try to add a publication via the BibTeX snippet provided by Google Scholar using the Chrome plugin, I get the following error:

Could not scrape the URL https://scholar.google.de/scholar.bib?q=info:FrNsHbA-fzsJ:scholar.google.com/&output=citation&hl=de&ct=citation&cd=0. Message was: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
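The message `String index out of range: -1` is what older JDKs such as Java 8 report when `substring` receives `-1`. That typically happens when an `indexOf` lookup fails and its result is passed on unchecked. We have not seen the scraper's code, so the following is only a hypothetical illustration of that failure mode and a guarded alternative. The class and method names are made up.

```java
import java.util.Optional;

// Hypothetical helpers illustrating how an unchecked indexOf result leads to
// StringIndexOutOfBoundsException; not BibSonomy's actual scraper code.
public class MarkerExtraction {

    // Unsafe: if the marker is absent, indexOf returns -1 and substring(-1) throws.
    static String fromMarkerUnsafe(String page, String marker) {
        return page.substring(page.indexOf(marker));
    }

    // Guarded: returns Optional.empty() instead of throwing when the marker is absent.
    static Optional<String> fromMarker(String page, String marker) {
        int start = page.indexOf(marker);
        if (start < 0) {
            return Optional.empty();
        }
        return Optional.of(page.substring(start));
    }

    public static void main(String[] args) {
        // An error page contains no BibTeX entry, so there is no '@' to find.
        String errorPage = "Your client does not have permission to get URL /scholar.bib from this server.";
        System.out.println(fromMarker(errorPage, "@").isPresent()); // prints false
        try {
            fromMarkerUnsafe(errorPage, "@");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unsafe version threw: " + e.getMessage());
        }
    }
}
```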

Comments (18)

  1. Daniel Zoller

    Could you please tell us which publication you tried to scrape? I'm getting a 403 error using the provided link.

  2. Martin Becker reporter

    CitiSense: improving geospatial environmental assessment of air quality using a wireless personal exposure monitoring system

    The link reported above still works for me, but I have noticed that it does not work when I use the private mode of my browser. So it seems to be a permission problem.

  3. Mohammed Abed

    I get this error when I open the link in Chrome, Internet Explorer, and Safari:

    Your client does not have permission to get URL /scholar.bib?q=info:FrNsHbA-fzsJ:scholar.google.com/&output=citation&hl=de&ct=citation&cd=0 from this server.
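This response is a refusal from the server, so no BibTeX is returned at all. A scraper that parses the body regardless ends up working on an error page. Below is a minimal sketch of a status check before parsing, using `java.net.http.HttpClient` (Java 11+). The class name, the `Outcome` enum, and the error messages are hypothetical and not taken from BibSonomy's code.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical pre-check so that an error page is never handed to a BibTeX parser.
public class ScholarFetch {

    enum Outcome { OK, FORBIDDEN, OTHER_ERROR }

    // Map the HTTP status code to a coarse outcome.
    static Outcome classify(int statusCode) {
        if (statusCode == 200) {
            return Outcome.OK;
        }
        if (statusCode == 403) {
            return Outcome.FORBIDDEN; // e.g. a session-bound link or a blocked client
        }
        return Outcome.OTHER_ERROR;
    }

    // Fetch the URL and return the body only when the request succeeded.
    static String fetchOrFail(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        switch (classify(response.statusCode())) {
            case OK:
                return response.body();
            case FORBIDDEN:
                throw new IOException("403 Forbidden: the link may be session-bound or the client may be blocked");
            default:
                throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(403)); // prints FORBIDDEN
    }
}
```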

  4. Robert Jäschke

    It's not clear which page should be scrapable. One option would be to scrape the first hit on a search page; in this case, that would be the desired article.

  5. Robert Jäschke

    When I access your URL, I get an error. When I do the following, BibSonomy can successfully scrape the BibTeX:

    1. Open this URL (the search from my previous message)
    2. Click on Cite below the first (and only) search result.
    3. In the overlay, click on BibTeX.
    4. The resulting page can be scraped by BibSonomy.
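The page reached in step 4 presumably contains just the raw BibTeX entry. Here is a minimal sketch of how a scraper might recognize such a page before parsing it, by matching an `@type{key,` header at the start of the body. The regular expression and the class names are assumptions for illustration. They are not BibSonomy's GoogleScholarScraper or BibTeXScraper.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check that a fetched page is a raw BibTeX entry before parsing it.
public class BibTexHeader {

    // Entry type and citation key taken from the first line of the entry.
    static final class Header {
        final String type;
        final String key;

        Header(String type, String key) {
            this.type = type;
            this.key = key;
        }
    }

    // "@type{key," at the very start of the body; leading whitespace is allowed.
    private static final Pattern HEADER =
            Pattern.compile("\\A\\s*@(\\w+)\\s*\\{\\s*([^,\\s]+)\\s*,");

    static Optional<Header> parseHeader(String body) {
        Matcher m = HEADER.matcher(body);
        if (!m.find()) {
            return Optional.empty(); // e.g. an HTML error page
        }
        return Optional.of(new Header(m.group(1).toLowerCase(Locale.ROOT), m.group(2)));
    }

    public static void main(String[] args) {
        // "exampleKey2015" is a made-up key, not Google Scholar's actual output.
        String bib = "@article{exampleKey2015,\n  title = {CitiSense}\n}";
        parseHeader(bib).ifPresent(h -> System.out.println(h.type + " " + h.key)); // prints "article exampleKey2015"
        System.out.println(parseHeader("<html>403</html>").isPresent()); // prints false
    }
}
```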

    Tested on Firefox with BibSonomy's "postPublication" bookmarklet (not the plugin, but that should also work).

    (It is actually the GoogleScholarScraper that extracts the data, although in this case the BibTeXScraper would do the job as well.)

  6. Martin Becker reporter

    Indeed, if I do what you said, it will work.

    Regarding my way of doing this (which saves two clicks): I know that the URL gives you an error, because it seems to be session-bound for some reason (so the scraper will probably not be able to process the URL the normal way). To reproduce, please change your Google Scholar settings to show import links for BibTeX (Settings -> Show links to import citations into "BibTeX"). Then use the resulting "Import into BibTeX" link, which is shown instead of "Cite", and try to import that BibTeX snippet.
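The "Import into BibTeX" link is a `scholar.bib` URL like the one in the original report. A minimal sketch that splits such a URL into its query parameters and pulls out the token after `info:` in `q`. Treating that token (here `FrNsHbA-fzsJ`) as a stable publication identifier is an assumption, and the helper names are hypothetical.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helpers for inspecting a Google Scholar scholar.bib link.
public class ScholarBibUrl {

    // Split the raw query string into decoded key/value pairs, keeping their order.
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Return the token between "info:" and the next ':' in the q parameter, if any.
    static Optional<String> infoId(String url) {
        String prefix = "info:";
        String q = queryParams(url).get("q");
        if (q == null || !q.startsWith(prefix)) {
            return Optional.empty();
        }
        int end = q.indexOf(':', prefix.length());
        return Optional.of(end < 0 ? q.substring(prefix.length()) : q.substring(prefix.length(), end));
    }

    public static void main(String[] args) {
        String url = "https://scholar.google.de/scholar.bib?q=info:FrNsHbA-fzsJ:scholar.google.com/&output=citation&hl=de&ct=citation&cd=0";
        System.out.println(infoId(url).orElse("(none)")); // prints FrNsHbA-fzsJ
    }
}
```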

  7. Mohammed Abed

    I implemented it just now; after three attempts, Google blocked me :D :D hahaha. Now I'm getting a 403 error using the provided link.

  8. Mohammed Abed

    To scrape information from this site, I need a parameter called csisig, and this parameter is generated by JavaScript on the website. Robert Jäschke and I have decided: we will not scrape from this site when there are multiple articles on one page.
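The rule agreed in the previous comment amounts to scraping a results page only when it contains exactly one hit. A minimal sketch of that check, assuming each hit can be recognized by a fixed marker string. The marker `class="gs_ri"` is an assumption about Google Scholar's markup, not something confirmed in this thread, and the class name is hypothetical.

```java
// Hypothetical implementation of the "only scrape single-result pages" rule.
public class SingleResultRule {

    // ASSUMPTION: every search hit contains this marker; real markup may differ.
    static final String RESULT_MARKER = "class=\"gs_ri\"";

    // Count non-overlapping occurrences of marker in html.
    static int countOccurrences(String html, String marker) {
        if (marker.isEmpty()) {
            return 0; // avoid an endless loop on an empty marker
        }
        int count = 0;
        int from = html.indexOf(marker);
        while (from >= 0) {
            count++;
            from = html.indexOf(marker, from + marker.length());
        }
        return count;
    }

    // Scrape only when the page holds exactly one result.
    static boolean shouldScrape(String html) {
        return countOccurrences(html, RESULT_MARKER) == 1;
    }

    public static void main(String[] args) {
        String one = "<div class=\"gs_ri\">CitiSense</div>";
        String two = one + one;
        System.out.println(shouldScrape(one)); // prints true
        System.out.println(shouldScrape(two)); // prints false
    }
}
```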
