Scraping BibTeX from Google Scholar is broken via the Chrome Plugin

Issue #2544 closed

Martin Becker created an issue 2015-10-25

When I try to add a publication via the BibTeX snippet provided by Google Scholar using the Chrome Plugin, I get the following error:

Could not scrape the URL https://scholar.google.de/scholar.bib?q=info:FrNsHbA-fzsJ:scholar.google.com/&output=citation&hl=de&ct=citation&cd=0. Message was: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
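
The exception in this message is what Java's `String.substring` throws when it is handed an index of -1, which is the value `indexOf` returns when its search target is absent. The following is a minimal sketch of that failure mode and a guarded alternative. It assumes (this is not confirmed by the report) that the scraper received a non-BibTeX body such as an error page; the class and method names are hypothetical and are not BibSonomy code.

```java
import java.util.Optional;

final class BibtexExtractSketch {

    // Unguarded: indexOf returns -1 when '@' is absent, and substring(-1)
    // throws StringIndexOutOfBoundsException.
    static String extractUnguarded(String body) {
        return body.substring(body.indexOf('@'));
    }

    // Guarded: report "no BibTeX found" as an empty Optional instead of throwing.
    static Optional<String> extractGuarded(String body) {
        int start = body.indexOf('@');
        if (start < 0) {
            return Optional.empty();
        }
        return Optional.of(body.substring(start).trim());
    }

    public static void main(String[] args) {
        // Hypothetical error-page body that contains no BibTeX entry.
        String errorPage = "<html>403. Your client does not have permission</html>";
        System.out.println(extractGuarded(errorPage).isPresent());
    }
}
```

With the guarded variant, a caller can turn the empty result into a readable "no BibTeX found at this URL" message rather than surfacing the raw exception.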

Comments (18)

Daniel Zoller
Could you please report the publication you tried to scrape? I'm getting a 403 error using the provided link.
- 2015-10-25T20:12:50+00:00
Martin Becker reporter
CitiSense: improving geospatial environmental assessment of air quality using a wireless personal exposure monitoring system

The link reported above still works for me, but I have noticed that it does not work when I use the private mode of my browser. So it seems to be a permission problem.
- 2015-10-26T07:03:30+00:00
Robert Jäschke
- assigned issue to Mohammed Abed
- 2015-10-26T14:40:11+00:00
Daniel Zoller
- changed status to open
- 2015-10-27T00:31:47+00:00
Mohammed Abed
I get this error when I open the link in Chrome, Internet Explorer, and Safari:

Your client does not have permission to get URL /scholar.bib?q=info:FrNsHbA-fzsJ:scholar.google.com/&output=citation&hl=de&ct=citation&cd=0 from this server.
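
This message is the body Google sends with an HTTP 403 response, so a scraper fetching the URL server-side would presumably see the same status. The following is a minimal sketch, using only the JDK's `HttpURLConnection`, of checking the status code before trying to parse the body. The class and method names are hypothetical, and this is not taken from BibSonomy's scraper.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class ScholarFetchSketch {

    // True only when the status code indicates the body is worth parsing.
    static boolean isScrapable(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_OK;
    }

    // Turns a status code into a short message for the user.
    static String describe(int statusCode) {
        if (statusCode == HttpURLConnection.HTTP_FORBIDDEN) {
            return "403: access denied (URL may be session-bound or the client is blocked)";
        }
        return isScrapable(statusCode)
                ? "200: body can be parsed"
                : statusCode + ": not scrapable";
    }

    // Performs the request and returns the status code without reading the body.
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Checking the status first lets the scraper report the 403 directly instead of failing later while parsing an HTML error page as BibTeX.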
- 2015-11-21T10:13:48+00:00
Robert Jäschke
It's not clear which page should be scrapable. One option would be to scrape the first hit on a search page. In this case, that would be the desired article.
- 2016-02-09T15:25:30+00:00
Martin Becker reporter
I was referring to the BibTeX page (when clicking on the "Import into BibTeX" button). In the mentioned case: https://scholar.google.de/scholar.bib?q=info:FrNsHbA-fzsJ:scholar.google.com/&output=citation&hl=en&ct=citation&cd=0 This page does not seem to be accessible without session information, I guess. Did Google change something that caused the plugin to break?
- 2016-02-10T08:18:06+00:00
Robert Jäschke
When I access your URL, I get an error. When I do the following, BibSonomy can successfully scrape the BibTeX:
1. Open this URL (the search from my previous message)
2. Click on Cite below the first (and only) search result.
3. In the overlay, click on BibTeX.
4. The resulting page can be scraped by BibSonomy.
Tested on Firefox with BibSonomy's "postPublication" bookmarklet (not the plugin, but that should also work).

(It's even the GoogleScholarScraper which is extracting the data, though in that case the BibTeXScraper would do the job as well.)
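
A generic BibTeX scraper can handle that page because its content is plain BibTeX text. Below is a rough sketch of such a check, assuming a simple regular expression is enough to recognize an entry header. The class name and pattern are illustrative and are not taken from BibSonomy's BibTeXScraper.

```java
import java.util.regex.Pattern;

final class BibtexDetectSketch {

    // Matches an entry header such as "@article{key," at the start of a line.
    private static final Pattern ENTRY_HEADER = Pattern.compile(
            "^\\s*@[A-Za-z]+\\s*\\{\\s*[^,\\s]+\\s*,", Pattern.MULTILINE);

    // Returns true when the page content contains something that looks
    // like the beginning of a BibTeX entry.
    static boolean looksLikeBibtex(String pageContent) {
        return pageContent != null && ENTRY_HEADER.matcher(pageContent).find();
    }
}
```

An HTML error page fails this check, so the same test also separates the working "Cite" overlay page from the session-bound URL when the latter returns an error.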
- 2016-02-10T08:25:40+00:00
Martin Becker reporter
Indeed, if I do what you said, it will work.

Regarding my way of doing this (which saves 2 clicks): I know that the URL will produce an error for you, because it seems to be session-bound for some reason (thus the scraper will probably not be able to parse the URL the normal way). To reproduce, please try changing your Google Scholar settings to only show BibTeX (Settings -> Show links to import citations into "BibTeX"). Then use the "Import into BibTeX" link, which is shown instead of "Cite", and try to import that BibTeX snippet.
- 2016-02-10T09:12:32+00:00
Mohammed Abed
I implemented it just now; after 3 attempts, Google blocked me :D I'm getting a 403 error using the provided link.
- 2016-04-29T10:33:54+00:00
Mohammed Abed
- changed status to resolved
fixes ~~#2544~~

→ <<cset d4a9bafe2d3f>>
- 2016-04-29T11:11:12+00:00
Mohammed Abed
- changed status to closed
- 2016-04-29T13:50:55+00:00
Daniel Zoller
- changed status to open
Does not work; the test returns 403.

Any other solution?
- 2016-05-01T17:19:25+00:00
Daniel Zoller
- changed milestone to 3.6.0
- 2016-05-01T17:19:35+00:00
Mohammed Abed
To scrape information from this site, I need a parameter csisig, and this parameter is generated by JavaScript on the website. Robert Jäschke and I have decided that we will not scrape from this site when there are multiple articles on one page...
- 2016-05-01T17:49:10+00:00
Daniel Zoller
- changed title to Scraping BibTeX from Google Scholar is broken via the Chrome Plugin
- 2016-06-09T09:31:35+00:00
Mohammed Abed
- changed status to resolved
fixes ~~#2544~~ closing ~~#2544~~ clean the Scraper

→ <<cset 57cc00520aa0>>
- 2016-06-13T13:29:26+00:00
Mohammed Abed
- changed status to closed
fixes ~~#2544~~ closing ~~#2544~~ clean the Scraper

→ <<cset 57cc00520aa0>>
- 2016-06-13T13:29:26+00:00

Assignee: Mohammed Abed

Type: bug

Priority: major

Status: closed

Component: scraper

Milestone: 3.6.0

Version: 3.3

Votes: 0

Watchers: 2