ScienceMag sometimes does not work
A user reported an error for the URL http://www.sciencemag.org/content/302/5651/1704.full. The error message was: Could not scrape the URL: Download link is not available
I tested the URL myself. The first attempt worked, but after that I got the same error.
Can you please find out what is happening and then repair the scraper? Thanks!
Comments (14)
-
Account Deleted -
reporter Why is it not possible to post from the URL with ".full" at the end? Would it be possible to support these URLs? Which changes would be necessary in the scraper?
-
reporter What is the status of this task? The URL http://www.sciencemag.org/content/302/5651/1704.full is still not working.
-
Account Deleted ScienceMagScraper is working for http://www.sciencemag.org/content/302/5651/1704.full
-
reporter I just tried the URL on the test system and got the error:
Could not scrape the URL http://www.sciencemag.org/content/302/5651/1704.full. Message was: Download link is not available
Can you please check that these URLs are working and add a JUnit test that ensures that they work?
-
- changed status to open
-
reporter It seems that access to the BibTeX is not allowed from our servers. If I download the URL http://www.sciencemag.org/content/302/5651/1704.full on my laptop, the page source contains the string "Download Citation". The string is missing when I download the same URL on one of the servers.
-
reporter I found the solution: http://www.sciencemag.org/content/302/5651/1704.short works but http://www.sciencemag.org/content/302/5651/1704.full does not. Notice the difference at the end of the URL: short vs. full.
Can you please adapt the URL pattern accordingly? I think we should explicitly match on
\\.(short|full)
at the end, then extract the ID and query the URL with .short and not .full.
-
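A minimal sketch of the pattern change suggested above, assuming the ID is everything between /content/ and the trailing suffix. The class name ScienceMagUrlNormalizer and the method toShortUrl are hypothetical and not part of the BibSonomy code base; the actual fix in the scraper may look different.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the BibSonomy scraper module.
final class ScienceMagUrlNormalizer {

    // Group 1: host and /content/ prefix, group 2: the ID, group 3: the suffix.
    // Assumption: the URL carries no query string or fragment.
    private static final Pattern CONTENT_URL = Pattern.compile(
            "^(https?://www\\.sciencemag\\.org/content/)([^?#]+?)\\.(short|full)$");

    // Rewrites a .full (or .short) URL to its .short form.
    // URLs that do not match the pattern are returned unchanged.
    static String toShortUrl(String url) {
        Matcher m = CONTENT_URL.matcher(url);
        if (m.matches()) {
            return m.group(1) + m.group(2) + ".short";
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(toShortUrl("http://www.sciencemag.org/content/302/5651/1704.full"));
    }
}
```

Matching the suffix explicitly leaves URLs that end in .abstract, or that have no suffix at all, untouched, so only the failing .full case changes behavior.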
reporter @misgna what is the status of this issue?
-
Account Deleted @jaeschke ScienceMagScraper uses the generic CitationManagerScraper to scrape from Science Magazine. CitationManagerScraper extracts a link and constructs a new URL by appending "&type=bibtex" to it. It does not have any problem if a URL ends with .full, .short or .abstract. I wrote a simple Java class to test the two URLs:
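import java.net.MalformedURLException;
import java.net.URL;
// imports of ScienceMagScraper, ScrapingContext and ScrapingException (BibSonomy scraper module) omitted

public class SMMain {
    public static void main(String[] args) throws ScrapingException, MalformedURLException {
        ScienceMagScraper sms = new ScienceMagScraper();
        ScrapingContext sc = new ScrapingContext(new URL("http://www.sciencemag.org/content/302/5651/1704.full"));
        // print the return value of scrape(), then the BibTeX stored in the context
        System.out.println(sms.scrape(sc));
        System.out.println(sc.getBibtexResult());
    }
}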
In the end, I got the same BibTeX result. I did not change anything; it works as it did before. I also ran the JUnit test and no errors were reported.
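The link construction described in this comment can be sketched roughly as follows. This is an illustration only, not the actual CitationManagerScraper code: the class CitationLinkSketch, the method bibtexUrl, and the assumed markup (an anchor whose text is "Download Citation") are hypothetical. It also shows why a page served without that link ends in "Download link is not available".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the real CitationManagerScraper.
final class CitationLinkSketch {

    // Assumed markup: <a href="...">Download Citation</a>. The real page may differ.
    private static final Pattern DOWNLOAD_LINK = Pattern.compile(
            "<a[^>]*href=\"([^\"]+)\"[^>]*>\\s*Download Citation\\s*</a>",
            Pattern.CASE_INSENSITIVE);

    // Returns the extracted link with "&type=bibtex" appended,
    // or an empty Optional if the page contains no download link.
    static Optional<String> bibtexUrl(String html) {
        Matcher m = DOWNLOAD_LINK.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + "&type=bibtex");
    }

    public static void main(String[] args) {
        // The href below is a made-up placeholder, not a real ScienceMag URL.
        String withLink = "<p><a href=\"/hypothetical/citation?id=1704\">Download Citation</a></p>";
        String withoutLink = "<p>No citation link on this page</p>";
        System.out.println(bibtexUrl(withLink).orElse("Download link is not available"));
        System.out.println(bibtexUrl(withoutLink).orElse("Download link is not available"));
    }
}
```

Under this assumption, the suffix of the original URL does not matter once the link is found. What matters is whether the downloaded page contains the link at all, and that can differ depending on where the request comes from.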
-
reporter - changed status to resolved
fixes #1827 → <<cset df57981cebbc>>
-
reporter @misgna Please test it again on BibSonomy. You will see that it does not work with the .full URL but does work with the .short URL. The reason is that the BibSonomy server runs at the University of Kassel, which has different access rights to ScienceMag. Hence, it is difficult to test this behavior here (use the development system for testing).
I have implemented a clean fix. Please have a look at it and try to understand what it does.
-
- edited description
- changed version to 2.0.46
-
- changed component to scraper
It is possible to post to BibSonomy from the following links: http://www.sciencemag.org/content/302/5651/1704 and http://www.sciencemag.org/content/302/5651/1704.abstract, but not from http://www.sciencemag.org/content/302/5651/1704.full. The thing is that CitationMagScraper constructs the same URL, at which the BibTeX is available, for all three of the above URLs.