improve error message for DOIs that we could not scrape metadata for

Issue #1910 resolved
Robert Jäschke created an issue

For DOIs where we could not get metadata, we currently produce lengthy error messages that contain the whole HTML source code of the page - e.g., enter the DOI 10.1688/1862-0000_ZfP_2013_03_Arp on http://www.bibsonomy.org/postPublication?selTab=3 and see what happens ...

A better error message would be "Could not find metadata in page http://www.hampp-verlag.de/hampp_e-journals_ZfP.htm#313 referenced by the DOI 10.1688/1862-0000_ZfP_2013_03_Arp."

Please check how this can be accomplished and suggest a solution here.

Comments (10)

  1. Robert Jäschke reporter

    You are right, the example DOI I gave you is now supported by BibSonomy. Try this one instead: 10.3866/PKU.WHXB201112303. With it you will see the problem I described.

  2. Former user Account Deleted

    Thanks. Scraping and its error handling are implemented in the AbstractEditPublicationController. We could just stop passing the whole scrapedBibtex to Spring's error handling and/or display a different error message, as you described.

  3. Robert Jäschke reporter

    I guess the general idea is good, I just don't understand how you want to stop the process. How would you know that the above error happened?

  4. Former user Account Deleted

    Well, if we want more detailed error handling, then someone needs to enhance that of SimpleBibTexParser, because at the moment the only exception it throws is ParseException, which contains "our" error, but probably more than that.

    In my opinion, simply not showing the whole scraped document and instead at least showing the URL would be enough.

  5. Robert Jäschke reporter

    Well, if the document "is" BibTeX, then it is important to show it, in particular, if the user has uploaded it. Otherwise, it is not possible to identify typing errors, etc.

    I would say we should try to prevent the error earlier - in the scraper module. The scraper which currently returns this HTML could be changed to not return it. Can you please find out which scraper returns the HTML, and why it returns the HTML instead of nothing?

  6. Former user Account Deleted

    For 10.3866/PKU.WHXB201112303, the DOIScraper first redirects the URL. Then it is processed by the ContentNegotiationDOIScraper, which sends a content negotiation request to dx.doi.org to receive the BibTeX. At this point it should receive BibTeX, but it gets the HTML document instead and returns it without further checking.
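
    To illustrate the request described above, here is a rough sketch of a content negotiation call that also looks at the Content-Type of the answer. It is not the actual ContentNegotiationDOIScraper code: the class and method names are made up, the resolver host https://doi.org/ and the MIME type application/x-bibtex are assumptions, and it assumes a server that honors the request labels its answer with that MIME type.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

/** Hypothetical sketch; not part of the BibSonomy scraper module. */
final class DoiContentNegotiationSketch {

    // Assumed MIME type for BibTeX in DOI content negotiation.
    static final String BIBTEX_MIME = "application/x-bibtex";

    private DoiContentNegotiationSketch() {
    }

    /** Returns true if the Content-Type header names the BibTeX MIME type. */
    static boolean isBibtexContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" before comparing.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return BIBTEX_MIME.equals(mime);
    }

    /**
     * Asks the resolver for BibTeX and reports whether the answer is labeled as BibTeX.
     * Needs network access; a DOI with characters that are illegal in a URI
     * would have to be encoded first.
     */
    static boolean resolvesToBibtex(String doi) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create("https://doi.org/" + doi).toURL().openConnection();
        conn.setRequestProperty("Accept", BIBTEX_MIME);
        conn.setInstanceFollowRedirects(true);
        try {
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK
                    && isBibtexContentType(conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }
}
```

    A publisher page that ignores the Accept header would typically come back as text/html, so a check like isBibtexContentType could reject it before the body is passed on.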

    I'm not familiar with the scraper module. I think we have to implement some kind of validation check for the documents retrieved by the ContentNegotiationDOIScraper.
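
    Such a validation check could be as simple as the following sketch. The class BibtexResponseCheck is hypothetical and not part of the scraper module. It assumes that a valid content negotiation answer starts directly with a BibTeX entry such as "@article{", which may be too strict for BibTeX that begins with comments.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of the BibSonomy scraper module. */
final class BibtexResponseCheck {

    // Optional whitespace, then '@', an entry type, and an opening
    // brace or parenthesis, e.g. "@article{" or "@book (".
    private static final Pattern ENTRY_START =
            Pattern.compile("\\s*@[A-Za-z]+\\s*[{(]");

    private BibtexResponseCheck() {
    }

    /**
     * Returns true only if the body begins with something that looks like
     * a BibTeX entry. An HTML page starts with "<" and is rejected, even if
     * it contains '@' somewhere further down (e.g. in CSS or e-mail addresses).
     */
    static boolean looksLikeBibtex(String body) {
        if (body == null) {
            return false;
        }
        // lookingAt() matches at the start of the input only.
        return ENTRY_START.matcher(body).lookingAt();
    }
}
```

    The scraper could then return nothing when looksLikeBibtex is false, so the HTML never reaches the parser or the error message.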
