- changed component to webapp
- edited description
Improve the error message for DOIs for which we could not scrape metadata
For DOIs where we could not get metadata, we currently produce lengthy error messages that contain the whole HTML source code of the page. For example, enter the DOI 10.1688/1862-0000_ZfP_2013_03_Arp on http://www.bibsonomy.org/postPublication?selTab=3 and see what happens ...
A better error message would be "Could not find metadata in page http://www.hampp-verlag.de/hampp_e-journals_ZfP.htm#313 referenced by the DOI 10.1688/1862-0000_ZfP_2013_03_Arp."
Please check how this can be accomplished and suggest a solution here.
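The suggested message could be built by a small helper along these lines (a sketch only; the class and method names are hypothetical and not taken from the BibSonomy codebase):

```java
public class ScrapeErrorMessage {

    // Hypothetical helper: builds the concise message suggested above
    // instead of embedding the whole scraped HTML page into the error.
    public static String forFailedDoi(final String pageUrl, final String doi) {
        return "Could not find metadata in page " + pageUrl
                + " referenced by the DOI " + doi + ".";
    }

    public static void main(final String[] args) {
        System.out.println(forFailedDoi(
                "http://www.hampp-verlag.de/hampp_e-journals_ZfP.htm#313",
                "10.1688/1862-0000_ZfP_2013_03_Arp"));
    }
}
```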
Comments (10)
-
-
- changed status to open
-
Account Deleted Wasn't that changed a while ago? Because I get "The URL http://econpapers.repec.org/article/raizfpers/doi_5f10.1688_2f1862-0000_5fzfp_5f2013_5f03_5farp.htm is currently not supported by one of our scrapers." as the error message.
-
reporter You are right, the example DOI I gave you is meanwhile supported by BibSonomy. Try this one:
10.3866/PKU.WHXB201112303
and you will experience the problem I described. -
Account Deleted Thanks. Scraping and its error handling are implemented in the AbstractEditPublicationController. We could simply stop passing the whole scraped BibTeX to Spring's error handling and/or display another error message as you described.
-
reporter I guess the general idea is good, I just don't understand how you want to stop the process. How do you want to know that the above error happened?
-
Account Deleted Well, if we want more detailed error handling, then someone needs to enhance that of SimpleBibTexParser, because at the moment the only exception it throws is ParseException, which contains "our" error, but probably even more than that.
In my opinion, simply not showing the whole scraped document and instead showing at least the URL would be enough.
-
reporter Well, if the document "is" BibTeX, then it is important to show it, in particular if the user has uploaded it. Otherwise, it is not possible to identify typing errors, etc.
I would say we should try to prevent the error earlier, in the scraper module. The scraper which currently returns this HTML could be changed to not return it. Can you please find out which scraper returns the HTML and why it does not simply return nothing instead?
-
Account Deleted For 10.3866/PKU.WHXB201112303, first the DOIScraper resolves the URL redirect. Then it is processed by the ContentNegotiationDOIScraper, which sends a content-negotiation request to dx.doi.org to receive the BibTeX. At this point it should retrieve BibTeX, but it gets the HTML document and returns it without further checking.
I'm not familiar with the scraper module. I think we have to implement some kind of validation check for the documents retrieved by the ContentNegotiationDOIScraper.
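Such a validation check could be sketched like this (an illustration only; the class name and heuristics are assumptions, not the actual BibSonomy scraper API):

```java
import java.util.regex.Pattern;

public class BibTeXResponseCheck {

    // A BibTeX payload should start with an entry such as "@article{..."
    // or "@book(..."; an HTML error page will not.
    private static final Pattern BIBTEX_ENTRY =
            Pattern.compile("^\\s*@\\w+\\s*[({]");

    // Hypothetical check: does the scraped response look like BibTeX
    // rather than an HTML page?
    public static boolean looksLikeBibTeX(final String content) {
        if (content == null) {
            return false;
        }
        final String trimmed = content.trim();
        // an HTML page typically starts with a doctype or <html> tag
        if (trimmed.regionMatches(true, 0, "<!DOCTYPE", 0, 9)
                || trimmed.regionMatches(true, 0, "<html", 0, 5)) {
            return false;
        }
        return BIBTEX_ENTRY.matcher(trimmed).find();
    }
}
```

A simple prefix heuristic like this would already have caught the case above, where dx.doi.org answered the content-negotiation request with an HTML page instead of BibTeX.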
-
reporter - changed status to resolved
fixes
#1910: after content negotiation, we check whether the server returned BibTeX. If not, we throw an exception instead of blindly returning the result. → <<cset 6b2338ea690d>>
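The committed behavior could look roughly like this (a sketch under assumptions: the class and exception names are invented here; only the check-then-throw logic is taken from the comment above):

```java
public class ContentNegotiationFixSketch {

    // Illustrative exception type; the real scraper module defines its own.
    public static class ScrapingFailureException extends Exception {
        public ScrapingFailureException(final String message) {
            super(message);
        }
    }

    // After content negotiation, verify that the payload is BibTeX;
    // if not, fail with a concise message instead of returning the HTML.
    public static String checkedResult(final String url, final String response)
            throws ScrapingFailureException {
        final String trimmed = response == null ? "" : response.trim();
        if (!trimmed.startsWith("@")) {
            throw new ScrapingFailureException(
                    "Could not find metadata in page " + url + ".");
        }
        return trimmed;
    }
}
```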