- assigned issue to
- changed component to scraper
- changed version to 2.0.43
- edited description
Why is the DublinCore Scraper not working for SCIRP?
Issue #1895
resolved
Try http://www.scirp.org/journal/PaperInformation.aspx?PaperID=37807
Although the page contains Dublin Core metadata, the scraper does not work. Which fields are missing or not extracted?
Comments (6)
- changed status to open
-
Account Deleted The problem was in the DublinCoreToBibtexConverter class. The regular expression that matches DC meta tags only handled the uppercase prefix "DC", so meta tags whose name starts with a lowercase "dc." were not matched. The original pattern was:
Pattern.compile("(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"(?-i)DC(?i).([^\"]*)\"[^>]*>");
It was changed to:
Pattern.compile("(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"(?-i)[D|d][C|c](?i).([^\"]*)\"[^>]*>");
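A minimal sketch of the difference, assuming the converter reads the element name from capturing group 3 (the issue does not show the calling code; the class and helper names below are hypothetical). In the original pattern, (?-i) switches case-insensitivity off just for the literal DC, so a lowercase "dc." prefix never matches; the modified pattern accepts either case for each letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DcCaseDemo {
    // Pattern before the fix: (?-i) makes the literal "DC" case-sensitive,
    // so only the uppercase prefix matches.
    static final Pattern BEFORE = Pattern.compile(
        "(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"(?-i)DC(?i).([^\"]*)\"[^>]*>");

    // Pattern after the fix: each character class accepts either case.
    static final Pattern AFTER = Pattern.compile(
        "(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"(?-i)[D|d][C|c](?i).([^\"]*)\"[^>]*>");

    // Hypothetical helper: returns the element name (group 3) of the first match, or null.
    static String field(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String upper = "<meta name=\"DC.title\" content=\"Example\">";
        String lower = "<meta name=\"dc.title\" content=\"Example\">";
        System.out.println(field(BEFORE, upper)); // title
        System.out.println(field(BEFORE, lower)); // null
        System.out.println(field(AFTER, lower));  // title
    }
}
```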
-
reporter Suggestion: simplify it and use
Pattern.compile("(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"(DC|dc).([^\"]*)\"[^>]*>");
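One plausible reason this suggestion did not work, assuming the converter reads the element name from group 3 (the calling code is not shown in the issue): (DC|dc) is itself a capturing group, so it becomes group 3 and the element name shifts to group 4. A non-capturing group (?:DC|dc) would keep the original numbering. This sketch (hypothetical class name) prints the group count and group 3 for both variants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DcGroupShiftDemo {
    // Reporter's suggestion: "(DC|dc)" captures, so the element name moves to group 4.
    static final Pattern SUGGESTED = Pattern.compile(
        "(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"(DC|dc).([^\"]*)\"[^>]*>");

    // Same idea with a non-capturing group, which keeps the element name in group 3.
    static final Pattern NON_CAPTURING = Pattern.compile(
        "(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"(?:DC|dc).([^\"]*)\"[^>]*>");

    public static void main(String[] args) {
        String tag = "<meta name=\"dc.title\" content=\"Example\">";
        for (Pattern p : new Pattern[] {SUGGESTED, NON_CAPTURING}) {
            Matcher m = p.matcher(tag);
            if (m.find()) {
                System.out.println(m.groupCount() + " groups, group(3) = " + m.group(3));
            }
        }
        // Prints:
        // 4 groups, group(3) = dc
        // 3 groups, group(3) = title
    }
}
```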
-
Account Deleted I modified it slightly because the expression above did not work:
"(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"[D|d][C|c].([^\"]*)\"[^>]*>"
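Because this final pattern has no (?-i), the whole expression runs case-insensitively, so [D|d][C|c] accepts any case combination. Note also that | inside a character class is a literal character rather than alternation, and the unescaped . matches any character. The sketch below compares the final pattern with a hypothetical tighter variant using dc\. (case-insensitive "dc" followed by a literal dot); it is an illustration only, not the code that was committed.

```java
import java.util.regex.Pattern;

public class DcFinalPatternDemo {
    // Final pattern from the issue; the leading (?im) makes the whole expression case-insensitive.
    static final Pattern FINAL = Pattern.compile(
        "(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"[D|d][C|c].([^\"]*)\"[^>]*>");

    // Hypothetical tighter variant: "dc" (any case) followed by a literal dot.
    static final Pattern TIGHTER = Pattern.compile(
        "(?im)<\\s*meta(?=[^>]*lang=\"([^\"]*)\")?(?=[^>]*content=\"([^\"]*)\")[^>]*name=\"dc\\.([^\"]*)\"[^>]*>");

    // Wraps a name attribute value in a minimal meta tag and tests the pattern against it.
    static boolean matches(Pattern p, String name) {
        String tag = "<meta name=\"" + name + "\" content=\"x\">";
        return p.matcher(tag).find();
    }

    public static void main(String[] args) {
        String[] names = {"DC.title", "dc.title", "Dc.title", "|C.title", "DCXtitle"};
        for (String name : names) {
            System.out.println(name + ": final=" + matches(FINAL, name)
                + ", tighter=" + matches(TIGHTER, name));
        }
        // Prints:
        // DC.title: final=true, tighter=true
        // dc.title: final=true, tighter=true
        // Dc.title: final=true, tighter=true
        // |C.title: final=true, tighter=false
        // DCXtitle: final=true, tighter=false
    }
}
```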
-
Account Deleted - changed status to resolved
It is resolved.