Bugfix/simple Scraper fixes

Merged
#43 · Created  · Last updated

Merged pull request

Merged in bugfix/simple-Scraper-fixes (pull request #43)

73647aa·Author: ·Closed by: ·2022-01-24

Description

  • removing html and fixing citekey from scrapingResult

  • fixed scrapeReferences

  • added new host and fixed scraper

  • modified generic CitMgrScraper to fit more scrapers (child classes were changed as well); reworked reference scraping for jap and aanda (regex didn't stop after references); moved tests and modified them

  • fixed Iucr Scraper

  • fixed faseb Scraper and tests

  • fixed rsoc Scraper and tests

  • fixed rspb Scraper and tests

  • fixed sage Scraper and tests

  • fixed sciencemag Scraper and test; added new host

  • fixed mdpi Scraper and tests

  • fixed nber Scraper and tests

  • removed opac Scraper and tests; added hebis Scraper and tests

  • added opac Scraper again, but removed it from KDEUCScraper; fixed springer Scraper

  • springer Scraper really fixed now

  • WebUtils: changed getContentAsString to also accept POST methods; added a getHeadersMethod

  • fixed DOIUtils getDOIFromUrl to first decode the URL and changed the regex for DOIs; fixed SpringerScraper to use DOINegScraper (WorldCatScraper returns incorrect BibTeX); fixed SpringerLinkScraper; added and fixed tests for both scrapers

  • fixed apa Scraper and tests

  • fixed nasaads Scraper and tests

  • fixed osti Scraper and tests

  • fixed plos tests and cleaned the plos scraper up

  • fixed ProjectEuclidScraper and tests

  • added hebis and ahajournals scraper to KDEUrlCompositeScraper

  • CitMgrScraper clean up
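
For reviewers, a minimal sketch of the kind of problem behind the jap/aanda item above ("regex didn't stop after references"). The description does not say what the actual cause was; a greedy quantifier is one common cause and is only assumed here. The class name, the pattern, and the HTML layout are hypothetical and not taken from the scrapers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReferenceSectionSketch {
    // Hypothetical markup: the references live in <ol class="references">...</ol>.
    // The reluctant quantifier (.*?) stops at the FIRST closing </ol>;
    // a greedy (.*) would run on to the last </ol> on the page.
    private static final Pattern REFERENCES =
            Pattern.compile("<ol class=\"references\">(.*?)</ol>", Pattern.DOTALL);

    /** Returns the inner HTML of the reference list, or null if none is found. */
    static String extractReferences(String html) {
        Matcher m = REFERENCES.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<ol class=\"references\"><li>A</li></ol>"
                + "<p>footer</p><ol><li>related</li></ol>";
        System.out.println(extractReferences(html));
    }
}
```
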

Many scraper and test data fixes. Some scraper fixes were more complicated (apa, nasaads, ProjectEuclidScraper), but most were straightforward. I had to adjust DOIUtils and WebUtils, but tried to keep side effects to a minimum. I also moved the test data for each fixed scraper into its own directory.
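
To illustrate the DOIUtils change ("first decode Url", then apply the DOI regex), here is a rough sketch of the idea. It is not the actual getDOIFromUrl implementation: the class name, the method name, and the DOI pattern are assumptions made for this example.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DoiFromUrlSketch {
    // Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
    // that ends at whitespace, quotes, angle brackets, or URL delimiters.
    private static final Pattern DOI =
            Pattern.compile("10\\.\\d{4,9}/[^\\s\"'<>?#&]+");

    /** Decodes the URL first, so an encoded "%2F" becomes "/", then searches for a DOI. */
    static Optional<String> doiFromUrl(String url) {
        String decoded;
        try {
            // Caveat: URLDecoder also turns '+' into a space (form decoding).
            decoded = URLDecoder.decode(url, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            decoded = url; // malformed percent escape: fall back to the raw URL
        }
        Matcher m = DOI.matcher(decoded);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(doiFromUrl(
                "https://link.springer.com/article/10.1007%2Fs00778-010-0208-4"));
    }
}
```

Without the decoding step, the assumed pattern would not match the URL in `main`, because the slash between prefix and suffix is still encoded as `%2F`.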

 
