Commits

Author Commit Message Labels Comments Date
Flávio Coelho
fixed bug in regular expression substitution of starting reference
Flávio Coelho
introduced changes to improve the avoidance of already fetched references
Flávio Coelho
fixed small bug in pipeline
Flávio Coelho
fixed bug in pipelines.py
Flávio Coelho
Implemented saving reference files in GridFS instead of the filesystem
Flávio Coelho
added specification of database name in settings.py
Flávio Coelho
added rotation of user agent strings; fixed specification of Tor proxy; reimplemented duplicate search before saving to db
Flávio Coelho
added Tor support to the BibTeX fetch as well
Flávio Coelho
added support for using Tor as a proxy. untested.
Flávio Coelho
fixed fetching of bibtex records by setting the appropriate cookie in the request
Flávio Coelho
changed user agent string and added scholar cookies to the request.
Flávio Coelho
nothing
Flávio Coelho
added extra files
Flávio Coelho
implemented article download
Flávio Coelho
Gscholar spider tested and working
jayron
added the text extraction files
jayron
Modifications to the lines:
jayron
test
Flavio Codeco Coelho
merge
Flavio Codeco Coelho
added pdf extraction module
Flávio Coelho
added TODO markers in the code
Flávio Coelho
Now the gscholar spider follows the subsequent result pages of a given search
Flavio Codeco Coelho
added support for locating the PDF URL when available.
Flavio Codeco Coelho
added more files
Flavio Codeco Coelho
added the beginnings of a SOAP client for PubMed.
Flavio Codeco Coelho
first commit