site-aggregation

Install

  1. git clone git@bitbucket.org:leonexu/site-aggregation.git
  2. cd site-aggregation
  3. git checkout bukalapak
  4. ./install
  5. log in to the MySQL console >>> mysql
  6. drop the old database if you are going to reuse the same name >>> DROP DATABASE bukalapak;
  7. create the new database >>> CREATE DATABASE bukalapak;
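If you recreate the database often, steps 6 and 7 can be scripted. The sketch below is a hypothetical helper (not part of this repository) that prints both statements for a given database name so they can be piped into the mysql client. It uses DROP DATABASE IF EXISTS so it also works on a fresh server; the utf8mb4 character set is an assumption, so match whatever the project expects.

```python
import re
import sys

# Hypothetical helper (not part of site-aggregation) that prints the SQL for
# install steps 6 and 7, so it can be piped into the mysql client.

_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def recreate_database_sql(name, charset="utf8mb4"):
    """Return the statements that drop and recreate database `name`."""
    # Accept only plain identifiers so the values cannot smuggle in SQL.
    for value in (name, charset):
        if not _IDENTIFIER.match(value):
            raise ValueError("unsafe identifier: %r" % value)
    return [
        # IF EXISTS avoids an error when the database does not exist yet.
        "DROP DATABASE IF EXISTS `%s`;" % name,
        # The character set is an assumption; match what the project expects.
        "CREATE DATABASE `%s` CHARACTER SET %s;" % (name, charset),
    ]


if __name__ == "__main__":
    db_name = sys.argv[1] if len(sys.argv) > 1 else "bukalapak"
    print("\n".join(recreate_database_sql(db_name)))
```

Saved as recreate_db.py (a hypothetical file name), it could be run as: python recreate_db.py bukalapak | mysql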

Configure (in "settings.py" and "private_settings.py")

  1. Set your MySQL user, password, database name and other connection details
  2. Set the SHOPS list
  3. Increase CONCURRENT_REQUESTS_PER_IP (or CONCURRENT_REQUESTS_PER_DOMAIN) if you have proxies
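A minimal sketch of what the relevant settings might look like. Only CONCURRENT_REQUESTS_PER_IP is a real Scrapy setting here; the MYSQL_* names and the format of SHOPS are assumptions, so copy the actual names from the project's settings.py.

```python
# Hypothetical excerpt of private_settings.py. The MYSQL_* names and the
# SHOPS format are assumptions; use the names the project actually reads.
MYSQL_HOST = "localhost"
MYSQL_PORT = 3306
MYSQL_USER = "crawler"
MYSQL_PASSWORD = "change-me"
MYSQL_DATABASE = "bukalapak"  # the database created during install

# Shops crawled by the CrawlMyShops service (format assumed: one URL each).
SHOPS = [
    "https://shop_url.com",
]

# Real Scrapy setting, default 0 (disabled). When non-zero it is used
# instead of CONCURRENT_REQUESTS_PER_DOMAIN (default 8) as the concurrency
# limit. Raise it only when requests are spread over proxies.
CONCURRENT_REQUESTS_PER_IP = 16
```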

Run

  1. change the current working directory to the project's root directory (where the 'scrapy.cfg' file is located)
  2. activate the virtual env >>> source venv/bin/activate

Get the data of all shops

scrapy crawl myShopSpider

Find keywords from a set of seed keywords and save them into the DB: RelatedKeywordFinder.GatherPlaceholderKeywordsFromSeedsAndWriteIntoDb()
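The implementation of RelatedKeywordFinder is not shown here. As a rough illustration of the seed-expansion idea only, the sketch below takes seed keywords, asks a pluggable suggestion function for related keywords, normalises and de-duplicates them, and stores them in a table. sqlite3 stands in for MySQL, and the table layout and the suggest callable are hypothetical.

```python
import sqlite3

# Illustrative only: sqlite3 stands in for MySQL, and the `keyword` table
# and the `suggest` callable are hypothetical, not RelatedKeywordFinder's API.


def gather_keywords_from_seeds(seeds, suggest, conn):
    """Store each seed and its suggested keywords; return how many are new.

    `suggest` maps one keyword to an iterable of related keywords.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword (text TEXT PRIMARY KEY, seed TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM keyword").fetchone()[0]
    for seed in seeds:
        for kw in [seed, *suggest(seed)]:
            kw = kw.strip().lower()  # normalise so duplicates collapse
            if kw:
                # The PRIMARY KEY plus OR IGNORE skips keywords already stored.
                conn.execute(
                    "INSERT OR IGNORE INTO keyword (text, seed) VALUES (?, ?)",
                    (kw, seed),
                )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM keyword").fetchone()[0]
    return after - before
```

Counting rows before and after keeps the return value correct even when some inserts are ignored as duplicates.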

Get data for a subset of shops

scrapy crawl myShopSpider -a service=SERVICE

where SERVICE can be one of the following values:

CrawlMyShops: Get data of the shops listed in ./crawler/private_settings.py

CrawlShopsInDb: Get data of the shops stored in the DB

AnalyzeKeyword: (1) Get Google search info (search volume, CPC, ...) for the keywords in the DB. (2) Get the number of sellers for these keywords on Lazada. Save (1) and (2) into the DB.

GetSellerShopsFromKeywords

PriceComparison: Does not work at present
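Scrapy passes each -a name=value option to the spider's constructor as a keyword argument, so the spider receives service as a string. The sketch below shows one way to map that string to a handler and reject unknown names before any request is made. It is a plain class rather than a real scrapy.Spider so it runs without Scrapy; the default service, the shops/db_shops parameters and the handler bodies are assumptions, not the project's code.

```python
class ShopSpiderSketch:
    """Plain-Python stand-in for myShopSpider (not a real scrapy.Spider).

    Scrapy would call __init__(service="...") for `-a service=...`.
    """

    name = "myShopSpider"

    def __init__(self, service="CrawlMyShops", shops=(), db_shops=(), **kwargs):
        # Assumed default; the project's real default may differ.
        # `shops` and `db_shops` are hypothetical stand-ins for the SHOPS
        # setting and the shops stored in the DB.
        self.shops = list(shops)
        self.db_shops = list(db_shops)
        # Map each documented service name to the method implementing it.
        handlers = {
            "CrawlMyShops": self.crawl_my_shops,
            "CrawlShopsInDb": self.crawl_shops_in_db,
            "AnalyzeKeyword": self.analyze_keyword,
            "GetSellerShopsFromKeywords": self.get_seller_shops_from_keywords,
            "PriceComparison": self.price_comparison,
        }
        if service not in handlers:
            raise ValueError(
                "unknown service %r; expected one of: %s"
                % (service, ", ".join(sorted(handlers)))
            )
        self.service = service
        self._handler = handlers[service]

    def start_requests(self):
        # A real spider would yield scrapy.Request objects for these URLs.
        return list(self._handler())

    def crawl_my_shops(self):
        return self.shops

    def crawl_shops_in_db(self):
        return self.db_shops

    def analyze_keyword(self):
        return []  # placeholder: keyword-analysis requests would go here

    def get_seller_shops_from_keywords(self):
        return []  # placeholder

    def price_comparison(self):
        # Documented as not working at present.
        raise NotImplementedError("PriceComparison does not work at present")
```

Validating in __init__ means a typo such as -a service=CrawlShopInDb fails immediately with the list of valid names.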

Show results

  1. For a product >>> ./product_dump.py https://product_url.com
  2. For a shop >>> ./shop_dump.py https://shop_url.com --csv filename --data likes
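The internals of shop_dump.py are not shown here. Assuming --csv names an output file and --data selects the field to export (for example likes), a minimal sketch of that kind of command-line handling with argparse and csv could look like this; the record layout and the default for --data are hypothetical.

```python
import argparse
import csv
import io


def build_parser():
    # Mirrors the documented invocation:
    #   ./shop_dump.py https://shop_url.com --csv filename --data likes
    parser = argparse.ArgumentParser(description="Dump stored data for a shop.")
    parser.add_argument("url", help="shop URL")
    parser.add_argument("--csv", dest="csv_path", help="CSV file to write")
    # Assumed meaning: the field to export; the default is an assumption too.
    parser.add_argument("--data", default="likes", help="field to export")
    return parser


def format_rows(records, field):
    """Render (url, field) for each record as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["url", field])
    for record in records:
        # Records without the field get an empty cell instead of an error.
        writer.writerow([record["url"], record.get(field, "")])
    return buf.getvalue()
```

A main function would call build_parser().parse_args(), load the shop's records from the DB (not shown), then write format_rows(records, args.data) to args.csv_path or print it.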

Possible errors:

When running scrapy crawl bukalapak, you may get this error:

ImportError: No module named loader.processors

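The likely causes are an outdated Scrapy (scrapy.loader.processors was added in Scrapy 1.0; older releases used scrapy.contrib.loader.processor) or an outdated pip that cannot install a recent Scrapy.

Check the Scrapy version >>> python -c "import scrapy; print(scrapy.__version__)"
Check the pip version >>> pip --version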

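To solve the problem:

  1. cd to the project's root directory
  2. activate the virtual env >>> source venv/bin/activate
  3. upgrade pip if it is outdated >>> pip install -U pip
  4. upgrade Scrapy >>> pip install -U Scrapy
  5. confirm the new version >>> pip freeze | grep Scrapy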
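Because the missing module belongs to Scrapy 1.0 and later, the check boils down to comparing the installed version with 1.0. A small stdlib-only sketch of that comparison, run inside the activated venv, is below. It assumes Python 3.8+ for importlib.metadata; in a Python 2 venv use the python -c version check instead.

```python
from importlib import metadata  # Python 3.8+


def parse_version(text):
    """Turn '1.8.2' into (1, 8, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the pre-release tag
    return tuple(parts)


def scrapy_needs_upgrade(minimum=(1, 0)):
    """Return True if Scrapy is not installed or older than `minimum`."""
    try:
        installed = parse_version(metadata.version("Scrapy"))
    except metadata.PackageNotFoundError:
        return True
    # Pad so that e.g. '1' compares equal to (1, 0) rather than lower.
    installed += (0,) * (len(minimum) - len(installed))
    return installed < minimum


if __name__ == "__main__":
    if scrapy_needs_upgrade():
        print("Scrapy is missing or older than 1.0: run pip install -U Scrapy")
    else:
        print("Scrapy is 1.0 or newer")
```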

Support

Eduard D. (2jamb0ss@gmail.com)