Xiangjun Zhou committed 0e3a2c4 Draft

update

Files changed (1)

 
 logging.basicConfig(level=logging.DEBUG)
 
+'''
 urls = ['http://www.163.com', 'http://www.qq.com', 'http://www.sina.com.cn',
         'http://www.sohu.com', 'http://www.yahoo.com', 'http://www.baidu.com',
         'http://www.apple.com', 'http://www.microsoft.com']
+urls = ['http://www.nytimes.com']
+'''
 
+urls = ['http://news.sina.com.cn/society/']
 
 class Crawler:
     def parser(self, req_url, data):
+        print data
         return [len(data)]
 
     def pipeline(self, response):
     def testCrawler(self):
         dt = datetime.now()
         crawler = Crawler()
-        Scheduler(urls, crawler.parser, crawler.pipeline, 8)
+        Scheduler(urls, crawler.parser, crawler.pipeline, max_running=8)
         print datetime.now() - dt
 
-
 unittest.main()
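
The change above passes the concurrency limit by keyword (`max_running=8`) instead of positionally. The `Scheduler` implementation is not part of this diff, so the following is only a minimal sketch of how a scheduler with that kind of limit might drive the `parser` and `pipeline` callbacks. `run_bounded` and `fake_fetch` are hypothetical names introduced for illustration, the thread pool is an assumed technique rather than the project's actual one, and the sketch is written for Python 3 even though the diff itself is Python 2.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(urls, fetch, parser, pipeline, max_running=8):
    """Process each URL with at most max_running concurrent workers.

    Hypothetical stand-in for Scheduler; not the project's real API.
    """
    def handle(url):
        data = fetch(url)                    # download step (injected callable)
        return pipeline(parser(url, data))   # parse, then hand off to the pipeline

    with ThreadPoolExecutor(max_workers=max_running) as pool:
        # Executor.map yields results in the same order as the input urls.
        return list(pool.map(handle, urls))


def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs offline.
    return "<html>%s</html>" % url


def parser(req_url, data):
    # Mirrors Crawler.parser in the diff: report the payload length.
    return [len(data)]


def pipeline(response):
    # Assumed pass-through; the real pipeline body is not shown in the diff.
    return response


results = run_bounded(['http://news.sina.com.cn/society/'], fake_fetch,
                      parser, pipeline, max_running=8)
print(results)
```

Naming the limit at the call site, as the commit does, keeps the call readable if the scheduler later gains more optional parameters.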