Petar Marić committed fb9288f

Blacklisted a couple of talk URLs, as they haven’t had any downloads for quite a while. Besides, I’m bored to tears from receiving error reports caused by these URLs every single day for the last 6 months.

Files changed (1)

metaTED/crawler/get_talks_urls.py

 TOTAL_PAGES_RE = re.compile(r"Showing page \d+ of (\d+)")
 
 
+TALKS_URLS_BLACKLIST = [
+    'http://www.ted.com/talks/rokia_traore_sings_m_bifo.html', # No downloads
+    'http://www.ted.com/talks/rokia_traore_sings_kounandi.html', # No downloads
+]
+
+
 def _read_page(page_num):
     return urlread(TALKS_LIST_URLS % page_num)
 
     urls = []
     for page in xrange(1, _get_num_pages()+1): # Talk list pages are 1-indexed
         urls.extend(_get_talks_urls_from_page(page))
+    
+    # Remove the well-known problematic talk URLs (i.e. no downloads available)
+    urls = [url for url in urls if url not in TALKS_URLS_BLACKLIST]
+    
     logging.info("Found %d talk url(s) in total", len(urls))
     return urls
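
The filtering step added in this commit can be tried in isolation. The following is a minimal sketch, not the project's actual code: the helper name `remove_blacklisted` and the `example_talk.html` URL are hypothetical, and a `frozenset` is used in place of the commit's list purely as an assumption that constant-time membership checks are preferable. The two blacklisted URLs are the ones from the diff.

```python
import logging

# The two URLs come from the commit; storing them in a frozenset is this
# sketch's own choice (the commit itself uses a plain list).
TALKS_URLS_BLACKLIST = frozenset([
    'http://www.ted.com/talks/rokia_traore_sings_m_bifo.html',  # No downloads
    'http://www.ted.com/talks/rokia_traore_sings_kounandi.html',  # No downloads
])


def remove_blacklisted(urls, blacklist=TALKS_URLS_BLACKLIST):
    """Return the URLs that are not blacklisted, preserving their order.

    Hypothetical helper: it is not part of metaTED.
    """
    kept = [url for url in urls if url not in blacklist]
    logging.info("Skipped %d blacklisted talk url(s)", len(urls) - len(kept))
    return kept


if __name__ == "__main__":
    sample = [
        'http://www.ted.com/talks/example_talk.html',  # hypothetical URL
        'http://www.ted.com/talks/rokia_traore_sings_m_bifo.html',
    ]
    print(remove_blacklisted(sample))
```

A list comprehension is used rather than `filter` so the result is a real list on both Python 2 and Python 3, which keeps a later `len(urls)` call working.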