
Commits

Davide Alberani  committed e0de76c

fix for beautifulsoup

  • Parent commits 1bcddfc
  • Branches default


Files changed (3)

File docs/Changelog.txt

   Changelog for IMDbPY
   ====================
 
-* What's the new in release 4.8dev20110928 "Rise of the Planet of the Apes" (28 Sept 2011)
+* What's the new in release 4.8dev20111030 "The Rite" (30 Oct 2011)
   [general]
   - fix for a problem managing exceptions with Python 2.4.
   - converted old-style exceptions to instances.
     reviews" and "dvd".
   - fix for cast of tv series.
   - fix for title of tv series.
+  - now the beautiful parses work again.
 
   [httpThin]
   - removed "httpThin", falling back to "http".

File imdb/__init__.py

 
 __all__ = ['IMDb', 'IMDbError', 'Movie', 'Person', 'Character', 'Company',
             'available_access_systems']
-__version__ = VERSION = '4.8dev20110928'
+__version__ = VERSION = '4.8dev20111030'
 
 # Import compatibility module (importing it is enough).
 import _compat

File imdb/parser/http/utils.py

         # Temporary fix: self.parse_dom must work even for empty strings.
         html_string = self.preprocess_string(html_string)
         html_string = html_string.strip()
-        # tag attributes like title="&#x22;Family Guy&#x22;" will be
-        # converted to title=""Family Guy"" and this confuses BeautifulSoup.
         if self.usingModule == 'beautifulsoup':
+            # tag attributes like title="&#x22;Family Guy&#x22;" will be
+            # converted to title=""Family Guy"" and this confuses BeautifulSoup.
             html_string = html_string.replace('""', '"')
+            # Browser-specific escapes create problems to BeautifulSoup.
+            html_string = html_string.replace('<!--[if IE]>', '"')
+            html_string = html_string.replace('<![endif]-->', '"')
         #print html_string.encode('utf8')
         if html_string:
             dom = self.get_dom(html_string)
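
The string replacements in this hunk can be tried in isolation. The following is a minimal sketch, not the IMDbPY code itself: the function name `sanitize_for_beautifulsoup` and the sample markup are hypothetical, and only the three `replace` calls are taken from the diff above.

```python
def sanitize_for_beautifulsoup(html_string):
    """Hypothetical standalone helper applying the replacements from the diff."""
    # Collapse doubled quotes such as title=""Family Guy"" into single quotes.
    html_string = html_string.replace('""', '"')
    # Replace the IE conditional-comment markers with a double quote,
    # exactly as the commit does.
    html_string = html_string.replace('<!--[if IE]>', '"')
    html_string = html_string.replace('<![endif]-->', '"')
    return html_string


# Sample input invented for illustration only.
sample = '<a title=""Family Guy"">x</a><!--[if IE]><b>ie</b><![endif]-->'
cleaned = sanitize_for_beautifulsoup(sample)
print(cleaned)
```

The order matters in this sketch: the doubled-quote collapse runs first, so the quote characters substituted for the conditional-comment markers are not themselves merged with neighbouring quotes.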