Commits

Robert Mařík committed 6f4a226

better handling of title

Files changed (1)

 
         # Find a title (all of them really)
         titles = []
-        #for e in tree.getElementsByTagName('h2'):
-        #    if e.getAttribute('class') == 'titleHead':
-        #        for text in e.childNodes:
-        #            titles.append(text.data)
-        for e in tree.getElementsByTagName('title'):
-            for text in e.childNodes:
-                titles.append(text.data)
+        try:
+            # Grab the title from the heading, including diacritics (if any).
+            # This fails if the title contains nested markup
+            # (for example, the word \LaTeX in the title).
+            for e in tree.getElementsByTagName('h2'):
+                if e.getAttribute('class') == 'titleHead':
+                    for text in e.childNodes:
+                        titles.append(text.data)
+        except Exception:
+            # Fall back to the <title> element below.
+            pass
+        if not titles:
+            for e in tree.getElementsByTagName('title'):
+                for text in e.childNodes:
+                    titles.append(text.data)
         if not titles:
             titles = ['']
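
The fallback order introduced by this change can be tried in isolation with the sketch below. It is not the committed code: the function name `find_titles` is hypothetical, `tree` is assumed to be an `xml.dom.minidom` document, and the exception is narrowed to `AttributeError` on the assumption that the failure comes from reading `.data` on an element child. Unlike the diff, the sketch also discards any partially collected heading text before falling back to `<title>`.

```python
from xml.dom import minidom


def find_titles(tree):
    """Return title strings, preferring <h2 class="titleHead"> over <title>.

    `find_titles` is a hypothetical helper; `tree` is assumed to be a
    minidom Document (or any DOM node with getElementsByTagName).
    """
    titles = []
    try:
        for e in tree.getElementsByTagName('h2'):
            if e.getAttribute('class') == 'titleHead':
                # Text nodes expose .data; element children (nested
                # markup) do not, which raises AttributeError.
                titles.extend(node.data for node in e.childNodes)
    except AttributeError:
        # Drop any partial heading text and use the fallback instead.
        titles = []
    if not titles:
        for e in tree.getElementsByTagName('title'):
            for node in e.childNodes:
                titles.append(node.data)
    if not titles:
        titles = ['']
    return titles


if __name__ == '__main__':
    doc = minidom.parseString(
        '<html><head><title>Uvod</title></head>'
        '<body><h2 class="titleHead">Úvod</h2></body></html>'
    )
    print(find_titles(doc))
```

A heading with only text children is returned with its diacritics intact; a heading with nested elements falls through to the plain `<title>` text, and a document with neither yields `['']`.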