Ed Brannin committed 0d7df0a

Initial attempts at dealing with empty links and uselessly-nested divs.

  • Parent commits c925f45

Files changed (1)

   for tag in ('font', 'span', 'b', 'i'):
     for s in soup.findAll(tag):
       s.replaceWith(NavigableString(s.renderContents()))
+  for div in soup.findAll('div'):
+    print "New <div>"
+    print type(div)
+    if type(div) == NavigableString:
+      print "NavigableString!"
+    else:
+      print div.prettify()
+      print dir(div)
+    #print div.parent.parent
+    print
+  
+  for a in soup.findAll('a'):
+    if len(a.contents) == 0:
+      print "Empty link to %s" % a['href']
+      a.extract()
+    else: 
+      print "OK: %s" % a.prettify()
+   
   print "Title: " + soup.title.renderContents()
-  print soup.prettify()
+  # print soup.prettify()
 
   raise "Stop!"
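The cleanup this commit is working toward, dropping empty links and collapsing uselessly-nested divs, can be sketched as follows. This is a hypothetical sketch, not the author's code. To stay dependency-free it swaps BeautifulSoup for the standard library's xml.etree.ElementTree, so it only accepts well-formed markup. The helper names remove_empty_links and collapse_nested_divs are illustrative. Two rules are also assumptions. A link counts as empty when it has no child elements and only whitespace text. A nested div counts as useless when it has no attributes and is the only content of its parent div.

```python
import xml.etree.ElementTree as ET


def _is_blank(text):
    """True for None or whitespace-only strings."""
    return text is None or not text.strip()


def _remove_keep_tail(parent, child):
    """Remove child from parent, re-attaching the text that followed it."""
    tail = child.tail or ''
    idx = list(parent).index(child)
    if idx > 0:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or '') + tail
    else:
        parent.text = (parent.text or '') + tail
    parent.remove(child)


def remove_empty_links(root):
    """Hypothetical helper: drop <a> elements with no children and blank text.

    Returns the href of each removed link (None when the link had no href).
    """
    removed = []
    # Snapshot the tree first so removals don't disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == 'a' and len(child) == 0 and _is_blank(child.text):
                removed.append(child.get('href'))
                _remove_keep_tail(parent, child)
    return removed


def collapse_nested_divs(root):
    """Hypothetical helper: fold <div><div>...</div></div> into one <div>.

    Only collapses when the inner div has no attributes and is the outer
    div's sole content (no non-whitespace text around it).
    """
    changed = True
    while changed:
        changed = False
        for div in root.iter('div'):
            if (len(div) == 1 and div[0].tag == 'div' and not div[0].attrib
                    and _is_blank(div.text) and _is_blank(div[0].tail)):
                inner = div[0]
                div.text = inner.text
                div[:] = list(inner)  # hoist the inner div's children
                changed = True
                break  # the tree changed; restart the scan
    return root


if __name__ == '__main__':
    doc = ET.fromstring(
        '<div><div><div><p>Hi <a href="x"></a>there '
        '<a href="y">link</a></p></div></div></div>')
    print(remove_empty_links(doc))
    collapse_nested_divs(doc)
    print(ET.tostring(doc, encoding='unicode'))
```

Running it prints ['x'] and then <div><p>Hi there <a href="y">link</a></p></div>. The empty link is gone, the text that followed it is kept, and the three nested divs become one. The sketch reads href with .get(), so an anchor without an href attribute is reported as None instead of raising an error. With BeautifulSoup 4, the equivalent operations are roughly Tag.decompose() (or extract()) for the links and Tag.unwrap() for the redundant inner div.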