Y'all Wanna Scrape with Us? Content Ain't a Thing: Web Scraping With Our Favorite Python Libraries
I got 99 problems but content ain't one
- Everyone needs good content.
- Good content exists all over the web.
- Scrape it 'til you make it.
lxml: Diving In
lxml.etree vs. lxml.html
- etree: best for well-formed XML/XHTML
- etree: powerful and fast for SOAP or other XML-formatted content
- html: best for websites & irregular or malformed markup
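A minimal sketch of the split above, assuming lxml is installed. The sample markup and variable names are made up for illustration: etree handles a well-formed XML snippet strictly, rejects sloppy HTML, and lxml.html recovers a usable tree from the same sloppy input.

```python
from lxml import etree, html

# Well-formed XML (hypothetical sample feed): etree parses it strictly.
xml_doc = b"<feed><item id='1'>First</item><item id='2'>Second</item></feed>"
root = etree.fromstring(xml_doc)
titles = [item.text for item in root.findall("item")]

# Sloppy HTML with unclosed tags (hypothetical sample markup).
broken = "<div><p>Hello<p>World<br></div>"

# The strict XML parser refuses it...
try:
    etree.fromstring(broken)
    strict_ok = True
except etree.XMLSyntaxError:
    strict_ok = False

# ...while lxml.html repairs the markup and returns a tree we can query.
doc = html.fromstring(broken)
paragraphs = [p.text_content() for p in doc.xpath("//p")]

print(titles)
print(strict_ok)
print(paragraphs)
```

In practice that usually means reaching for lxml.html when scraping pages off the open web, and keeping etree for feeds, SOAP responses, and other sources you can trust to be valid XML.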