
E-mail and link crawler/extractor

This script asks for the URL of a website, then crawls from page to page, making a list of all the links in the website and its sub-pages. External links are ignored (this can be easily deactivated in the code). While crawling the links it also collects e-mail addresses. In the end it saves 2 txt files: one with all the links, the other with all the e-mails.
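The crawl described above can be sketched roughly as follows. This is not the code in "main.py": it is a minimal illustration written for Python 3 using only the standard library (html.parser instead of lxml), and the names EMAIL_RE, LinkParser, fetch and crawl, as well as the e-mail pattern, are assumptions made for this example.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# Simple e-mail pattern (an assumption; it will miss some unusual addresses).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, internal_only=True):
    """Crawl breadth-first from start_url; return (links, emails) as sorted lists."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    emails = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to load
        emails.update(EMAIL_RE.findall(html))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, etc.
            if internal_only and parts.netloc != site:
                continue  # external link
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen), sorted(emails)
```

Calling crawl("http://example.com/") would return the two lists, which can then be written to the txt files one item per line. Passing internal_only=False corresponds to deactivating the external-link filter.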

I made this long ago because I needed all the e-mails of town councils and some other stuff (like restaurants and museums) in the north of Portugal. I found a webpage with all that, but the e-mails were scattered across a ton of different links. This was the solution, and I got my mailing list.

I haven't tested it with more websites, but in theory it works with the majority. Anyone can use this in any way they like, I don't care about licenses or copyright, just have fun. (Note: it was made for small-scale crawls and only for myself. It only saves the lists after getting everything, which is not ideal for websites with tons of sub-pages, but this can be easily improved.)
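One possible way to make that improvement is to append each new link or e-mail to its file as soon as it is found, so a long crawl that gets interrupted still leaves partial results on disk. A minimal sketch, assuming Python 3; the helper name append_new and the file name in the usage note are hypothetical and not part of the script:

```python
def append_new(path, items, seen):
    """Append items not yet in `seen` to the file at `path`, one per line.

    `seen` is a set that is updated in place; the newly written items are
    returned in the order they were written.
    """
    written = []
    with open(path, "a", encoding="utf-8") as out:
        for item in items:
            if item not in seen:
                seen.add(item)
                out.write(item + "\n")
                written.append(item)
    return written
```

Inside the crawl loop this could be called once per page, for example append_new("emails.txt", emails_found_on_page, seen_emails), instead of writing everything at the end.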

If you don't have Python installed, run the file "main.exe" (I compiled it with PyInstaller, but never tested it).

To run the source code like a boss, the only requirements are:

- An installed version of Python (used 2.7, not tested with other versions)
- lxml:

pip install lxml

Then just run the file "main.py".

Contacts:

My Homepage: www.paulojorgepm.net