Anchor text analyzer

This project was created as coursework for the Information Retrieval course at the Faculty of Informatics and Information Technology. The code is far from clean; it is mostly just what I needed during data parsing and analysis.

Info

The project is based on the hypothesis that every anchor text is one of these types: title, description, or attribute (of something). The code is not intended for general use, and it is definitely not DRY. The current state of the project is final (unless someone sends me an email with a request) and it will probably not be improved in the future.

Thanks for understanding.

Features

  • WordPress post parser
  • anchor text extractor
  • Stanford part-of-speech tagger analyzer

Taken together, it is a tool that can crawl a WordPress blog, save all the posts, extract anchor texts and links, and analyze them one by one. The outcome is the type of each anchor text.
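
The extraction step can be illustrated with a minimal sketch. It is not the project's actual code: it assumes the saved posts are plain HTML strings and uses only Python's standard `html.parser` module. The names `AnchorExtractor` and `extract_anchors` are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document.

    Hypothetical helper for illustration; not part of the original project.
    """

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._parts = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Only <a> tags that carry an href count as links.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._parts = []

    def handle_data(self, data):
        # Gather text while inside a link, including text of nested tags.
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._parts).split())
            if text:
                self.anchors.append((self._href, text))
            self._href = None


def extract_anchors(html):
    """Return a list of (href, anchor text) pairs found in `html`."""
    parser = AnchorExtractor()
    parser.feed(html)
    parser.close()
    return parser.anchors
```

Each returned pair could then be passed to the part-of-speech tagging step, one anchor text at a time, as described above.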

For more information, please visit my Information Retrieval wiki page.

Enjoy.
Peter Dulacka
