1. Christos Kannas
  2. DocIRHadoop



How To:


Pyparsing (http://pyparsing.wikispaces.com/):

  • First, install Python setuptools, which provides easy_install: sudo apt-get install python-setuptools
  • Then install pyparsing: easy_install pyparsing
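To confirm that the installation worked, a short check like the following can be run with the same interpreter that will later run DocIRHadoop. This is a minimal sketch that assumes a Python 3 interpreter (it relies on importlib.util.find_spec). The function name pyparsing_available is hypothetical and is not part of DocIRHadoop.

```python
import importlib.util


def pyparsing_available():
    """Return True if this interpreter can import the pyparsing package."""
    return importlib.util.find_spec("pyparsing") is not None


if __name__ == "__main__":
    if pyparsing_available():
        import pyparsing
        print("pyparsing", pyparsing.__version__, "is installed")
    else:
        print("pyparsing not found; install it with: easy_install pyparsing")
```

If the check reports that pyparsing is missing, easy_install most likely installed it for a different Python interpreter than the one you ran the check with.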

Running DocIRHadoop:

  • Unzip the source code into a directory of your choice.
  • Make sure that all mapper and reducer scripts in DocIRHadoop/InvertIndex are executable (for example, with chmod +x).
  • Open a terminal.
  • Add the parent directory of DocIRHadoop to the PYTHONPATH environment variable: export PYTHONPATH=/path/to/parent/of/DocIRHadoop:$PYTHONPATH
  • Change to the parent directory of DocIRHadoop.
  • In a second terminal, go to the Hadoop installation directory and start Hadoop with bin/start-all.sh.
  • In the first terminal, run: python DocIRHadoop/run.py
  • At the first prompt, enter the full path to the english-documents directory and press Enter.
  • Next, enter the name of the destination directory in HDFS and press Enter.
  • DocIRHadoop now prints detailed execution output, most of it from the MapReduce jobs.
  • When the inverted-indexing jobs finish, you enter the Search section.
  • Here you type your queries; for each query you see the execution output of its job.
  • The results of the query are then displayed.
  • To exit, press Ctrl+D.
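The permission and PYTHONPATH steps above can be checked before launching run.py with a small preflight script. This is a hypothetical sketch, not part of DocIRHadoop. It assumes Python 3 and that it is run from the parent directory of DocIRHadoop, and it treats every regular file in DocIRHadoop/InvertIndex as a script that needs the execute bit. The helper names (non_executable_scripts, make_executable, parent_on_pythonpath) are illustrative only.

```python
import os
import stat


def non_executable_scripts(script_dir):
    """Return the sorted names of regular files in script_dir without the owner execute bit."""
    missing = []
    for name in sorted(os.listdir(script_dir)):
        path = os.path.join(script_dir, name)
        if os.path.isfile(path) and not os.stat(path).st_mode & stat.S_IXUSR:
            missing.append(name)
    return missing


def make_executable(script_dir):
    """Add execute permission for user, group and others (like chmod a+x) to each such file."""
    for name in non_executable_scripts(script_dir):
        path = os.path.join(script_dir, name)
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def parent_on_pythonpath(parent_dir):
    """Return True if parent_dir is one of the entries in the PYTHONPATH variable."""
    entries = os.environ.get("PYTHONPATH", "").split(os.pathsep)
    target = os.path.abspath(parent_dir)
    # Empty entries (e.g. from a trailing ':') are skipped.
    return any(e and os.path.abspath(e) == target for e in entries)


if __name__ == "__main__":
    # Assumes the current working directory is the parent of DocIRHadoop.
    parent = os.getcwd()
    script_dir = os.path.join(parent, "DocIRHadoop", "InvertIndex")
    if not os.path.isdir(script_dir):
        print("Directory not found:", script_dir)
    else:
        missing = non_executable_scripts(script_dir)
        if missing:
            print("Not executable:", ", ".join(missing))
    if not parent_on_pythonpath(parent):
        print("PYTHONPATH does not include", parent)
```

Calling make_executable(script_dir) sets the missing bits in the same way as running chmod a+x on each file. If InvertIndex also contains modules that are not mapper or reducer scripts, narrow the file selection before using it.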