Jason Baldridge committed 86fcf40

small.

  • Parent commits 3f93105

Files changed (2)

+# use glob syntax.
+syntax: glob
+
+output
+netbeans
+*.gz
+target
+tmp
+*~
+project/boot
+lib_managed
+\#*
+.\#*
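
As a rough illustration of what the unrooted glob patterns in the ignore file above select, the following Python sketch checks a path against a subset of them. It is only an approximation built on the standard `fnmatch` module, not Mercurial's own matcher; the helper name `is_ignored` and the sample paths are hypothetical, and the two patterns containing `#` are left out to keep the sketch simple.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns from the ignore file above.
PATTERNS = ["output", "netbeans", "*.gz", "target", "tmp", "*~",
            "project/boot", "lib_managed"]


def is_ignored(path, patterns=PATTERNS):
    """Approximate an unrooted glob match (assumption: not Mercurial's real logic).

    A pattern with n components is compared against every run of n
    consecutive components of the path, so "target" matches at any depth
    and also covers everything beneath a matching directory.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        n = len(PurePosixPath(pattern).parts)
        for i in range(len(parts) - n + 1):
            candidate = "/".join(parts[i:i + n])
            if fnmatchcase(candidate, pattern):
                return True
    return False


if __name__ == "__main__":
    for sample in ["README~", "data/corpus.gz",
                   "target/classes/WordCount.class",
                   "project/build/TaccdoopProject.scala"]:
        print(sample, is_ignored(sample))
```

Under these assumptions, editor backups such as `README~`, compressed data files, and anything under `target` or `project/boot` are reported as ignored, while source and build definition files are not.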

File README~

-----------------------------------------------------------
-The TACC Hadoop Example toolkit
-
-Author: Jason Baldridge (jasonbaldridge@gmail.com)
-----------------------------------------------------------
-
-
-Introduction
-============
-
-This package provides example code to support running Hadoop jobs on
-the Longhorn compute cluster at the Texas Advanced Computing
-Center (but it could be useful for others). Only two classes are defined:
-
- * tacc.hadoop.example.WordCount - word count in Java (from the
-   standard Hadoop distribution)
-
- * tacc.hadoop.scala.WordCount - word count in Scala (adapted from the
-   Java)
-
-This file contains the configuration and build instructions. 
-
-Requirements
-============
-
-* Version 1.6 of the Java 2 SDK (http://java.sun.com)
-* Version 0.20.2 of Hadoop: http://hadoop.apache.org/common/releases.html
-
-Configuring your environment variables
-======================================
-
-The easiest thing to do is to set the environment variables JAVA_HOME
-and TACCDOOP_DIR to the relevant locations on your system. Set JAVA_HOME
-to match the top level directory containing the Java installation you
-want to use.
-
-For example, on Windows:
-
-C:\> set JAVA_HOME=C:\Program Files\jdk1.5.0_04
-
-or on Unix:
-
-% setenv JAVA_HOME /usr/local/java
-  (csh)
-> export JAVA_HOME=/usr/java
-  (ksh, bash)
-
-On Windows, to get these settings to persist, it's actually easiest to
-set your environment variables through the System Properties from the
-Control Panel. For example, under WinXP, go to Control Panel, click on
-System Properties, choose the Advanced tab, click on Environment
-Variables, and add your settings in the User variables area.
-
-Next, likewise set TACCDOOP_DIR to be the top level directory where you
-unzipped the download. In Unix, type 'pwd' in the directory where
-this file is and use the path given to you by the shell as
-TACCDOOP_DIR.  You can set this in the same manner as for JAVA_HOME
-above.
-
-Next, add the directory TACCDOOP_DIR/bin to your path. For example, you
-can set the path in your .bashrc file as follows:
-
-export PATH="$PATH:$TACCDOOP_DIR/bin"
-
-Once you have taken care of these three things, you should be able to
-build and use the Taccdoop Library.
-
-Note: Spaces are allowed in JAVA_HOME but not in TACCDOOP_DIR.  To set
-an environment variable with spaces in it, you need to put quotes around
-the value when on Unix, but you must *NOT* do this when under Windows.
-
-It is assumed that you have Hadoop installed and in your path.
-
-
-Building the system from source
-===============================
-
-Taccdoop uses SBT (Simple Build Tool) with a standard directory
-structure.  To build Taccdoop, type:
-
-$ taccdoop build update compile
-
-This will compile the source files and put them in
-./target/classes. If this is your first time running it, you will see
-messages about Scala being downloaded -- this is fine and
-expected. Once that is over, the Taccdoop code will be compiled.
-
-To try out other build targets, do:
-
-$ taccdoop build
-
-This will drop you into the SBT interface.  The build targets that are
-supported are listed here:
-
-http://code.google.com/p/simple-build-tool/wiki/RunningSbt
-
-You can also see targets by typing "actions" on the SBT prompt.
-
-Note: if you have SBT already installed on your system, you can also
-just call it directly with "sbt".
-
-
-Trying it out
-=============
-
-Build the assembly jar (which contains all the dependencies).
-
-$ taccdoop build assembly
-
-This will give you a jar in the target directory named
-Taccdoop-assembly-0.1.jar which you can use with "hadoop jar"
-commands. 
-
-You can try it out on a single machine in non-distributed mode. As an
-example, let's do word count on the Adventures of Sherlock Holmes.
-
-Obtain the text:
-
-$ wget http://www.gutenberg.org/cache/epub/1661/pg1661.txt
-
-To do Java word count, run:
-
-$ hadoop jar $TACCDOOP_DIR/target/Taccdoop-assembly-0.1.jar tacc.hadoop.example.WordCount pg1661.txt wc_out_holmes_java
-
-To do Scala word count, run:
-
-$ hadoop jar $TACCDOOP_DIR/target/Taccdoop-assembly-0.1.jar tacc.hadoop.scala.WordCount pg1661.txt wc_out_holmes_scala
-
-
-Now what?
-=============
-
-The purpose of this package is to allow people to easily build a jar
-of their own without needing anything other than the command line, a
-Hadoop installation, and Java. You should be able to adapt the SBT
-build to your own project and start creating your own packages based
-on these fairly straightforwardly. You'll want to:
-
- * change the information in $TACCDOOP_DIR/project/build.properties to
-   reflect your own project details
-
- * change $TACCDOOP_DIR/project/build/TaccdoopProject.scala to be
-   named for your project, and if you need to specify new managed
-   dependencies, you can do so easily in that file (see SBT
-   documentation for details). If you prefer to add dependencies
-   manually, just add them to $TACCDOOP_DIR/lib and they'll get picked
-   up.
-
- * change $TACCDOOP_DIR/bin to be an executable of your choice, named
-   for your project, and adapt as necessary (including changing
-   $TACCDOOP to your project name, etc).
-
-Good luck!
-
-
-Questions or suggestions?
-=========================
-
-Email Jason Baldridge: jasonbaldridge@gmail.com