Commits

Jason Baldridge  committed 2a3e56e Merge

Merged.

  • Parent commits c5a0ee5, 5d99297
  • Branches default

Files changed (3)

File README

 Introduction
 ============
 
-This package provides example code for instruction for Hadoop. It is
-called "fogbow" because of the prevalent use of meteorological terms
-in cloud computing packages. (The word "fogbow" itself means a rainbow
-formed from fog rather than clouds.)
+This package provides example code for instruction for Hadoop. It
+provides a build structure that ensures that all the packages
+necessary for building basic Hadoop applications are available for
+compilation, and further, that they are available for running
+applications using a pre-configured classpath or bundled-up assembly
+jar that contains Fogbow and all its dependencies.
+
+The toolkit is called Fogbow because of the prevalent use of
+meteorological terms in cloud computing packages. (The word "fogbow"
+itself means a rainbow formed from fog rather than clouds.)
+
+There are just two classes in Fogbow.
 
  * fogbow.example.WordCount - word count in Java (from the
    standard Hadoop distribution)
 * Version 1.6 of the Java 2 SDK (http://java.sun.com)
 * Version 0.20.2 of Hadoop: http://hadoop.apache.org/common/releases.html
 
+
 Configuring your environment variables
 ======================================
 
 Next, add the directory FOGBOW_DIR/bin to your path. For example, you
 can set the path in your .bashrc file as follows:
 
-export PATH="$PATH:$FOGBOW_DIR/bin"
+export PATH=$PATH:$FOGBOW_DIR/bin
 
 Once you have taken care of these three things, you should be able to
 build and use the Fogbow Library.
 the value when on Unix, but you must *NOT* do this when under Windows.
 
 It is assumed that you have Hadoop 0.20.2 installed and in your path,
-and that you have set HADOOP_DIR to be the location of your Hadoop
+and that you have set HADOOP_HOME to be the location of your Hadoop
 0.20.2 installation.
 
 
 ===============================
 
 Fogbow uses SBT (Simple Build Tool) with a standard directory
-structure.  To build Fogbow, type:
+structure.  To build Fogbow, type (in the $FOGBOW_DIR directory):
 
 $ fogbow build update compile
 
 https://github.com/harrah/xsbt/wiki
 
 Note: if you have SBT 0.10.1 already installed on your system, you can
-also just call it directly with "sbt".
+also just call it directly with "sbt" in FOGBOW_DIR.
 
 
 Trying it out
 
 $ hadoop jar $FOGBOW_DIR/target/fogbow-assembly.jar fogbow.example.WordCountScala pg1661.txt wc_out_holmes_scala_assembly
 
+Note: If you have set up HDFS and have put pg1661.txt onto it (e.g.,
+using "hadoop fs -put pg1661.txt pg1661.txt"), then this *will* run in
+distributed mode.
+
+
+Try out Cloud9
+==============
+
+Fogbow includes Cloud9, a Hadoop package created by Jimmy Lin for
+teaching MapReduce at the University of Maryland. Try out the Cloud9
+word count as follows.
+
+Get the Cloud9 file that has the Bible and Shakespeare bundled
+together:
+
+$ wget --no-check-certificate https://github.com/lintool/Cloud9/raw/603977334b5e25ecf23a182a77fda136fe1df5ff/data/bible+shakes.nopunc.gz
+
+Unzip the file:
+
+$ gunzip bible+shakes.nopunc.gz
+
+Run Cloud9 word count:
+
+$ fogbow run edu.umd.cloud9.example.simple.DemoWordCount bible+shakes.nopunc wc 1
+
+This says to count the words in the file bible+shakes.nopunc,
+outputting the results to the directory "wc", and using one reducer.
+
+Check that you obtained the desired output:
+
+$ grep othello wc/part-r-00000
+othello	339
+othello's	11
+
+
 
 Now what?
 =============
 build to your own project and start creating your own packages based
 on these fairly straightforwardly. You'll want to:
 
- * change the information in $FOGBOW_DIR/project/build.properties to
-   reflect your own project details
+ * Change $FOGBOW_DIR/build.sbt properties and configurations to be
+   appropriate for your project. If you need to specify new managed
+   dependencies, you can do so easily in that file (see SBT
+   documentation for details). If you prefer to add dependencies
+   manually, just add them to $FOGBOW_DIR/lib and they'll get picked
+   up without any fuss.
 
- * change $FOGBOW_DIR/build.sbt configurations to be appropriate for
-   your project, and if you need to specify new managed dependencies,
-   you can do so easily in that file (see SBT documentation for
-   details). If you prefer to add dependencies manually, just add them
-   to $FOGBOW_DIR/lib and they'll get picked up without any fuss.
-
- * change $FOGBOW_DIR/bin to be an executable of your choice, named
+ * Change $FOGBOW_DIR/bin to be an executable of your choice, named
    for your project, and adapt as necessary (including changing
    $FOGBOW to your project name, etc).
 
 =========================
 
 Email Jason Baldridge: jasonbaldrige@gmail.com
+
+Or, create an issue on Bitbucket: 
+
+    https://bitbucket.org/jasonbaldridge/fogbow/issues
+
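
Reviewer note on the README changes: the instructions now depend on both FOGBOW_DIR and HADOOP_HOME being set (the latter replaces HADOOP_DIR in this commit). A minimal sketch of a pre-flight check follows. The function name check_fogbow_env is hypothetical and not part of Fogbow; it only tests that the two variables are non-empty, not that they point at valid installations.

```shell
# Hypothetical helper (not part of Fogbow): report which of the
# environment variables the README relies on are unset or empty.
check_fogbow_env() {
    missing=""
    # ${VAR:-} expands to the empty string when VAR is unset,
    # so the test is safe even under "set -u".
    [ -n "${FOGBOW_DIR:-}" ] || missing="$missing FOGBOW_DIR"
    [ -n "${HADOOP_HOME:-}" ] || missing="$missing HADOOP_HOME"
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "ok"
}
```

One possible use is to call it before building, for example "check_fogbow_env && fogbow build update compile", so that a missing variable is reported up front instead of surfacing later as an empty classpath.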

File bin/fogbow

 #!/bin/bash
 
-if [ -z $HADOOP_DIR ]
+JARS=`echo $FOGBOW_DIR/lib/*.jar $FOGBOW_DIR/target/*.jar $HADOOP_HOME/*.jar $HADOOP_HOME/lib/*.jar | tr ' ' ':'`
+
+JARS_MANAGED=
+if [ -e $FOGBOW_DIR/lib_managed ]
 then
-    HADOOP_DIR=$HADOOP_HOME
+    JARS_MANAGED=`find $FOGBOW_DIR/lib_managed -name '*.jar' -print | tr '\n' ':'`
 fi
 
-JARS=`echo $FOGBOW_DIR/lib/*.jar $FOGBOW_DIR/target/*.jar $HADOOP_DIR/*.jar $HADOOP_DIR/lib/*.jar | tr ' ' ':'`
-JARS_MANAGED=`find $FOGBOW_DIR/lib_managed -name '*.jar' -print | tr '\n' ':'`
-
 SCALA_LIB="$FOGBOW_DIR/project/boot/scala-2.9.0/lib/scala-library.jar"
 
 CP=$FOGBOW_DIR/target/classes:$JARS:$JARS_MANAGED:$SCALA_LIB:$CLASSPATH
 help()
 {
 cat <<EOF
-Fogbow 0.1 commands: 
+Fogbow 0.1.2 commands: 
 
   build         build Fogbow with SBT
   wordcount     do the standard word count example
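
Reviewer note on the classpath assembly in bin/fogbow: the script joins jar paths with ':' by piping unquoted globs through tr, which would split paths containing spaces and would leave a literal "*.jar" entry when a directory has no jars. The sketch below shows one way the same idea could be written more defensively. The function name build_classpath is hypothetical, and unlike the script's find call for lib_managed it only looks at the top level of each directory.

```shell
# Hypothetical helper (not part of Fogbow): print every *.jar found
# directly inside the given directories, joined with ':'.
# Directories that do not exist are skipped silently.
build_classpath() {
    cp=""
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for jar in "$dir"/*.jar; do
            # When the glob matches nothing it stays literal; skip it.
            [ -e "$jar" ] || continue
            # ${cp:+$cp:} adds the separator only once cp is non-empty.
            cp="${cp:+$cp:}$jar"
        done
    done
    printf '%s\n' "$cp"
}
```

Under these assumptions, the JARS line could become something like JARS=$(build_classpath "$FOGBOW_DIR/lib" "$FOGBOW_DIR/target" "$HADOOP_HOME" "$HADOOP_HOME/lib"), with the existing lib_managed guard left as it is.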

File build.sbt

 name := "Fogbow"
 
-version := "0.1"
+version := "0.1.2"
 
 organization := "The University of Texas at Austin"