Commits

Jason Baldridge  committed 3f93105

Initial commit.

Files changed (11)

+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+
+----------------------------------------------------------
+The Fogbow toolkit
+
+Authors: Jason Baldridge (jasonbaldridge@gmail.com)
+         Matt Lease (matt.lease@gmail.com)
+----------------------------------------------------------
+
+
+Introduction
+============
+
+This package provides example code for instruction in using Hadoop.
+
+ * fogbow.example.WordCount - word count in Java (from the
+   standard Hadoop distribution)
+
+ * fogbow.scala.WordCount - word count in Scala (adapted from the
+   Java)
+
+This file contains the configuration and build instructions. 
+
+Requirements
+============
+
+* Version 1.6 of the Java 2 SDK (http://java.sun.com)
+* Version 0.20.2 of Hadoop: http://hadoop.apache.org/common/releases.html
+
+Configuring your environment variables
+======================================
+
+The easiest thing to do is to set the environment variables JAVA_HOME
+and FOGBOW_DIR to the relevant locations on your system. Set JAVA_HOME
+to match the top level directory containing the Java installation you
+want to use.
+
+For example, on Windows:
+
+C:\> set JAVA_HOME=C:\Program Files\jdk1.6.0
+
+or on Unix:
+
+% setenv JAVA_HOME /usr/local/java
+  (csh)
+> export JAVA_HOME=/usr/java
+  (ksh, bash)
+
+On Windows, to get these settings to persist, it's actually easiest to
+set your environment variables through the System Properties from the
+Control Panel. For example, under WinXP, go to Control Panel, click on
+System Properties, choose the Advanced tab, click on Environment
+Variables, and add your settings in the User variables area.
+
+Next, likewise set FOGBOW_DIR to be the top level directory where you
+unzipped the download. In Unix, type 'pwd' in the directory where
+this file is and use the path given to you by the shell as
+FOGBOW_DIR.  You can set this in the same manner as for JAVA_HOME
+above.
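+
+For example (assuming you unpacked the download into a directory called
+fogbow in your home directory -- adjust the path to wherever you actually
+put it):
+
+$ cd ~/fogbow
+$ export FOGBOW_DIR=`pwd`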
+
+Next, add the directory FOGBOW_DIR/bin to your path. For example, you
+can set the path in your .bashrc file as follows:
+
+export PATH="$PATH:$FOGBOW_DIR/bin"
+
+Once you have taken care of these three things, you should be able to
+build and use the Fogbow Library.
+
+Note: Spaces are allowed in JAVA_HOME but not in FOGBOW_DIR.  To set
+an environment variable with spaces in it, you need to put quotes around
+the value when on Unix, but you must *NOT* do this when under Windows.
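+
+For example, on Unix (using a made-up install path that contains a space):
+
+% setenv JAVA_HOME "/opt/Java SDKs/1.6"
+  (csh)
+> export JAVA_HOME="/opt/Java SDKs/1.6"
+  (ksh, bash)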
+
+It is assumed that you have Hadoop installed and in your path.
+
+
+Building the system from source
+===============================
+
+Fogbow uses SBT (Simple Build Tool) with a standard directory
+structure.  To build Fogbow, type:
+
+$ fogbow build update compile
+
+This will compile the source files and put them in
+./target/classes. If this is your first time running it, you will see
+messages about Scala being downloaded -- this is fine and
+expected. Once that is over, the Fogbow code will be compiled.
+
+To try out other build targets, do:
+
+$ fogbow build
+
+This will drop you into the SBT interface.  The build targets that are
+supported are listed here:
+
+http://code.google.com/p/simple-build-tool/wiki/RunningSbt
+
+You can also see targets by typing "actions" on the SBT prompt.
+
+Note: if you have SBT already installed on your system, you can also
+just call it directly with "sbt".
+
+
+Trying it out
+=============
+
+Build the assembly jar (which contains all the dependencies).
+
+$ fogbow build assembly
+
+This will give you a jar in the target directory named
+Fogbow-assembly-0.1.jar which you can use with "hadoop jar"
+commands. 
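+
+If you want to double-check that the assembly was built and contains the
+word count classes, you can list its contents with the JDK's jar tool, for
+example:
+
+$ jar tf $FOGBOW_DIR/target/Fogbow-assembly-0.1.jar | grep WordCount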
+
+You can try it out on a single machine in non-distributed mode. As an
+example, let's do word count on the Adventures of Sherlock Holmes.
+
+Obtain the text:
+
+$ wget http://www.gutenberg.org/cache/epub/1661/pg1661.txt
+
+To do Java word count, run:
+
+$ hadoop jar $FOGBOW_DIR/target/Fogbow-assembly-0.1.jar fogbow.example.WordCount pg1661.txt wc_out_holmes_java
+
+To do Scala word count, run:
+
+$ hadoop jar $FOGBOW_DIR/target/Fogbow-assembly-0.1.jar fogbow.scala.WordCount pg1661.txt wc_out_holmes_scala
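+
+In non-distributed mode the output directories are created on the local
+filesystem, so you can inspect the results with ordinary tools; the reducer
+output file is typically named part-r-00000. For example:
+
+$ head wc_out_holmes_scala/part-r-00000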
+
+
+Now what?
+=============
+
+The purpose of this package is to allow people to easily build a jar
+of their own without needing anything other than the command line, a
+Hadoop installation, and Java. You should be able to adapt the SBT
+build to your own project and start creating your own packages based
+on these fairly straightforwardly. You'll want to do the following (a
+rough sketch of these steps is given after the list):
+
+ * change the information in $FOGBOW_DIR/project/build.properties to
+   reflect your own project details
+
+ * change $FOGBOW_DIR/project/build/FogbowProject.scala to be
+   named for your project, and if you need to specify new managed
+   dependencies, you can do so easily in that file (see SBT
+   documentation for details). If you prefer to add dependencies
+   manually, just add them to $FOGBOW_DIR/lib and they'll get picked
+   up.
+
+ * change $FOGBOW_DIR/bin to be an executable of your choice, named
+   for your project, and adapt as necessary (including changing
+   $FOGBOW to your project name, etc).
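+
+As a rough sketch (using a made-up project name "myproject" -- substitute
+your own), the renaming steps might look like:
+
+$ cd $FOGBOW_DIR
+$ mv project/build/FogbowProject.scala project/build/MyprojectProject.scala
+$ mv bin/fogbow bin/myproject
+
+after which you would edit project/build.properties, the renamed project
+file, and the renamed script so that they refer to your project rather than
+to Fogbow.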
+
+Good luck!
+
+
+Questions or suggestions?
+=========================
+
+Email Jason Baldridge: jasonbaldridge@gmail.com
+----------------------------------------------------------
+The TACC Hadoop Example toolkit
+
+Author: Jason Baldridge (jasonbaldridge@gmail.com)
+----------------------------------------------------------
+
+
+Introduction
+============
+
+This package provides example code to support running Hadoop jobs on
+the Longhorn compute cluster at the Texas Advanced Computing
+Center (but it could be useful for others). Only two classes are defined:
+
+ * tacc.hadoop.example.WordCount - word count in Java (from the
+   standard Hadoop distribution)
+
+ * tacc.hadoop.scala.WordCount - word count in Scala (adapted from the
+   Java)
+
+This file contains the configuration and build instructions. 
+
+Requirements
+============
+
+* Version 1.6 of the Java 2 SDK (http://java.sun.com)
+* Version 0.20.2 of Hadoop: http://hadoop.apache.org/common/releases.html
+
+Configuring your environment variables
+======================================
+
+The easiest thing to do is to set the environment variables JAVA_HOME
+and TACCDOOP_DIR to the relevant locations on your system. Set JAVA_HOME
+to match the top level directory containing the Java installation you
+want to use.
+
+For example, on Windows:
+
+C:\> set JAVA_HOME=C:\Program Files\jdk1.6.0
+
+or on Unix:
+
+% setenv JAVA_HOME /usr/local/java
+  (csh)
+> export JAVA_HOME=/usr/java
+  (ksh, bash)
+
+On Windows, to get these settings to persist, it's actually easiest to
+set your environment variables through the System Properties from the
+Control Panel. For example, under WinXP, go to Control Panel, click on
+System Properties, choose the Advanced tab, click on Environment
+Variables, and add your settings in the User variables area.
+
+Next, likewise set TACCDOOP_DIR to be the top level directory where you
+unzipped the download. In Unix, type 'pwd' in the directory where
+this file is and use the path given to you by the shell as
+TACCDOOP_DIR.  You can set this in the same manner as for JAVA_HOME
+above.
+
+Next, add the directory TACCDOOP_DIR/bin to your path. For example, you
+can set the path in your .bashrc file as follows:
+
+export PATH="$PATH:$TACCDOOP_DIR/bin"
+
+Once you have taken care of these three things, you should be able to
+build and use the Taccdoop Library.
+
+Note: Spaces are allowed in JAVA_HOME but not in TACCDOOP_DIR.  To set
+an environment variable with spaces in it, you need to put quotes around
+the value when on Unix, but you must *NOT* do this when under Windows.
+
+It is assumed that you have Hadoop installed and in your path.
+
+
+Building the system from source
+===============================
+
+Taccdoop uses SBT (Simple Build Tool) with a standard directory
+structure.  To build Taccdoop, type:
+
+$ taccdoop build update compile
+
+This will compile the source files and put them in
+./target/classes. If this is your first time running it, you will see
+messages about Scala being downloaded -- this is fine and
+expected. Once that is over, the Taccdoop code will be compiled.
+
+To try out other build targets, do:
+
+$ taccdoop build
+
+This will drop you into the SBT interface.  The build targets that are
+supported are listed here:
+
+http://code.google.com/p/simple-build-tool/wiki/RunningSbt
+
+You can also see targets by typing "actions" on the SBT prompt.
+
+Note: if you have SBT already installed on your system, you can also
+just call it directly with "sbt".
+
+
+Trying it out
+=============
+
+Build the assembly jar (which contains all the dependencies).
+
+$ taccdoop build assembly
+
+This will give you a jar in the target directory named
+Taccdoop-assembly-0.1.jar which you can use with "hadoop jar"
+commands. 
+
+You can try it out on a single machine in non-distributed mode. As an
+example, let's do word count on the Adventures of Sherlock Holmes.
+
+Obtain the text:
+
+$ wget http://www.gutenberg.org/cache/epub/1661/pg1661.txt
+
+To do Java word count, run:
+
+$ hadoop jar $TACCDOOP_DIR/target/Taccdoop-assembly-0.1.jar tacc.hadoop.example.WordCount pg1661.txt wc_out_holmes_java
+
+To do Scala word count, run:
+
+$ hadoop jar $TACCDOOP_DIR/target/Taccdoop-assembly-0.1.jar tacc.hadoop.scala.WordCount pg1661.txt wc_out_holmes_scala
+
+
+Now what?
+=============
+
+The purpose of this package is to allow people to easily build a jar
+of their own without needing anything other than the command line, a
+Hadoop installation, and Java. You should be able to adapt the SBT
+build to your own project and start creating your own packages based
+on these fairly straightforwardly. You'll want to:
+
+ * change the information in $TACCDOOP_DIR/project/build.properties to
+   reflect your own project details
+
+ * change $TACCDOOP_DIR/project/build/TaccdoopProject.scala to be
+   named for your project, and if you need to specify new managed
+   dependencies, you can do so easily in that file (see SBT
+   documentation for details). If you prefer to add dependencies
+   manually, just add them to $TACCDOOP_DIR/lib and they'll get picked
+   up.
+
+ * change $TACCDOOP_DIR/bin to be an executable of your choice, named
+   for your project, and adapt as necessary (including changing
+   $TACCDOOP to your project name, etc).
+
+Good luck!
+
+
+Questions or suggestions?
+=========================
+
+Email Jason Baldridge: jasonbaldridge@gmail.com
+#!/bin/bash -u
+
+# Build the classpath from unmanaged jars, build output, managed
+# dependencies, and the Scala library that SBT bootstraps (the Scala version
+# here must match build.scala.versions in project/build.properties).
+JARS=`echo $FOGBOW_DIR/lib/*.jar $FOGBOW_DIR/target/*.jar $FOGBOW_DIR/lib_managed/compile/*.jar | tr ' ' ':'`
+SCALA_LIB="$FOGBOW_DIR/project/boot/scala-2.9.0.1/lib/scala-library.jar"
+
+CP=$FOGBOW_DIR/target/classes:$JARS:$SCALA_LIB:$CLASSPATH
+
+# Default the JVM heap unless JAVA_MEM_FLAG is already set (the :- guard
+# keeps "bash -u" from aborting when the variable is unset).
+if [ -z "${JAVA_MEM_FLAG:-}" ]
+then
+    JAVA_MEM_FLAG=-Xmx2g
+fi
+
+JAVA_COMMAND="java $JAVA_MEM_FLAG -classpath $CP"
+
+# The first argument selects the command; default to help when none is given.
+CMD=${1:-help}
+shift || true
+
+help()
+{
+cat <<EOF
+Fogbow 0.1 commands: 
+
+  build         build Fogbow with SBT
+  wordcount     do the standard word count example
+  run           run the main method of a given class
+
+Include --help with any option for more information
+EOF
+}
+
+if [ $CMD = 'build' ]; then
+
+    java -jar $FOGBOW_DIR/project/build/sbt-launch-0.7.7.jar "$@"
+
+else 
+
+    CLASS=
+
+    case $CMD in
+	wordcount) CLASS=fogbow.example.WordCount;;
+	run) CLASS=$1; shift;;
+	help) help; exit 1;;
+	*) echo "Unrecognized command: $CMD"; help; exit 1;;
+    esac
+
+    $JAVA_COMMAND $CLASS $*
+
+fi
+
+

File project/build.properties

+#Project properties
+#Tue Apr 12 09:09:17 CDT 2011
+project.organization=The University of Texas at Austin
+project.name=Fogbow
+sbt.version=0.7.7
+project.version=0.1.1
+build.scala.versions=2.9.0.1
+project.initialize=false

File project/build/FogbowProject.scala

+import sbt._
+
+// SBT 0.7.x project definition; assembly.AssemblyBuilder comes from the
+// assembly-sbt plugin declared in project/plugins/Plugins.scala.
+class FogbowProject (info: ProjectInfo) extends DefaultProject(info) with assembly.AssemblyBuilder {
+  // Don't put the Scala version into artifact names and output paths.
+  override def disableCrossPaths = true
+  // Managed dependencies: Hadoop core and the Argot command-line parser.
+  val hadoop = "org.apache.hadoop" % "hadoop-core" % "0.20.2"
+  val argot = "org.clapper" %% "argot" % "0.2"
+}
+
+

File project/build/sbt-launch-0.7.7.jar

Binary file added.

File project/plugins/Plugins.scala

+class Plugins(info: sbt.ProjectInfo) extends sbt.PluginDefinition(info) {
+  val codaRepo = "Coda Hale's Repository" at "http://repo.codahale.com/"
+  val assemblySBT = "com.codahale" % "assembly-sbt" % "0.1.1"
+}

File project/plugins/project/build.properties

+#Project properties
+#Tue Apr 05 12:54:49 CDT 2011
+plugin.uptodate=true

File src/main/java/fogbow/example/WordCount.java

+/**
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package fogbow.example;
+
+import java.io.IOException;
+import java.util.StringTokenizer;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.Reducer;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
+import org.apache.hadoop.util.GenericOptionsParser;
+
+public class WordCount {
+
+  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{
+    
+    private final static IntWritable one = new IntWritable(1);
+    private Text word = new Text();
+      
+    public void map(Object key, Text value, Context context)
+	throws IOException, InterruptedException {
+
+      StringTokenizer itr = new StringTokenizer(value.toString());
+      while (itr.hasMoreTokens()) {
+        word.set(itr.nextToken());
+        context.write(word, one);
+      }
+    }
+  }
+  
+  public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
+
+    private IntWritable result = new IntWritable();
+
+    public void reduce(Text key, Iterable<IntWritable> values, Context context)
+	throws IOException, InterruptedException {
+
+      int sum = 0;
+      for (IntWritable val : values)
+        sum += val.get();
+
+      result.set(sum);
+      context.write(key, result);
+    }
+  }
+
+  public static void main(String[] args) throws Exception {
+    Configuration conf = new Configuration();
+    Job job = new Job(conf, "word count");
+    job.setJarByClass(WordCount.class);
+    job.setMapperClass(TokenizerMapper.class);
+    job.setCombinerClass(IntSumReducer.class);
+    job.setReducerClass(IntSumReducer.class);
+    job.setOutputKeyClass(Text.class);
+    job.setOutputValueClass(IntWritable.class);
+    FileInputFormat.addInputPath(job, new Path(args[0]));
+    FileOutputFormat.setOutputPath(job, new Path(args[1]));
+    System.exit(job.waitForCompletion(true) ? 0 : 1);
+  }
+}

File src/main/scala/fogbow/scala/WordCount.scala

+/**
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package fogbow.scala;
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.io.{IntWritable,Text}
+import org.apache.hadoop.mapreduce.{Job,Mapper,Reducer}
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
+import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
+import org.apache.hadoop.util.GenericOptionsParser
+import scala.collection.JavaConversions._
+
+class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {
+  
+  val one = new IntWritable(1)
+  var word = new Text
+
+  override
+  def map (key: Object, value: Text, context: Mapper[Object,Text,Text,IntWritable]#Context) {
+    // Split on runs of whitespace and drop empty tokens so the counts match
+    // the Java StringTokenizer-based version.
+    value.toString.split("\\s+").filter(_.nonEmpty).foreach { token =>
+      word.set(token); context.write(word, one)
+    }
+  }
+}
+
+class IntSumReducer extends Reducer[Text,IntWritable,Text,IntWritable] {
+  
+  val result = new IntWritable()
+  
+  override
+  def reduce (key: Text, values: java.lang.Iterable[IntWritable], 
+              context: Reducer[Text,IntWritable,Text,IntWritable]#Context) {
+    result.set(values.foldLeft(0) { _ + _.get })
+    context.write(key, result)
+  }
+}
+
+object WordCount {
+
+  def main (args: Array[String]) {
+    val conf = new Configuration()
+    val job = new Job(conf, "word count")
+    job.setJarByClass(classOf[TokenizerMapper])
+    job.setMapperClass(classOf[TokenizerMapper])
+    job.setCombinerClass(classOf[IntSumReducer])
+    job.setReducerClass(classOf[IntSumReducer])
+    job.setOutputKeyClass(classOf[Text])
+    job.setOutputValueClass(classOf[IntWritable])
+    FileInputFormat.addInputPath(job, new Path(args(0)))
+    FileOutputFormat.setOutputPath(job, new Path(args(1)))
+    System.exit(if(job.waitForCompletion(true)) 0 else 1)
+  }
+
+}