Miki Tebeka  committed 649b6f5

Docs and license

  • Parent commits a09f86e
  • Branches default
  • Tags v1.0

Files changed (3)

File .hgignore

 syntax: glob
 
 target
+README.html

File LICENSE.txt

+Copyright (c) 2013 Miki Tebeka <miki@mikitebeka.com>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

File README.rst

+Zip Reader for Hadoop Streaming
+===============================
+This is a reader that returns (filename, line) key-value pairs for a zip file
+in Hadoop streaming.
+
+Note that currently only the first file in the zip is processed. If you want
+more, submit a pull request :)
+
+Usage
+=====
+
+::
+    
+    #!/bin/bash
+    # Unzip a file in HDFS
+
+    case "$1" in
+        -h | --help ) echo "usage: $(basename "$0") INDIR OUTDIR"; exit;;
+    esac
+
+    if [ $# -ne 2 ]; then
+        "$0" -h
+        exit 1
+    fi
+
+    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
+        -libjars zipmapred-1.0-SNAPSHOT.jar \
+        -mapper /bin/cat \
+        -reducer /bin/cat \
+        -inputformat com.mikitebeka.mapred.ZipInputFormat \
+        -input "$1" -output "$2"
+
+
+FAQ
+===
+
+| Q. Why not http://cotdp.com/2012/07/hadoop-processing-zip-files-in-mapreduce/?
+| A. It uses the newer `mapreduce` API rather than the older `mapred` API that Hadoop streaming expects, and doesn't work with CDH4
+
+| Q. Where does this project live?
+| A. https://bitbucket.org/tebeka/zipstream
+
+
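
The README describes the records as (filename, line) pairs, and the usage script passes them through `/bin/cat` unchanged. As a rough illustration of working with such records, the sketch below counts lines per zipped file. It is not part of zipstream: the function name `count_lines_per_file` is hypothetical, and it assumes each record arrives as one text line with the filename and the line separated by a single tab.

```shell
#!/bin/bash
# Hypothetical helper, not part of zipstream.
# Assumption: each input record is "filename<TAB>line".

count_lines_per_file() {
    # Split on the first tab, tally records per filename,
    # then print "filename<TAB>count" for every filename seen.
    awk -F '\t' '
        { counts[$1]++ }
        END { for (name in counts) printf "%s\t%d\n", name, counts[name] }
    '
}

# Local demo with two hand-written records for the same file.
printf 'a.txt\tfirst line\na.txt\tsecond line\n' | count_lines_per_file
# prints: a.txt<TAB>2
```

The same filter could be fed the job's output text, for example as fetched with `hadoop fs -cat`, provided the output really is tab-separated as assumed here. With more than one filename the output order is not guaranteed, so pipe it through `sort` if order matters.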