Overview

Welcome to the Hadoop Starter Kit!

Releases:

Version 0.1, 2011-December-?? (planning)

Description:

Hadoop Starter Kit is an easy installer for Hadoop, HBase and Hive on a single Ubuntu (or any *nix) node.

The bundle comes mostly pre-configured. To install the components by hand instead, visit these URLs:

License:

Hadoop-Starter-Kit is licensed under the GNU General Public License v2. You may not use this software except in accordance with the license.

The download bundle contains packages with respective licenses:

Download:

This project does not contain much source. Download a distribution from the Downloads section to get started.

Contact:

Send your feedback to kumar.shantanu@gmail.com or @kumarshantanu

Discussion: http://groups.google.com/group/bitumenframework

How to set up:

  1. Expand the compressed bundle in the home directory
$ tar xzvf hadoop-starter-kit-0.1-beta3.tar.gz
$ cd hadoop-starter-kit-0.1-beta3

The bundle contains an 'app' directory. Move it to the home directory, since the remaining steps refer to it as ~/app:

$ mv app ~/
  2. Install Sun JDK 1.6.x if not already installed

  3. Create a symlink ~/app/default/jdk pointing to the JDK

  4. Create symlinks in ~/app/bin/ pointing to Java binaries, such as java, javac, jps etc.

  5. Add ~/app/bin to PATH and set the environment variables (in ~/.bashrc on Ubuntu, ~/.bash_profile on Mac):

export JAVA_HOME=~/app/default/jdk
export HADOOP_HOME=~/app/default/hadoop
export HBASE_HOME=~/app/default/hbase
export HIVE_HOME=~/app/default/hive
export PATH=~/app/bin:$PATH
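
The symlink steps above can be scripted. What follows is a minimal sketch and not part of the kit: the helper name `link_jdk` is hypothetical, and it links every executable found in the JDK's bin directory rather than a hand-picked few.

```shell
# Sketch only: link_jdk is a hypothetical helper, not shipped with the kit.
# Usage: link_jdk /path/to/jdk [app_dir]   (app_dir defaults to ~/app)
link_jdk() {
    jdk_dir=$1
    app_dir=${2:-"$HOME/app"}

    # Refuse to continue if the path does not look like a JDK.
    if [ ! -x "$jdk_dir/bin/java" ]; then
        echo "no bin/java under $jdk_dir" >&2
        return 1
    fi

    mkdir -p "$app_dir/default" "$app_dir/bin"

    # app/default/jdk -> the JDK; -n replaces an existing link
    # instead of creating a new link inside the old target.
    ln -sfn "$jdk_dir" "$app_dir/default/jdk"

    # app/bin/<tool> -> app/default/jdk/bin/<tool> for java, javac, jps, ...
    for tool in "$app_dir/default/jdk/bin/"*; do
        [ -x "$tool" ] && ln -sf "$tool" "$app_dir/bin/$(basename "$tool")"
    done
    return 0
}
```

A call would look like `link_jdk /usr/lib/jvm/java-6-sun`, where the path is only an example and depends on where the JDK was installed. Because the links in ~/app/bin go through ~/app/default/jdk, switching to another JDK later only requires re-pointing that one symlink.
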
  6. Disable IPv6 on Ubuntu (or any Linux) by adding the following to the /etc/sysctl.conf file:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Restart network services or the computer. To verify that IPv6 is disabled, run the following command; an output of 1 means it is disabled:

$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

This step is not required on Mac OS.
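
The IPv6 check can be wrapped in a small function so that other scripts can test the result. This is a sketch under assumptions: `ipv6_disabled` is a hypothetical name, and the optional argument exists only so the function can be pointed at a different flag file.

```shell
# Sketch only: ipv6_disabled is a hypothetical helper name.
# Succeeds (exit status 0) when the flag file contains 1.
ipv6_disabled() {
    flag_file=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}
```

On most Linux systems, `sudo sysctl -p` reloads /etc/sysctl.conf without a reboot. After that, `ipv6_disabled && echo "IPv6 is disabled"` reports the result.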

  7. Make sure localhost and the machine's hostname both point to the same IP address (e.g. 127.0.0.1) in the /etc/hosts file. I noticed that Ubuntu 11.04 gives them slightly different IP addresses, using 127.0.1.1 for the hostname. Edit the /etc/hosts file as necessary.

This step is not required on Mac OS.
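
One way to check the /etc/hosts entries without reading the file by eye is sketched below. The helper names `host_ip` and `same_ip` are hypothetical, and the sketch only inspects the hosts file itself; it does not query the system resolver.

```shell
# Sketch only: host_ip and same_ip are hypothetical helper names.
# Print the first address that an /etc/hosts-style file maps to a name.
host_ip() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }               # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break             # ignore trailing comments
                if ($i == name) { print $1; exit }
            }
        }
    ' "${2:-/etc/hosts}"
}

# Succeed only when both names are present and map to the same address.
same_ip() {
    ip_a=$(host_ip "$1" "${3:-/etc/hosts}")
    ip_b=$(host_ip "$2" "${3:-/etc/hosts}")
    [ -n "$ip_a" ] && [ "$ip_a" = "$ip_b" ]
}
```

For example, `same_ip localhost "$(hostname)" || echo "edit /etc/hosts"` prints a reminder when the two names map to different addresses or one of them is missing.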

  8. Install OpenSSH and start the SSH daemon if not already done. To verify, check whether you can log in to localhost over SSH.

  9. Set up password-less, passphrase-less SSH login if not already done:

$ ssh-keygen -t rsa # simply press Enter when prompted for passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
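
Running the `cat ... >> ~/.ssh/authorized_keys` line more than once adds the same key again each time. The sketch below appends the key only when it is missing and tightens the file permissions. The helper name `authorize_key` is hypothetical and not part of the kit.

```shell
# Sketch only: authorize_key is a hypothetical helper name.
# Usage: authorize_key [public_key_file] [ssh_dir]
authorize_key() {
    ssh_dir=${2:-"$HOME/.ssh"}
    pub_key=${1:-"$ssh_dir/id_rsa.pub"}
    auth_file="$ssh_dir/authorized_keys"

    [ -r "$pub_key" ] || { echo "missing $pub_key" >&2; return 1; }

    mkdir -p "$ssh_dir"
    touch "$auth_file"

    # Append the key only if that exact line is not already present.
    key=$(cat "$pub_key")
    grep -qxF "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"

    # With its default StrictModes setting, sshd ignores key files
    # that other users are able to write to.
    chmod 700 "$ssh_dir"
    chmod 600 "$auth_file"
}
```

To generate the key without any prompt, `ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa` should be equivalent to pressing Enter at each question. Afterwards, `ssh localhost` is expected to log in without asking for a password.
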
  10. Run the setup script and (re)initialize:
$ cd ~
$ /path/to/hsetup # to be run only once (in hadoop-starter-kit)
$ hinit           # stop Hadoop/HBase, delete all data and start over

How to run:

  1. To start Hadoop and HBase:
$ hserv 1
  2. To stop Hadoop and HBase:
$ hserv 0
  3. To run Hadoop (example):
$ hadoop fs -ls /
  4. To run HBase (example):
$ hbase shell
hbase(main):001:0> list
  5. To run Hive client (example):
$ hive
hive> show tables;
  6. To run Hive server:
$ hive --service hiveserver
(press Ctrl+C to stop)
  7. To run Hive Web Interface:
$ hive --service hwi
(Now visit the URL http://localhost:9999/hwi/ in a web browser)
(press Ctrl+C to stop)
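
After `hserv 1`, one way to confirm that the daemons came up is to look for their main classes in the output of `jps` (one of the JDK binaries linked during setup). The sketch below is hedged: `missing_daemons` is a hypothetical helper, and which daemons `hserv` actually starts is an assumption, so the caller passes the expected names.

```shell
# Sketch only: missing_daemons is a hypothetical helper name.
# Reads `jps` output on stdin and prints each expected daemon that is absent.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints one "<pid> <main class>" pair per line.
        printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}
```

A possible call is `jps | missing_daemons NameNode DataNode JobTracker TaskTracker HMaster`. The names listed are the usual ones for a single-node Hadoop and HBase of this era and may need adjusting. Empty output means every listed daemon was found.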