Contributors

Seal is developed by CRS4 and generously hosted by SourceForge.net.

Installation - Ubuntu

Install Dependencies

You’ll need to enable the Canonical “partner” repository in /etc/apt/sources.list by uncommenting these lines:

deb http://archive.canonical.com/ubuntu maverick partner
deb-src http://archive.canonical.com/ubuntu maverick partner

Replace maverick with your Ubuntu release name. These sources are required for the Java package (sun-java6-jdk).
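If you prefer not to edit the file by hand, the uncommenting can be scripted. This is a convenience sketch, not part of Seal: enable_partner is a hypothetical helper, it relies on GNU sed’s -i and -E options, and it assumes the two partner lines are commented out with a leading #.

```shell
#!/usr/bin/env bash
# enable_partner FILE RELEASE (hypothetical helper): uncomment the Canonical
# "partner" deb/deb-src lines for RELEASE in FILE. Requires GNU sed (-i, -E).
enable_partner() {
  local file="$1" release="$2"
  sed -i -E \
    "s|^#[[:space:]]*(deb(-src)? http://archive\.canonical\.com/ubuntu ${release} partner)|\1|" \
    "$file"
}
```

Since sudo cannot call a shell function directly, one way to apply it to the system file (after taking a backup) is `sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak` followed by `sudo bash -c "$(declare -f enable_partner); enable_partner /etc/apt/sources.list $(lsb_release -cs)"`, where lsb_release -cs prints your release codename.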

Now update your package list and install all the required packages:

sudo apt-get update

sudo apt-get install sun-java6-jdk python protobuf-compiler \
libprotobuf6 libprotoc6 python-protobuf ant ant-optional g++ \
libboost-python-dev
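After the installation you can quickly check that the main tools ended up on your PATH. The command names used below (java, javac, python, protoc, ant, g++) are what these packages normally provide; treat the list as an assumption to adjust for your system. missing_tools is a hypothetical helper.

```shell
#!/usr/bin/env bash
# missing_tools CMD... (hypothetical helper): print each command that is not
# on PATH and return the number of missing commands (0 means all found).
missing_tools() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `missing_tools java javac python protoc ant g++` prints nothing and succeeds when everything is in place. The return value is an exit status, so it is only meaningful for short lists (under 256 commands).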

The run-time dependencies must be installed on all cluster nodes, so either install the software on every node or install it to a shared volume. The build-time dependencies [1], on the other hand, only need to be installed on the node you use to build Seal. JUnit4 is needed only to run the unit tests, and only if you’re using a version of Hadoop older than 0.20.203.
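One way to separate the two sets is to subtract the build-time packages listed in footnote [1] from the full apt-get list; what remains is what every node needs. The sketch below does that and, as a dry run, prints one ssh command per node name read from standard input. It assumes ssh access to the nodes with passwordless sudo and a host list with one name per line; runtime_pkgs and print_install_cmds are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Full package list from the apt-get command, and the build-time subset from [1].
ALL_PKGS="sun-java6-jdk python protobuf-compiler libprotobuf6 libprotoc6 python-protobuf ant ant-optional g++ libboost-python-dev"
BUILD_ONLY="protobuf-compiler libprotoc6 ant ant-optional g++"

# runtime_pkgs (hypothetical helper): print the packages in ALL_PKGS that
# are not build-time only, i.e. the ones every cluster node needs.
runtime_pkgs() {
  local pkg out=""
  for pkg in $ALL_PKGS; do
    case " $BUILD_ONLY " in
      *" $pkg "*) ;;             # build-time only: skip
      *) out="$out $pkg" ;;
    esac
  done
  echo "${out# }"
}

# print_install_cmds (hypothetical helper): dry run, echo one ssh command
# per non-empty node name read from stdin; nothing is executed.
print_install_cmds() {
  local node
  while read -r node; do
    [ -n "$node" ] || continue
    echo "ssh $node sudo apt-get install -y $(runtime_pkgs)"
  done
}
```

Running `print_install_cmds < nodes.txt` (nodes.txt being your own host list) lets you review the commands before executing them. This is unnecessary if you install to a shared volume instead.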

Install Hadoop

If you haven’t done so already, set up your Hadoop cluster. Please refer to the Hadoop documentation for your chosen distribution.

Seal has been developed with Apache Hadoop 0.20, but we have also tested it with Cloudera CDH3.
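The JUnit4 rule from the dependency notes (needed only for Hadoop releases older than 0.20.203) can be checked mechanically once Hadoop is installed. A sketch, assuming GNU sort with -V; needs_junit4 is a hypothetical helper that applies the version rule literally and ignores distribution suffixes such as -cdh3u0.

```shell
#!/usr/bin/env bash
# needs_junit4 VERSION (hypothetical helper): succeed (exit 0) if VERSION is
# older than 0.20.203. Relies on GNU sort -V for version ordering.
needs_junit4() {
  local v="${1%%-*}"   # drop a distribution suffix such as "-cdh3u0"
  [ "$v" != "0.20.203" ] &&
    [ "$(printf '%s\n' "$v" 0.20.203 | sort -V | head -n 1)" = "$v" ]
}
```

Assuming the first line of `hadoop version` has the form "Hadoop 0.20.2", you could use `needs_junit4 "$(hadoop version | awk 'NR == 1 { print $2 }')" && echo "install JUnit4 to run the unit tests"`.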

Build Pydoop

With tarball distributions of Hadoop (Apache and Cloudera)

Download the latest version of Pydoop from http://sourceforge.net/projects/pydoop/files/. Then set the HADOOP_HOME environment variable to the directory where you extracted the Hadoop tarball:

export HADOOP_HOME=<path to Hadoop directory>

Then, in the same shell:

tar xzf pydoop-0.4.0_rc2.tar.gz
cd pydoop-0.4.0_rc2
python setup.py build
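The extract-and-build steps can be wrapped so that they refuse to start without a usable HADOOP_HOME. This is a sketch: build_pydoop is a hypothetical helper, and it assumes the tarball unpacks into a directory named after the archive (as pydoop-0.4.0_rc2.tar.gz does above).

```shell
#!/usr/bin/env bash
# build_pydoop TARBALL (hypothetical helper): extract a Pydoop source tarball
# into the current directory and build it, but only if HADOOP_HOME is a directory.
build_pydoop() {
  local tarball="$1" srcdir
  if [ -z "${HADOOP_HOME:-}" ] || [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME must point to the extracted Hadoop directory" >&2
    return 1
  fi
  [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
  # Assumption: pydoop-X.tar.gz unpacks into a directory named pydoop-X.
  srcdir=$(basename "$tarball" .tar.gz)
  tar xzf "$tarball" && (cd "$srcdir" && python setup.py build)
}
```

For example: `build_pydoop pydoop-0.4.0_rc2.tar.gz`. The build runs in a subshell, so your current directory is unchanged afterwards.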

With packaged distributions of Cloudera Hadoop

Download the latest version of Pydoop from http://sourceforge.net/projects/pydoop/files/. We assume the Cloudera package repository is already in your sources (see Installing CDH3 on Ubuntu Systems). You’ll need to install the Hadoop source code and libhdfs:

sudo apt-get install hadoop-source libhdfs0 libhdfs0-dev

Now extract and build Pydoop:

tar xzf pydoop-0.4.0_rc2.tar.gz
cd pydoop-0.4.0_rc2
python setup.py build

Install Pydoop

Next, decide where to install Pydoop. It must be accessible from every cluster node that runs Seal tasks. We recommend installing to a shared volume, except on medium-to-large clusters (more than 100 nodes), where a local installation on each node may be necessary.

If your user’s home directory is accessible on all cluster nodes, installing Pydoop there may be a good idea:

python setup.py install --user

Otherwise, to install to a specific path:

python setup.py install --home <path>
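With --home, distutils places the Python modules under <path>/lib/python, which is not on Python’s default search path, so the processes that run Seal tasks need that directory on PYTHONPATH. A minimal sketch; pydoop_pythonpath is a hypothetical helper and /shared/pydoop is only an example path.

```shell
#!/usr/bin/env bash
# pydoop_pythonpath PREFIX (hypothetical helper): print a PYTHONPATH value
# with PREFIX/lib/python (the distutils --home module directory) first.
pydoop_pythonpath() {
  local libdir="$1/lib/python"
  if [ -n "${PYTHONPATH:-}" ]; then
    echo "$libdir:$PYTHONPATH"
  else
    echo "$libdir"
  fi
}
```

For example, `export PYTHONPATH=$(pydoop_pythonpath /shared/pydoop)`. Where to set this so that it reaches the task processes depends on how your cluster launches them.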

For a system-wide (local) installation:

sudo python setup.py install --skip-build

Note

If you had to export HADOOP_HOME to build Pydoop, make sure the variable is also set when you call setup.py install. The Pydoop documentation has more details regarding its installation.
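This matters in particular for the system-wide install: sudo normally resets the environment, so an exported HADOOP_HOME may not reach setup.py. One way to pass it through explicitly is to run the command via env, as in the sketch below. It assumes your sudo policy lets you run env, and install_pydoop_system is a hypothetical wrapper.

```shell
#!/usr/bin/env bash
# install_pydoop_system (hypothetical wrapper): run from the Pydoop source
# directory after building; forwards HADOOP_HOME explicitly through sudo.
install_pydoop_system() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "set HADOOP_HOME as you did for the build" >&2
    return 1
  fi
  sudo env HADOOP_HOME="$HADOOP_HOME" python setup.py install --skip-build
}
```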

Build Seal

Seal needs the Hadoop jars to compile. Tell the build script where to find them by setting the HADOOP_HOME environment variable.

If you installed Hadoop from a tarball, set HADOOP_HOME to point to the extracted copy of the archive. For instance:

export HADOOP_HOME=/home/me/hadoop-0.20

If you installed Hadoop from the Cloudera packages, set HADOOP_HOME like this:

export HADOOP_HOME=/usr/lib/hadoop

The build process expects to find the Hadoop jars in the ${HADOOP_HOME} and ${HADOOP_HOME}/lib directories.
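Before running make, a quick check that those two directories actually contain jars can save a confusing compiler error. A minimal sketch; count_hadoop_jars is a hypothetical helper that only counts .jar files and does not verify which Hadoop classes they contain.

```shell
#!/usr/bin/env bash
# count_hadoop_jars (hypothetical helper): print how many .jar files are in
# $HADOOP_HOME and $HADOOP_HOME/lib; fail if either directory is missing or
# if no jars are found at all.
count_hadoop_jars() {
  local dir n total=0
  [ -n "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not set" >&2; return 1; }
  for dir in "$HADOOP_HOME" "$HADOOP_HOME/lib"; do
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    n=$(find "$dir" -maxdepth 1 -name '*.jar' | wc -l)
    echo "$dir: $n jar(s)"
    total=$((total + n))
  done
  [ "$total" -gt 0 ]
}
```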

Seal includes Java, Python and C components, all of which are built by the provided Makefile. Go to the root of the Seal source directory and run:

make

This will create the archive build/seal-<release>.tar.gz containing all Seal components. Go to the section on Deploying to see what to do with it.
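If you script the build, you may want to locate the archive without hard-coding the release number. A sketch; seal_archive is a hypothetical helper that expects exactly one build/seal-*.tar.gz.

```shell
#!/usr/bin/env bash
# seal_archive SEAL_DIR (hypothetical helper): print the path of the
# build/seal-<release>.tar.gz archive produced by make, failing unless
# exactly one such archive exists.
seal_archive() {
  local matches=("$1"/build/seal-*.tar.gz)
  # With no match the glob stays literal, so also check the file exists.
  if [ "${#matches[@]}" -ne 1 ] || [ ! -f "${matches[0]}" ]; then
    echo "expected exactly one build/seal-*.tar.gz under $1" >&2
    return 1
  fi
  echo "${matches[0]}"
}
```

For example, `tar tzf "$(seal_archive .)" | head` lists the first entries of the archive from the Seal source directory.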

Creating the documentation

You can find the documentation for Seal at http://biodoop-seal.sourceforge.net/.

If, however, you want to build a local copy yourself, you can do so in three steps:

  1. install Sphinx: sudo apt-get install python-sphinx
  2. go to the Seal directory
  3. run: make doc

You’ll find the HTML documentation at docs/_build/html/index.html.

[1] The following packages should only be required at build time: protobuf-compiler, libprotoc6, ant, ant-optional, g++.