Run-time dependencies:
Build-time dependencies:
To run the unit tests you’ll also need:
The run-time dependencies need to be installed on all cluster nodes. You will therefore need to install the software either on every node or on a shared volume. Ant and JUnit, on the other hand, only need to be installed on the node you use to build Seal.
We recommend installing these tools/libraries as packaged by your favourite distribution.
Note
Seal has several dependencies, and installing them from scratch can be a lengthy process; arm yourself with patience. We encourage you to report any installation problems so we can use that information to make Seal easier to install.
If you haven’t done so already, set up your Hadoop cluster. Please refer to the Hadoop documentation for your chosen distribution:
Seal has been developed with Apache Hadoop 0.20, but we have also tested it with Cloudera CDH3.
Download the latest version of Pydoop from here: http://sourceforge.net/projects/pydoop/files/. Set the HADOOP_HOME environment variable so that it points to where the Hadoop tarball was extracted:
export HADOOP_HOME=<path to Hadoop directory>
Then, in the same shell:
tar xzf pydoop-0.4.0_rc2.tar.gz
cd pydoop-0.4.0_rc2
python setup.py build
You’ll need to tell the Pydoop setup program where to find these components. Please refer to the Pydoop installation documentation for details.
Now extract and build Pydoop:
tar xzf pydoop-0.4.0_rc2.tar.gz
cd pydoop-0.4.0_rc2
python setup.py build
You need to decide where to install Pydoop. Remember that it needs to be accessible by all the cluster nodes running Seal tasks. We recommend installing to a shared volume, except for medium-large clusters (more than 100 nodes) where local installation may be necessary.
If your user’s home directory is accessible on all cluster nodes, then installing it there may be a good idea:
python setup.py install --user
Otherwise, to install to a specific path:
python setup.py install --home <path>
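When installing with --home, distutils normally places the modules under <path>/lib/python, which is not on Python's default search path, so every node running Seal tasks needs it added to PYTHONPATH. The following is a minimal sketch of that step, not something prescribed by Pydoop or Seal. PYDOOP_PREFIX and the /shared/pydoop default are hypothetical names chosen for illustration, and some distributions use a different library subdirectory, so check where the files actually landed.

```shell
# Hypothetical install location: replace with the <path> you passed to --home.
PYDOOP_PREFIX="${PYDOOP_PREFIX:-/shared/pydoop}"

# Assumption: the standard distutils "home" scheme, which installs modules
# into <path>/lib/python. Prepend it, keeping any existing PYTHONPATH entries.
PYTHONPATH="$PYDOOP_PREFIX/lib/python${PYTHONPATH:+:$PYTHONPATH}"
export PYTHONPATH
```

With the variable exported, running python -c "import pydoop" in the same shell is a quick way to confirm that the installed package can be found.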
For a system-wide (local) installation:
sudo python setup.py install --skip-build
Note
If you had to export HADOOP_HOME to build Pydoop, make sure the variable is also set when you call setup.py install. The Pydoop documentation has more details regarding its installation.
Seal needs the Hadoop jars to compile. Tell the build script where to find them by setting the HADOOP_HOME environment variable.
If you installed Hadoop from a tarball, set HADOOP_HOME to point to the extracted copy of the archive. For instance:
export HADOOP_HOME=/home/me/hadoop-0.20
If you installed Hadoop from the Cloudera packages, the correct path depends on which distribution you’re using, but it is most likely:
export HADOOP_HOME=/usr/lib/hadoop
The build process expects to find the Hadoop jars in the ${HADOOP_HOME} and ${HADOOP_HOME}/lib directories.
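Before building, it can help to confirm that jars are actually visible in those two directories. The function below is a small sketch of such a check; check_hadoop_jars is a hypothetical helper name, not part of Seal or Hadoop, and it only counts files ending in .jar without verifying that they are the ones the build needs.

```shell
# Hypothetical helper: count the jars in ${HADOOP_HOME} and ${HADOOP_HOME}/lib.
# Returns non-zero if HADOOP_HOME is unset or no jars are found.
check_hadoop_jars() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    found=0
    for dir in "$HADOOP_HOME" "$HADOOP_HOME/lib"; do
        for jar in "$dir"/*.jar; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -f "$jar" ]; then
                found=$((found + 1))
            fi
        done
    done
    echo "Found $found jar(s) under $HADOOP_HOME"
    [ "$found" -gt 0 ]
}
```

Once the function is defined in your shell, calling check_hadoop_jars before running the build gives an early warning if HADOOP_HOME points at the wrong directory.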
Seal includes Java, Python and C components that need to be built. A Makefile is provided that builds all components. Simply go into the root Seal source directory and run:
make
This will create the archive build/seal-<release>.tar.gz containing all Seal components. Go to the section on Deploying to see what to do with it.
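To confirm that the build produced the archive, you can look for it and list its contents. This is only an illustrative sketch: list_seal_archive is a hypothetical helper name, and it assumes you run it from the root Seal source directory.

```shell
# Hypothetical helper: find build/seal-<release>.tar.gz and list its contents.
# Returns non-zero if no matching archive exists.
list_seal_archive() {
    for archive in build/seal-*.tar.gz; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -f "$archive" ]; then
            echo "Archive: $archive"
            tar tzf "$archive"
            return 0
        fi
    done
    echo "No build/seal-<release>.tar.gz found; did make succeed?" >&2
    return 1
}
```

If more than one release archive is present in build/, the sketch lists only the first one the shell glob returns.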
You can find the documentation for Seal at http://biodoop-seal.sourceforge.net/.
If, however, you want to build a local copy yourself, you can do so in three steps:
You’ll find the documentation in HTML in docs/_build/html/index.html.
See if your Linux distribution includes a packaged version of Sphinx (it probably does). Alternatively, if you’re using Python Setuptools, you can use Easy Install:
easy_install -U Sphinx
Finally, you can install manually by following the instructions on the Sphinx web site: http://sphinx.pocoo.org/.