CoDeploy - A Scalable Deployment Service for PlanetLab

A Scalable Deployment Service for PlanetLab
Built on the CoDeeN content distribution network

What Is It?

CoDeploy efficiently and scalably distributes content from one source to many receivers. For PlanetLab, this means you can push content to hundreds of nodes without consuming large amounts of bandwidth at the source. More generally, the same techniques can be used for efficient peer-to-peer hosting of arbitrary content. CoDeploy uses several techniques to distribute content efficiently, including:

using HTTP caching via the CoDeeN content distribution network
splitting large files into multiple pieces so that even files that are hundreds of megabytes can be handled efficiently
locating suitable CoDeeN nodes via a "peer review" system

How Does It Work?

CoDeploy is intended to be simple to use, requiring very little infrastructure and installation. It requires only two programs on the node initiating the push, and all of the other tools are already in place on PlanetLab. It works as follows:

Using a Web server, you provide a directory containing the content to be deployed.
You give CoDeploy a list of nodes to which you want the content pushed. This list can be built once and reused.
CoDeploy generates version information for all of your content by checksumming the files, and this information is pushed to all of the nodes specified.
Each node then compares the files with their versions on the origin server, and when versions differ, CoDeploy begins to download the file.
CoDeploy consults a local daemon on each node that directs it to the closest good CoDeeN node, which CoDeploy uses for downloading the content.
As content is downloaded, it is broken into many small pieces and replicated among all of the CoDeeN nodes. These fragments are reassembled as they are requested.
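The version-comparison steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CoDeploy's own code (CoDeploy generates a Perl script for this purpose): the function names build_manifest and files_to_download are hypothetical, and the manifest is assumed to be a simple mapping from relative path to MD5 checksum.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = md5_of_file(full)
    return manifest


def files_to_download(origin_manifest, local_manifest):
    """List files that are missing locally or whose checksum differs."""
    return sorted(
        path
        for path, checksum in origin_manifest.items()
        if local_manifest.get(path) != checksum
    )


if __name__ == "__main__":
    # Hypothetical demo: the same file with different contents on origin and node.
    with tempfile.TemporaryDirectory() as origin, tempfile.TemporaryDirectory() as node:
        for root, text in ((origin, "version 2"), (node, "version 1")):
            with open(os.path.join(root, "program.conf"), "w") as handle:
                handle.write(text)
        print(files_to_download(build_manifest(origin), build_manifest(node)))
```

Only the files returned by files_to_download would need to be fetched, which is why later updates that touch a few files stay cheap.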

Components

CoDeploy consists of a number of components that work together to accomplish the tasks described above. The components, in brief, are described here.

Script generator - CoDeploy operates by means of dynamically generated scripts. When you invoke CoDeploy to push a directory, the script generator recursively walks the directory, recording all subdirectories, all files, and the checksums (via md5sum) of the files. This information is used to create a Perl script that can then be executed at the PlanetLab nodes.
Parallel copy/execute - Once the script has been generated, it is copied to all of the target nodes using a parallel copy tool. The script is then executed at all nodes in parallel. The script makes the necessary directories and compares the checksums of the existing files with those on the origin server. Any files that are not up to date are marked for download using CoDeeN. Using CoDeeN, multiple simultaneous downloads do not swamp the origin server.
Peer-review service - Since files are obtained from CoDeeN nodes, we need some mechanism to monitor node health. For this purpose, the CoDeeN nodes provide a peer-review service, evaluating each other's performance. Each node collects opinions of its status by receiving messages from other nodes. Each node also reports its own reputation, which is then used to weight the opinion reports.
CoDeeN locator - Since CoDeeN does not run on every PlanetLab node, CoDeploy needs some mechanism to find a nearby "good" CoDeeN node. The locator service collects status reports from known CoDeeN nodes, and combining this information with proximity information, it tells CoDeploy which CoDeeN node to use.
Large-file support - Each CoDeeN node runs a special user agent that accepts download requests and then generates a series of requests for smaller ranges of the file. These requests are then spread across the CoDeeN nodes, which independently request the pieces from the origin server if needed. The user agent automatically reassembles the responses and sends them to the client as though only a single request had been made. In this manner, even files that are hundreds of megabytes in size can be efficiently cached in CoDeeN.
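To make the large-file support more concrete, here is a minimal Python sketch of splitting one download into smaller range requests and reassembling the responses. It is a sketch under assumptions, not the CoDeeN user agent itself: the piece size, the round-robin assignment of pieces to nodes, and the use of the standard HTTP Range header format are illustrative choices, and all function names are hypothetical.

```python
def split_into_ranges(total_size, piece_size):
    """Return inclusive (start, end) byte ranges that cover total_size bytes."""
    if total_size < 0 or piece_size <= 0:
        raise ValueError("total_size must be >= 0 and piece_size must be > 0")
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + piece_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Format a byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


def assign_pieces(ranges, nodes):
    """Spread pieces across nodes round-robin (an assumed policy)."""
    return [(nodes[i % len(nodes)], piece) for i, piece in enumerate(ranges)]


def reassemble(pieces):
    """Join pieces, given as a mapping of start offset to bytes, in order."""
    return b"".join(pieces[start] for start in sorted(pieces))


if __name__ == "__main__":
    # Hypothetical node names; a 10-byte "file" split into 4-byte pieces.
    content = b"0123456789"
    ranges = split_into_ranges(len(content), 4)
    for node, (start, end) in assign_pieces(ranges, ["nodeA", "nodeB"]):
        print(node, range_header(start, end))
    # Simulate the responses arriving out of order, then reassemble them.
    responses = {start: content[start:end + 1] for start, end in reversed(ranges)}
    print(reassemble(responses) == content)
```

Because each piece is an independent, small request, it can be cached and served by a different node, while the client still sees a single response.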

Installing CoDeploy

Follow these steps to install CoDeploy:

Download codeploy.tar.gz, which contains the sources and makefiles
Expand the file by running
gunzip codeploy.tar.gz
followed by
tar xvf codeploy.tar
You should now have a directory called codeploy
Build the necessary files. In the codeploy directory, type either
make
or
make -f Makefile.solaris
as appropriate. The first form has been tested on RedHat 9, while the second is appropriate for Solaris.
Add the new directory to your path. If the directory, for example, is /home/tupac/codeploy, on tcsh, you execute the following commands:
setenv PATH /home/tupac/codeploy:$PATH
rehash
Create a node list file, which contains all of the PlanetLab nodes you wish to use, one hostname per line. Assuming this file resides in /home/tupac/codeploy/node_list, you should execute
setenv MQ_NODES /home/tupac/codeploy/node_list
Specify your PlanetLab login account via the MQ_SLICE variable. If your account name is princeton_example, you should execute
setenv MQ_SLICE princeton_example
If you have specified nodes which you have not previously accessed, you should add the following line to your .ssh/config file
StrictHostKeyChecking no
Please note that this relaxes ssh's host key checking. If you are uncomfortable with this setting, you can instead log in manually to each node in your node list once and accept its host key.
Add all of the setenv commands shown above to your .cshrc file so that they are set on all future logins.
You are now ready to use CoDeploy.
Before running CoDeploy, make sure ssh-agent is running and that you have added your ssh key to it (for example with ssh-add, entering your passphrase when prompted). CoDeploy assumes it can ssh into the PlanetLab nodes without being asked for a passphrase.
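As a quick sanity check of the setup described above, a small Python script can confirm that the environment variables are in place before the first push. This is a hypothetical helper, not part of the CoDeploy package: it checks only the MQ_NODES and MQ_SLICE variables named above, plus SSH_AUTH_SOCK, which ssh-agent normally exports. Skipping blank lines in the node list is a choice of this sketch, not documented CoDeploy behavior.

```python
import os
import tempfile


def read_node_list(path):
    """Return the hostnames in a node list file, one per line, skipping blanks."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def check_environment(environ):
    """Return a list of human-readable problems found in the given environment."""
    problems = []
    for name in ("MQ_NODES", "MQ_SLICE"):
        if not environ.get(name):
            problems.append(f"{name} is not set")
    node_file = environ.get("MQ_NODES")
    if node_file:
        if not os.path.isfile(node_file):
            problems.append(f"node list {node_file} does not exist")
        elif not read_node_list(node_file):
            problems.append(f"node list {node_file} is empty")
    if not environ.get("SSH_AUTH_SOCK"):
        problems.append("SSH_AUTH_SOCK is not set; is ssh-agent running?")
    return problems


if __name__ == "__main__":
    # Check the real environment of the current shell.
    for problem in check_environment(os.environ):
        print("warning:", problem)
    # Demonstrate the node list parser on a throwaway file with made-up hosts.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "node_list")
        with open(sample, "w") as handle:
            handle.write("node1.example.org\n\nnode2.example.org\n")
        print(read_node_list(sample))
```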

Running CoDeploy

CoDeploy is simple to use. It takes three arguments: the local directory name, the URL corresponding to that directory, and the remote directory name.

For example, assume you have a local directory ~/public_html/program that is also available via your web server as http://www.example.edu/~tupac/program, and that you want to put this directory (and all of its subdirectories) into your PlanetLab slice in the directory test. You would execute the following command on a single line:

codeploy ~/public_html/program http://www.example.edu/~tupac/program test

If you want to optimize performance on the first deployment, specify the -a flag to codeploy. This flag causes the directory to be archived with tar and compressed before it is transferred. On later updates, when only a few files may have been modified, using CoDeploy without flags is preferred.

Low-level details: You may find that CoDeploy creates temporary directories in your directory tree. This approach is necessary to handle several cases. When executable files are in your web tree, some web servers are configured to execute the file instead of transferring the file. CoDeploy uses the temp directory to make a non-executable copy of the file for download. Likewise, files that have a very recent modification time may be uncacheable by the CDN, so CoDeploy makes a copy of the file with an older modification time.
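The staging step described above can be illustrated with a short Python sketch. It is an approximation under assumptions, not CoDeploy's implementation: the function name stage_for_download, the staging directory argument, and the one-hour age are all hypothetical choices.

```python
import os
import shutil
import stat
import tempfile
import time


def stage_for_download(source, staging_dir, age_seconds=3600):
    """Copy source into staging_dir as a non-executable file with an older mtime."""
    os.makedirs(staging_dir, exist_ok=True)
    target = os.path.join(staging_dir, os.path.basename(source))
    # copyfile copies the contents only, not the permission bits.
    shutil.copyfile(source, target)
    # Clear every execute bit so a web server transfers the file instead of running it.
    mode = stat.S_IMODE(os.stat(target).st_mode)
    os.chmod(target, mode & ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    # Backdate access and modification times so the copy does not look freshly modified.
    older = time.time() - age_seconds
    os.utime(target, (older, older))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "tool")
        with open(original, "w") as handle:
            handle.write("#!/bin/sh\necho hello\n")
        os.chmod(original, 0o755)
        staged = stage_for_download(original, os.path.join(tmp, "staging"))
        print(oct(stat.S_IMODE(os.stat(staged).st_mode)))
```

Working on a copy leaves the original file's permissions and timestamp untouched, which matches the reason the text gives for the temporary directories.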

Other Tools

The CoDeploy package also includes two other programs, multiquery and multicopy. These can be used independently to perform other operations on the machines in the node list. Multiquery can be used to perform simple shell operations, and multicopy can be used to transfer files.

For example, to remotely create a directory and execute a program in the background, you can issue the following command:
multiquery 'mkdir results; directory/program &'
The quotes ensure that the entire command sequence is passed to, and executed on, the remote nodes rather than being interpreted by your local shell.

The multicopy command can be used to copy files to nodes, or to copy files from nodes. In this program, the @ parameter gets expanded to the name of a remote machine if specified alone. If specified with a colon suffix, it gets replaced with the slice name and machine name. The command
multicopy filename @:directory
copies the specified file to the specified remote directory on all nodes.

To grab files from a remote subdirectory and copy them locally to directories corresponding to the remote machine names, execute the following commands:
mkdir newdir
multicopy newdir @
multicopy '@:directory/*' @
The first multicopy creates directories corresponding to the remote machine names. The second multicopy command copies the files from the remote machines to their corresponding local directories.
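The @ expansion described above can be modeled with a few lines of Python. This is an illustrative sketch, not multicopy's source: the text says only that "@:" is replaced with the slice name and machine name, so the scp-style slice@host:path form used here is an assumption, as are the function names and the example hostnames.

```python
def expand_argument(arg, slice_name, host):
    """Expand a multicopy-style '@' placeholder for a single host."""
    if arg == "@":
        # A bare '@' becomes the remote machine name.
        return host
    if arg.startswith("@:"):
        # Assumed scp-style form: slice@host:path
        return f"{slice_name}@{host}:{arg[2:]}"
    return arg


def expand_for_nodes(args, slice_name, hosts):
    """Return one expanded argument list per host."""
    return [[expand_argument(arg, slice_name, host) for arg in args] for host in hosts]


if __name__ == "__main__":
    # Hypothetical hostnames; the slice name follows the example used earlier.
    hosts = ["node1.example.org", "node2.example.org"]
    for expanded in expand_for_nodes(["@:directory/*", "@"], "princeton_example", hosts):
        print(expanded)
```

Under this model, each node yields its own source and destination pair, which is how fetched files end up in a local directory named after the machine they came from.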

More details regarding the options for multiquery and multicopy can be found in the comments at the top of the multiquery.c source file. These options allow various behaviors, such as changing the degree of parallelism in the number of copies, the delay between launching successive queries, the timeout, etc.

Status

CoDeploy and all related tools are currently regarded as being in beta test. While we strive to ensure availability of the service 24 hours a day, 7 days a week, we may encounter outages when the service is being upgraded. CoDeploy does involve a number of new components, and while we are confident enough in their development to allow open usage, we may still encounter bugs. We ask for your understanding, and equally importantly, your feedback about the service.

People

KyoungSoo Park
Vivek Pai
Larry Peterson
with help from Aki Nakao

We may collectively be contacted at
princeton_codeen at slices.planet-lab.org
