CoDeploy

A Scalable Deployment Service for PlanetLab
Built on the CoDeeN content distribution network

What Is It?


CoDeploy provides a means to distribute content efficiently and scalably from one source to many receivers. In practice, for PlanetLab, this means you can push content to hundreds of PlanetLab nodes without consuming much bandwidth at the source. More generally, these techniques can be used for efficient peer-to-peer hosting of arbitrary content. CoDeploy uses a number of techniques to perform this distribution efficiently, such as checksum-based detection of changed files, selection of the closest good CoDeeN node for each receiver, and splitting content into small fragments that are replicated across the CoDeeN nodes.

How Does It Work?


CoDeploy is intended to be simple to use, requiring very little infrastructure and installation. It requires only two programs on the node initiating the push, and all of the other tools are already in place on PlanetLab. It works as follows:

  1. You make the content to be deployed available as a directory on a Web server.
  2. You give CoDeploy a list of nodes to which you want the content pushed. This list can be built once and reused.
  3. CoDeploy generates version information for all of your content by checksumming the files, and this information is pushed to all of the nodes specified.
  4. Each node then compares its local copies of the files with the versions on the origin server, and wherever the versions differ, CoDeploy downloads the file.
  5. CoDeploy consults a local daemon on each node that directs it to the closest good CoDeeN node, which CoDeploy uses for downloading the content.
  6. As content is downloaded, it is broken into many small pieces and replicated among all of the CoDeeN nodes. These fragments are reassembled as they are requested.
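Steps 3, 4, and 6 above can be sketched in Python. This is a minimal illustration under stated assumptions rather than CoDeploy's implementation: SHA-1 as the checksum, a manifest that maps each relative path to a hex digest, the 64 KB fragment size, and all function names are hypothetical.

```python
import hashlib
import os

# Hypothetical fragment size; the size CoDeploy/CoDeeN actually uses is not stated.
FRAGMENT_SIZE = 64 * 1024


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its SHA-1 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            h = hashlib.sha1()
            with open(path, "rb") as f:
                # Hash in blocks so large files need not fit in memory.
                for block in iter(lambda: f.read(65536), b""):
                    h.update(block)
            manifest[rel] = h.hexdigest()
    return manifest


def files_to_fetch(origin_manifest, local_manifest):
    """Return, sorted, the paths that are missing locally or whose digest differs."""
    return sorted(
        rel for rel, digest in origin_manifest.items()
        if local_manifest.get(rel) != digest
    )


def split_into_fragments(data, size=FRAGMENT_SIZE):
    """Break content into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def reassemble(fragments):
    """Join fragments back together in their original order."""
    return b"".join(fragments)
```

A node would call files_to_fetch(origin_manifest, build_manifest(local_dir)) to decide which files to request through its CoDeeN node. Fixed-size splitting is just one simple way to break content into pieces, not necessarily the scheme CoDeeN uses.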

Components


CoDeploy consists of a number of components that work together to accomplish the tasks described above. The components, in brief, are described here.

Installing CoDeploy


Follow these steps to install CoDeploy:

  1. Download codeploy.tar.gz, which contains the sources and makefiles.
  2. Expand the file by running
    gunzip codeploy.tar.gz
    followed by
    tar xvf codeploy.tar
    You should now have a directory called codeploy.
  3. Build the necessary files. In the codeploy directory, type either
    make
    or
    make -f Makefile.solaris
    as appropriate. The first form has been tested on Red Hat Linux 9; the second is for Solaris.
  4. Add the new directory to your path. For example, if the directory is /home/tupac/codeploy and your shell is tcsh, execute the following commands:
    setenv PATH /home/tupac/codeploy:$PATH
    rehash
  5. Create a node list file containing all of the PlanetLab nodes you wish to use, one hostname per line. If this file is /home/tupac/codeploy/node_list, execute
    setenv MQ_NODES /home/tupac/codeploy/node_list
  6. Specify your PlanetLab login account (your slice name) via the MQ_SLICE variable. If your account name is princeton_example, execute
    setenv MQ_SLICE princeton_example
    If your node list includes nodes you have not previously logged in to, add the following line to your ~/.ssh/config file:
    StrictHostKeyChecking no
    Note that this weakens ssh's security: ssh will accept the host keys of unknown machines without asking you to confirm them. If you are uncomfortable with this setting, log in manually once to each node in your node list instead; accepting each node's host key at that point has the same effect.
  7. Add all of the setenv commands shown above to your .cshrc file so that they are set on all future logins.
  8. Before running CoDeploy, make sure ssh-agent is running and your ssh key (with its passphrase) has been loaded into it, for example on tcsh with
    eval `ssh-agent -c`
    ssh-add
    CoDeploy assumes it can ssh into the PlanetLab nodes without being prompted for a passphrase.
  9. You are now ready to use CoDeploy.
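The environment set up in steps 5 and 6 can be illustrated with a small Python sketch of how a tool might combine MQ_NODES and MQ_SLICE into ssh destinations. This is an illustration under assumptions, not code from the package: the helper names are hypothetical, skipping blank lines is assumed, and the user@host form follows from MQ_SLICE holding your PlanetLab login account.

```python
import os


def read_node_list(path):
    """Return the hostnames in a node list file: one per line, blank lines skipped."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def ssh_targets(env=None):
    """Combine MQ_SLICE and the hosts listed in MQ_NODES into user@host destinations."""
    env = os.environ if env is None else env
    missing = [name for name in ("MQ_NODES", "MQ_SLICE") if name not in env]
    if missing:
        raise KeyError("environment variable(s) not set: " + ", ".join(missing))
    return [f"{env['MQ_SLICE']}@{host}" for host in read_node_list(env["MQ_NODES"])]
```

With MQ_SLICE set to princeton_example and a node list naming planetlab1.example.org (a made-up host), ssh_targets() would return ["princeton_example@planetlab1.example.org"].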

Running CoDeploy


CoDeploy is a fairly simple program to use. It takes three arguments: the local directory name, the URL at which that directory is served, and the remote directory name.

For example, assume you have a local directory ~/public_html/program that is also available through your web server as http://www.example.edu/~tupac/program, and that you want to copy this directory (and all its subdirectories) into the directory test in your PlanetLab slice. You would execute the following command on a single line:

codeploy ~/public_html/program http://www.example.edu/~tupac/program test
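The correspondence between the first two arguments can be sketched as follows: a file at some relative path under the local directory is assumed to be fetched from the same relative path under the URL. The file_url function is hypothetical (it is not part of CoDeploy) and only illustrates that mapping, percent-encoding each path component.

```python
import os
from urllib.parse import quote


def file_url(local_root, base_url, local_path):
    """Return the URL of local_path, given that local_root is served at base_url."""
    rel = os.path.relpath(local_path, local_root)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{local_path} is not inside {local_root}")
    # Percent-encode each path component (spaces, '#', etc.) but keep the separators.
    return base_url.rstrip("/") + "/" + "/".join(quote(part) for part in rel.split(os.sep))
```

For the example above, a local file ~/public_html/program/bin/run me.sh would map to http://www.example.edu/~tupac/program/bin/run%20me.sh.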

To speed up the first deployment, pass the -a flag to codeploy, which causes the directory to be tarred and compressed before it is distributed. On later updates, when only a few files may have changed, running CoDeploy without flags is preferred, since only the modified files are then downloaded.
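For illustration only, the packing step that -a performs resembles the following Python sketch, which uses the standard tarfile module. Gzip as the compression method and the pack_directory name are assumptions; how CoDeploy actually names, ships, and unpacks the archive is not described here.

```python
import tarfile


def pack_directory(src_dir, archive_path):
    """Pack src_dir and all of its subdirectories into a gzip-compressed tar file.

    archive_path should lie outside src_dir, or the archive would try to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores member paths relative to src_dir rather than as absolute paths.
        tar.add(src_dir, arcname=".")
    return archive_path
```

One compressed archive means a single large transfer instead of many small ones, which suits the first deployment; per-file checksums suit later, incremental updates.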

Low-level details: CoDeploy may create temporary directories in your directory tree. They are needed to handle two cases. First, some web servers are configured to execute executable files in the web tree rather than transfer them, so CoDeploy uses a temporary directory to make a non-executable copy of each such file for download. Second, files with a very recent modification time may be uncacheable by the CDN, so CoDeploy makes a copy of each such file with an older modification time.
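The two workarounds described above can be sketched with the Python standard library. This is a hypothetical illustration, not CoDeploy's code: make_cacheable_copy, the one-day AGE_SECONDS threshold, and the decision to make the copy world-readable are assumptions.

```python
import os
import shutil
import stat
import time

# Hypothetical threshold: copies are back-dated to at least one day old.
AGE_SECONDS = 24 * 60 * 60
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
READ_BITS = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def make_cacheable_copy(path, temp_dir, min_age=AGE_SECONDS, now=None):
    """Copy path into temp_dir as a non-executable file with a sufficiently old mtime."""
    now = time.time() if now is None else now
    os.makedirs(temp_dir, exist_ok=True)
    dest = os.path.join(temp_dir, os.path.basename(path))
    shutil.copyfile(path, dest)  # copies the contents only, not mode or timestamps
    st = os.stat(path)
    # Clear every execute bit so the web server transfers the file instead of
    # running it, and make sure the server can still read it.
    os.chmod(dest, (stat.S_IMODE(st.st_mode) & ~EXEC_BITS) | READ_BITS)
    # Back-date the copy if the original was modified too recently to be cached.
    mtime = min(st.st_mtime, now - min_age)
    os.utime(dest, (mtime, mtime))
    return dest
```

Working on a copy leaves the original file's permissions and timestamps untouched.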

Other Tools


The CoDeploy package also includes two other programs, multiquery and multicopy, which can be used independently to perform other operations on the machines in the node list. Multiquery runs simple shell commands on the nodes, and multicopy transfers files to and from them.

For example, to create a directory and start a program in the background on every node, issue the following command:
multiquery 'mkdir results; directory/program &'
The quotes ensure that the entire command string, including the ; and &, is passed to each remote node and executed there, rather than being interpreted by your local shell.

The multicopy command copies files either to or from the nodes. In its arguments, an @ on its own is expanded to the name of each remote machine, while an @ followed by a colon is replaced with the slice name and machine name. The command
multicopy filename @:directory
copies the specified file to the specified remote directory on all nodes.

To grab files from a remote subdirectory and copy them locally to directories corresponding to the remote machine names, execute the following commands:
mkdir newdir
multicopy newdir @
multicopy '@:directory/*' @
The first multicopy command creates one local directory per remote machine, named after that machine. The second copies the files from each remote machine into its corresponding local directory; the quotes keep your local shell from expanding the * wildcard.
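One reading of the @ rules, sketched in Python: this assumes that an @ used alone expands to the host name, that a leading @: becomes slice@host: (the scp-style remote form), and that multicopy hands the expanded arguments to scp -r. These semantics and both function names are assumptions inferred from the description above, not taken from multiquery.c.

```python
def expand_at(arg, slice_name, host):
    """Expand multicopy's @ placeholder for one remote host (assumed semantics).

    "@" alone becomes the host name; a leading "@:" becomes "slice@host:",
    giving an scp-style remote path. Other arguments pass through unchanged.
    """
    if arg == "@":
        return host
    if arg.startswith("@:"):
        return f"{slice_name}@{host}:{arg[2:]}"
    return arg


def copy_commands(args, slice_name, hosts):
    """Build one (hypothetical) scp argument list per host by expanding every argument."""
    return [["scp", "-r"] + [expand_at(a, slice_name, h) for a in args] for h in hosts]
```

Under these rules, multicopy '@:directory/*' @ would expand, for each host, to a copy from slice@host:directory/* into a local directory named after that host.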

More details about the options for multiquery and multicopy can be found in the comments at the top of the multiquery.c source file. These options control behaviors such as the degree of parallelism (how many copies or queries run at once), the delay between launching successive queries, and the timeout.
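The options listed above (parallelism, launch delay, timeout) can be illustrated with a Python sketch of a multiquery-style runner. It is a sketch under assumptions, not the behavior of multiquery itself: run_on_nodes, its parameter names and defaults, and the use of ssh -o BatchMode=yes -l slice host are hypothetical choices. The runner argument is injectable so the logic can be tested without real ssh.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def run_on_nodes(hosts, slice_name, command, max_parallel=10,
                 launch_delay=0.0, timeout=60.0, runner=subprocess.run):
    """Run `command` on every host over ssh; return {host: (returncode, stdout)}.

    max_parallel caps how many ssh sessions run at once, launch_delay is the
    pause (in seconds) between submitting successive hosts, and timeout caps
    each session. A returncode of None marks a session that timed out.
    """
    def run_one(host):
        # BatchMode=yes makes ssh fail instead of prompting for a passphrase.
        argv = ["ssh", "-o", "BatchMode=yes", "-l", slice_name, host, command]
        try:
            done = runner(argv, capture_output=True, text=True, timeout=timeout)
            return host, (done.returncode, done.stdout)
        except subprocess.TimeoutExpired:
            return host, (None, "")

    futures = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for i, host in enumerate(hosts):
            if i > 0 and launch_delay > 0:
                time.sleep(launch_delay)
            futures.append(pool.submit(run_one, host))
    return dict(f.result() for f in futures)
```

Calling run_on_nodes(hosts, "princeton_example", "mkdir results; directory/program &") would mirror the multiquery example above, with ssh passing the whole string to each node's shell. A bounded thread pool keeps a slow or unreachable node from stalling the others.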

Status


CoDeploy and all related tools are currently regarded as being in beta test. While we strive to ensure availability of the service 24 hours a day, 7 days a week, we may encounter outages when the service is being upgraded. CoDeploy does involve a number of new components, and while we are confident enough in their development to allow open usage, we may still encounter bugs. We ask for your understanding, and equally importantly, your feedback about the service.

People


KyoungSoo Park
Vivek Pai
Larry Peterson
with help from Aki Nakao

We may collectively be contacted at
princeton_codeen at slices.planet-lab.org