CoVisualize - A Visualization of PlanetLab Usage

Part of the CoDeeN project

What Is It?

CoVisualize provides some visualizations of PlanetLab usage. The visualizations show various metrics of PlanetLab activity, and are updated every 5 minutes. The underlying data is taken from the CoMon project.

The goal of CoVisualize is to be useful to several communities, while being visually interesting as well. All of the data shown in CoVisualize is meant to be useful for administering PlanetLab -- it quickly shows what experiments may be acting strangely, without providing an overwhelming level of detail. It also presents a sense of proportion, to see how the overall resources are being used. We have found that this kind of "feel" is hard to achieve just by examining the raw data in CoMon. Finally, CoVisualize is meant to provide some "eye candy" for PlanetLab, since most network monitoring is relatively visually unappealing.

CoVisualize also provides an auto-updating slide show.

Visualization Format

CoVisualize uses the TreeMap visualization format developed by Professor Ben Shneiderman of UMD, and the TreeMap library developed by his research group. In this format, a rectangle is repeatedly subdivided to show individual elements, with the size of each area related to its importance. One fairly popular use of this technique is the "Map of the Market" from SmartMoney. Brent Chun previously used this technique for PlanetLab visualization.
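As a rough illustration of the subdivision idea, the sketch below splits one rectangle into strips whose areas are proportional to a set of weights. This is the simplest "slice-and-dice" style of treemap layout, chosen here only for illustration; it is not necessarily the layout the TreeMap library uses, and the function name and argument order are hypothetical.

```python
def slice_and_dice(weights, x, y, width, height, horizontal=True):
    """Split the rectangle (x, y, width, height) into one strip per item.

    Each strip's area is proportional to the item's weight. This is an
    illustrative layout, not the algorithm used by CoVisualize itself.
    """
    total = sum(weights.values())
    rects = {}
    offset = 0.0  # fraction of the rectangle already handed out
    for name, weight in weights.items():
        frac = weight / total if total > 0 else 0.0
        if horizontal:
            rects[name] = (x + offset * width, y, frac * width, height)
        else:
            rects[name] = (x, y + offset * height, width, frac * height)
        offset += frac
    return rects


layout = slice_and_dice({"a": 3, "b": 1}, 0, 0, 100, 50)
```

To get nested rectangles, the same function can be applied again inside each strip with the direction flipped.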

Images

We currently have eight different visualizations, in three categories: resources, efficiency, and usage. The resources category refers to the bandwidth, CPU, and memory used by the various slices. Efficiency refers to how much bandwidth each slice is generating or consuming for the amount of CPU or memory used. Finally, usage refers to how different sites are using PlanetLab in terms of experiments, nodes, and number of slivers.

Each rectangle contains one label and two values. The label is the name of the slice or site, and the two values are the quantities used to create the visualization. The first value determines the size of each rectangle. The second value determines the rectangle's color. The specifics of each visualization are given below, but the general trend is that red is worrisome, black is unsurprising, and green is desirable.

Resource Images

These images show how slices are using the three primary resources on PlanetLab -- physical memory, CPU, and network bandwidth.

CPU -- this image shows the CPU consumption of each active slice on PlanetLab. The size of each rectangle reflects the slice's fraction of total CPU consumption on PlanetLab. Slices that consume less than 0.2% of the aggregate CPU are coalesced into a box labeled "Others". The size value is the slice's aggregate CPU usage across PlanetLab, expressed in units of CPUs. For example, if a slice is running on 7 nodes and using 30% of each node, we say that it is using 2.1 total CPUs. The color value reflects how far the current usage is above (red) or below (green) the weekly mean. We calculate the standard deviation of usage over the past week, and use this to select the color. A difference of two standard deviations results in a shade of 50% red or green, while four or more deviations results in 100% red or green. No color is assigned to the "Others" rectangle, since we do not keep track of the average of the coalesced slices.
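The CPU calculation can be sketched in a few lines. The code below is a minimal illustration, not the CoVisualize implementation: the function names are hypothetical, and the linear ramp between zero and four standard deviations is an assumption that merely matches the two stated points (two deviations gives 50%, four or more gives 100%).

```python
def aggregate_cpus(per_node_fractions):
    """Sum a slice's per-node CPU fractions into a count of whole CPUs."""
    return sum(per_node_fractions)


def deviation_shade(current, weekly_mean, weekly_std):
    """Return (color, intensity) with intensity between 0.0 and 1.0.

    Assumption: intensity grows linearly with the number of standard
    deviations and saturates at four.
    """
    if weekly_std <= 0:
        return ("black", 0.0)  # no variation recorded, so nothing to flag
    z = (current - weekly_mean) / weekly_std
    intensity = min(abs(z) / 4.0, 1.0)
    if z > 0:
        return ("red", intensity)    # above the weekly mean
    if z < 0:
        return ("green", intensity)  # below the weekly mean
    return ("black", 0.0)


def coalesce_small(usage, threshold=0.002):
    """Fold slices below 0.2% of the aggregate into an "Others" entry."""
    total = sum(usage.values())
    kept, others = {}, 0.0
    for name, value in usage.items():
        if total > 0 and value / total < threshold:
            others += value
        else:
            kept[name] = value
    if others > 0:
        kept["Others"] = others
    return kept


cpus = aggregate_cpus([0.30] * 7)  # 7 nodes at 30% each, about 2.1 CPUs
```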

Memory -- this image reflects the amount of physical memory used in aggregate by each slice. The size value is the number of gigabytes of memory, while the color value is the number of standard deviations away from the slice's weekly mean value. As in the CPU calculation, two standard deviations yield 50% red or green, while four or more deviations yield 100% red or green. Small consumers are lumped into the "Others" category. In general, the princeton_comon slice will often be the largest apparent consumer of memory, due to a memory pressure test it runs. This test very slowly measures the amount of physical memory a slice can obtain, so if memory pressure in PlanetLab is generally low, the memory test in princeton_comon will get lots of memory.

Bandwidth -- this image shows the aggregate bandwidth transmitted and received by the slice. The value is the 15-minute average, and is shown in megabits per second. The rectangle coloring uses the same method as the CPU and memory images. However, the rectangle size is proportional to the square root of the bandwidth, rather than to the bandwidth itself. We made this choice because bandwidth consumption varies much more widely than CPU or memory, and CoVisualize is meant to be used as an administrative tool. If rectangle sizes for bandwidth were shown linearly, very few slices would be visible.
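The effect of square-root sizing is easy to see with two slices whose bandwidths differ by a factor of 100. The sketch below is illustrative only; the function name and the example figures are made up.

```python
import math


def bandwidth_areas(mbps_by_slice):
    """Return each slice's share of the image, proportional to sqrt(bandwidth)."""
    roots = {name: math.sqrt(max(mbps, 0.0)) for name, mbps in mbps_by_slice.items()}
    total = sum(roots.values())
    if total == 0:
        return {name: 0.0 for name in roots}
    return {name: root / total for name, root in roots.items()}


# Hypothetical figures: one slice at 100 Mbps, another at 1 Mbps.
shares = bandwidth_areas({"big": 100.0, "small": 1.0})
```

With linear sizing the small slice would get 1/101 of the image (under 1%). With square-root sizing it gets 1/11 (about 9%), which is large enough to notice.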

Efficiency Images

Since PlanetLab is designed for networking research (as opposed to just computation), one would expect PlanetLab slices to communicate using the network. These two images, then, try to show bandwidth usage as a function of CPU or memory consumption.

CPU vs BW -- this image shows CPU usage the same way that the CPU resource image does, with the size of each rectangle representing its proportion of overall CPU usage on PlanetLab. The colors, however, are a function of bandwidth, with the highest bandwidth usage shown as 100% green, and the lowest usage as 100% red. Since bandwidth varies so widely, the colors are assigned according to the logarithm of the bandwidth value, rather than the linear bandwidth value. We do show a color for the "Others" rectangle, by summing the bandwidths used by the slices coalesced into this rectangle. In general, large red rectangles are undesirable, because they indicate a slice is using lots of CPU but not really communicating. These slices could be grid experiments being tested, or they could be programs stuck in spin loops.
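One way to realize the logarithmic coloring is to map each slice's log-bandwidth onto a score from -1 (100% red) to +1 (100% green). This is a sketch under assumptions: the linear interpolation in log space, the `floor` used to avoid taking the logarithm of zero, and the function name are all illustrative choices, not details taken from CoVisualize.

```python
import math


def log_bandwidth_scores(mbps_by_slice, floor=1e-3):
    """Map bandwidth to a score in [-1, 1]: -1 is full red, +1 is full green.

    Assumption: values below `floor` Mbps are clamped so the log is defined.
    """
    if not mbps_by_slice:
        return {}
    logs = {name: math.log10(max(mbps, floor)) for name, mbps in mbps_by_slice.items()}
    lo, hi = min(logs.values()), max(logs.values())
    if hi == lo:
        return {name: 0.0 for name in logs}  # all equal, so no contrast to show
    return {name: 2.0 * (value - lo) / (hi - lo) - 1.0 for name, value in logs.items()}


# Hypothetical slices spanning four orders of magnitude.
scores = log_bandwidth_scores({"quiet": 0.01, "typical": 1.0, "busy": 100.0})
```

On a linear scale the "typical" slice would sit almost at the red end; on the log scale it lands midway between red and green.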

Memory vs BW -- this image shows memory consumption versus bandwidth, using the same coloring approach as the comparison of CPU versus bandwidth. Again, large red rectangles are undesirable, because they indicate a slice is using lots of memory but not communicating. These cases, however, are less avoidable than their CPU counterparts, because memory consumption is generally not something that is dynamically adjusted based on load.

Usage Images

Finally, CoVisualize attempts to illustrate the site-level usage of PlanetLab. Since sites donate machines to join PlanetLab, it is interesting to see how these resources are being used.

Slices -- we show the number of active slices running on PlanetLab, grouped by site. The size of each rectangle is the number of slices at the site, and the color is based on a label denoting site health. Green indicates that the site has at least two healthy nodes, while yellow indicates that the site's machines are impaired in some way (low disk, clock skew, DNS problems, etc.). The red rectangles indicate leeches -- the site is actively using PlanetLab, but is providing no operational machines.

Slivers -- this image is like the image of slices, but instead counts the total number of slivers for each site. So, the more nodes running each slice, the higher the sliver count. The color scheme is the same as the image of slices. Note that the number of slivers may have little to do with actual resource consumption. The resource images (CPU, Memory, Bandwidth) are better indicators of resource usage.

Nodes -- this image shows how many nodes each site has registered in the PlanetLab database. The more nodes a site has, the larger the rectangle for that site. This image uses a larger set of colors, because the slices and slivers images only reflect sites with active slices. For nodes, there are two additional categories. The first is a site that has working nodes, but is not using PlanetLab at the moment. We call these sites donors, and use the color blue for their rectangles. The second is a site that has no working nodes and is not using PlanetLab. These are called the inactive sites, and are shown in black.

To summarize:

Green: site is in good standing
Yellow: site has only impaired nodes
Blue: site has working nodes but no usage
Black: site is down and has no usage
Red: site is down, but using others' resources
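The summary above can be read as a small decision rule. The sketch below is one possible encoding, with hypothetical names; the three-way `Health` label stands in for the site health label mentioned earlier, and coloring impaired sites yellow regardless of usage is an assumption, since the text does not say how an impaired site with no usage is shown.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"    # site has working nodes
    IMPAIRED = "impaired"  # only impaired nodes (low disk, clock skew, ...)
    DOWN = "down"          # no operational nodes


def site_color(health, active_slices):
    """Pick the rectangle color for a site in the nodes image."""
    using = active_slices > 0
    if health is Health.DOWN:
        return "red" if using else "black"  # leech vs. inactive site
    if health is Health.IMPAIRED:
        return "yellow"  # assumption: yellow whether or not the site has usage
    return "green" if using else "blue"     # good standing vs. donor


color = site_color(Health.DOWN, active_slices=3)
```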

People

Sratha Saengsuwarn and Vivek Pai.

This research is funded in part by the National Science Foundation under Grant No. 0615062. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA or the National Science Foundation.
