Status Page

Name & IP Address
Just the name and IP address of the PlanetLab node hosting a CoDeeN proxy/redirector. The redirector listens on port 3128.

Shows the status of the redirector. Normally it indicates "good", meaning everything is at least minimally functional. The redirector on each node is queried via ssh and then probed using a local UDP transfer; if the node does not accept the ssh connection, a remote UDP transfer is used instead. The "ssh fails" status indicates that no ssh connection could be made. The "heartbeat fails" status indicates that the proxy did not respond to the UDP message. The "FDs exhausted" status indicates that the node has run out of file descriptors.

Nodes use various means to determine whether other nodes are viable candidates for redirection. The count of how many nodes this particular node "sees" indicates how many other nodes it considers viable redirectors. The label "no visibility" indicates that the node has no information about the rest of the network. Other nodes use the same mechanisms to "see" this node, and some may choose to avoid it; the count of how many other nodes are avoiding this node is also given.

The uptime of the node and of the proxy are given, if they can be determined. If no information can be determined, the label "node down" is shown. If the node is live but no proxy can be found, the label "proxy down" is shown. Uptimes are shown as days (D), hours (H), minutes (M) or seconds (S).
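The D/H/M/S uptime labels described above can be produced by picking the largest unit that fits. This is a hypothetical sketch (the function name and exact rounding are assumptions, not CoDeeN's actual code):

```python
def format_uptime(seconds):
    """Render an uptime using the largest fitting unit:
    days (D), hours (H), minutes (M), or seconds (S)."""
    if seconds >= 86400:
        return "%dD" % (seconds // 86400)
    if seconds >= 3600:
        return "%dH" % (seconds // 3600)
    if seconds >= 60:
        return "%dM" % (seconds // 60)
    return "%dS" % seconds
```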

Load / Sys %
This information is similar to what would be presented via the "top" program. The Load value is the maximum of the load averages for the past 1, 5, and 15 minutes. This value indicates how many active processes are competing for the CPU. The Sys % value is the percentage of time the node is spending running the OS.
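Taking the maximum of the three load averages can be sketched as below, assuming the averages come from the first three fields of Linux's /proc/loadavg (the helper name is illustrative):

```python
def max_load_average(loadavg_line):
    """Return the maximum of the 1-, 5-, and 15-minute load
    averages, as the Load column does. `loadavg_line` is the
    contents of /proc/loadavg, whose first three fields are
    the load averages."""
    fields = loadavg_line.split()
    return max(float(f) for f in fields[:3])
```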

This is a measure of how responsive the proxy and the underlying OS scheduler are. The CoDeeN code performs certain periodic activities every second, and this is triggered by a call from the underlying proxy. This callback should have a jitter of at most 100ms under normal circumstances, so we expect that the average timer callback will take place every 1000ms, with a maximum of 1100ms. Much larger numbers for either of these indicate delay-related problems, either with the scheduler or the proxy/module.
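The jitter measurement amounts to computing the gaps between successive callback timestamps; under normal conditions the gaps should average near 1000ms with a maximum under 1100ms. A minimal sketch (names and units are assumptions):

```python
def callback_gaps_ms(timestamps):
    """Given successive timer-callback timestamps in seconds,
    return (min, avg, max) inter-callback gaps in milliseconds.
    Healthy values: avg near 1000ms, max under 1100ms."""
    gaps = [(b - a) * 1000.0 for a, b in zip(timestamps, timestamps[1:])]
    return min(gaps), sum(gaps) / len(gaps), max(gaps)
```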

CoDeeN uses a UDP-based heartbeat (with acknowledgement) between the nodes to convey information and test various conditions. This shows the average round-trip times for these messages. The Avg value shows the average time for all nodes, as seen by this node. The Mine value is the average of what all other nodes see of this node. Round-trip times in excess of 360ms are currently capped at 360ms, so these values may be lower than their actual values. The per-node dump provides more detailed information.

These fields report the average times for looking up other CoDeeN node names using DNS and CoDNS. The failure rates show what percentage of lookups take more than 5 seconds, the resolver's retry interval.

Hourly Rate
The hourly rate numbers are a rough estimate of the number of requests per hour the node is experiencing. The Total number counts all of the requests that this node is receiving. The Good number counts only those requests that were actually given service; many requests are filtered by the security policies, and this filtering accounts for the difference between Total and Good. The Inside figure represents how many non-heartbeat requests were received at this node from users within PlanetLab-affiliated sites. The "Fwd'd" number indicates how many requests were forwarded to this node from other nodes. The Reg field indicates how many requests were for "regular" files, which excludes queries. The Query figure indicates the number of queries received by this node, excluding heartbeats. The gap between the Good count and the sum of Reg and Query is the number of heartbeats received by the node.
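The two implicit quantities in these columns follow by simple arithmetic: filtered requests are Total minus Good, and heartbeats are Good minus (Reg + Query). A small sketch of those relationships (the function name is illustrative):

```python
def derive_hourly_counts(total, good, reg, query):
    """Derive the quantities implied by the hourly-rate columns:
    requests rejected by the security policies (Total - Good)
    and heartbeats received (Good - (Reg + Query))."""
    filtered = total - good
    heartbeats = good - (reg + query)
    return filtered, heartbeats
```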

# Users
The Total count indicates the number of unique IP addresses which have communicated with this node over the past 24 hours. The Inside value is how many of those IP addresses are from PlanetLab-affiliated clients. These counts currently include all of the nodes communicating with each other.

Status Page Details

The node uptime as reported by the /proc filesystem. This value is normally not used, except to report node uptimes when the proxy is down.

How many users have accessed the node directly in the past 24 hours.

How many users from within a PlanetLab-affiliated site have accessed the node within the past 24 hours.

Each redirector periodically tests the number of available file descriptors. This value indicates the recent history of failures. A value of 0x0 indicates success. Any other value indicates that the node has suffered from file descriptor exhaustion in the past minute.
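One way to run such a periodic test is to try allocating a batch of descriptors and fold each result into a small failure bitmask, where 0x0 means every recent probe succeeded. This is a hedged sketch; the probe size, mask width, and helper names are assumptions:

```python
import os

def fd_probe(needed=64):
    """Check whether `needed` file descriptors can still be
    allocated, closing them all afterwards. The count of 64
    is an assumed threshold."""
    fds = []
    try:
        for _ in range(needed):
            fds.append(os.open(os.devnull, os.O_RDONLY))
        return True
    except OSError:
        return False
    finally:
        for fd in fds:
            os.close(fd)

def update_history(history, ok):
    """Shift the newest probe result into a failure bitmask;
    0x0 means all recent probes succeeded (encoding assumed)."""
    return ((history << 1) | (0 if ok else 1)) & 0xFF
```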

ProxUptm, NodeUptm
Proxy and node uptimes in seconds

The system load averages, similar to what is shown in "top"

A rough estimate of the number of requests per hour that this node is seeing.

Failure rate of regular DNS lookups

Lookup times for regular DNS lookups

Detailed breakdown of CoDNS performance

The minimum, average, and maximum times in milliseconds between successive 1-second notifications. A jitter of 100ms is expected and acceptable. The minimum value means very little; the average and maximum are much more important, and give some insight into how often CoDeeN is receiving the CPU. These values are recorded every 30 seconds.

Free disk space in GB.

These are the past 6 values (30 seconds each) for the percentage of CPU time being consumed by the operating system.

This row has one character for each CoDeeN node, and shows whether this node considers that node to be a viable redirector. If the node is considered unusable, an "X" is shown in its column; otherwise, a "." indicates liveness. A node may be considered unavailable for several reasons: nodes that have not responded to heartbeats recently are considered dead; nodes that lag behind the others on recent wget tests are also considered not viable; and nodes that show a system CPU time above 95% are similarly skipped.

CoDeeN redirectors send out one heartbeat per second and record the acknowledgements. This row contains counts of how many acks have been missed or otherwise failed out of the last 32 acks. The count is base 32, with 0 being the lowest value, and w being the highest.
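The single-character base-32 count runs through the digits 0-9 and then the letters a-w, so 'w' represents 32 missed acks out of 32. A sketch of that encoding (the function name is illustrative):

```python
def encode_base32_count(n):
    """Encode a count in 0..32 as one character: digits 0-9
    for 0..9, then letters a-w for 10..32."""
    if not 0 <= n <= 32:
        raise ValueError("count out of range")
    return str(n) if n < 10 else chr(ord('a') + n - 10)
```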

Acks that arrive too late to be useful are recorded but otherwise exempted from the liveness calculation. This value is again a count in base 32.

When a node runs out of file descriptors, it relays this information in its heartbeat acknowledgements, and this row contains the counts in base 32 of the number of acknowledgements indicating file descriptor exhaustion.

This row should always have zeroes, indicating that all nodes are running the same version.

Not used

This is the maximum of the load averages for the past 1, 5, and 15 minutes. The value is rounded up to the next integer and shown in base 32.

The maximum system CPU consumption over the past 6 readings is given. Values below 90% are rounded up to the next multiple of 10% and shown divided by 10. Values of 90% or higher are shown starting with the letter 'a', incremented by one character for each percent, so the letter 'f' represents 95%; that value seems to be highly correlated with strange behavior from the node.
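The two-regime encoding described above can be sketched as follows (the function name is illustrative; the rounding for values below 90% follows the "round up to the next multiple of 10%" rule):

```python
def encode_sys_cpu(pct):
    """Encode a system-CPU percentage as one character:
    below 90%, round up to the next multiple of 10% and show
    it divided by 10; at 90% or above, map to 'a' plus one
    letter per percent, so 'f' is 95%."""
    if pct >= 90:
        return chr(ord('a') + int(pct) - 90)
    return str(-(-int(pct) // 10))  # ceiling division by 10
```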

WgetProx, WgetTarg
In addition to the UDP heartbeats, the nodes also use wget to test the TCP and proxy connectivity of the nodes. The number of failures is recorded and shown in these fields. The WgetProx value shows how many times the wget from this node to the proxy on the other node failed. The WgetTarg shows how often the proxy on the other node was unable to fetch a page from a third node. In most cases, these values should be very close to zero. Values of 1 or 2 are tolerable, and any value above that probably indicates that the node is having problems.

The round-trip times between this node and other nodes using the UDP heartbeat. This is essentially the application-level ping latency, and is a decaying average. The number is a base 36 digit representing multiples of 10ms, so the range is 10-360ms; if the latency exceeds 360ms, it is capped at that value for reporting purposes. Internally, the node retains the precise values.
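The capped base-36 digit and the decaying average can be sketched as below. The decay weight and the exact rounding of the digit are assumptions, not CoDeeN's actual constants:

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def decaying_rtt(avg, sample, weight=0.25):
    """Fold a new RTT sample into a decaying average.
    The 0.25 weight is an assumed value."""
    return (1 - weight) * avg + weight * sample

def encode_rtt(rtt_ms):
    """Encode an RTT as one base-36 digit in units of 10ms,
    capping at 360ms for reporting (rounding is assumed)."""
    return DIGITS[min(35, int(min(rtt_ms, 360) // 10))]
```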