Status Page
Name & IP Address
Just the name and IP address of the PlanetLab node hosting a CoDeeN
proxy/redirector. The redirector listens on port 3128.
Status
Shows the status of the redirector. Normally, it indicates "good",
meaning everything is minimally functional. The redirector on each
node is queried via ssh and then using a local UDP transfer. If the
node does not accept the ssh connection, a remote UDP transfer is
used. The "ssh fails" status indicates that no ssh connection could be
made. The "heartbeat fails" status indicates that the proxy did not
respond to the UDP message. The "FDs exhausted" message indicates that
the node has run out of file descriptors.
Visibility
Nodes use various means to determine whether other nodes are viable
candidates for redirection. The count of how many nodes this
particular node "sees" indicates how many other nodes are considered
viable redirectors. The label "no visibility" indicates that the node
has no information about the rest of the network. Other nodes
similarly use mechanisms to "see" this node, and some may choose to
avoid it. When other nodes are avoiding this node, the number of
nodes doing so is also given.
Uptime
The uptime of the node and of the proxy are given, if they can be
determined. If no information can be determined, the label "node
down" is shown. If the node is live but no proxy can be found, the
label "proxy down" is shown. Uptimes are shown as days (D), hours (H),
minutes (M) or seconds (S).
Load / Sys %
This information is similar to what would be presented via the "top"
program. The Load value is the maximum of the load averages for the
past 1, 5, and 15 minutes. This value indicates how many active
processes are competing for the CPU. The Sys % value is the percentage
of time the node spends running the OS.
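As a minimal sketch of the Load calculation described above, the following takes the three load averages and returns the largest. The function name load_value is hypothetical and is not taken from the CoDeeN code.

```python
def load_value(load_averages):
    """Return the Load figure: the largest of the 1-, 5-, and 15-minute averages."""
    one_min, five_min, fifteen_min = load_averages
    return max(one_min, five_min, fifteen_min)


# Example with made-up averages; a short burst dominates the reported value.
print(load_value((0.5, 1.2, 0.9)))
```

On a Unix node, the three averages could be read with Python's os.getloadavg() and passed straight to this function.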
Timer
This is a measure of how responsive the proxy and the underlying OS
scheduler are. The CoDeeN code performs certain periodic activities
every second, and this is triggered by a call from the underlying
proxy. This callback should have a jitter of at most 100ms under
normal circumstances, so we expect that the average timer callback
will take place every 1000ms, with a maximum of 1100ms. Much larger
numbers for either of these indicate delay-related problems, either
with the scheduler or the proxy/module.
RTT
CoDeeN uses a UDP-based heartbeat (with acknowledgement) between the
nodes to convey information and test various conditions. This shows
the average round-trip times for these messages. The Avg value shows
the average time for all nodes, as seen by this node. The Mine value
is the average of what all other nodes see of this node. Round-trip
times in excess of 360ms are currently capped at 360ms, so these
values may be lower than their actual values. The per-node dump
provides more detailed information.
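The effect of the cap on the reported averages can be sketched as follows. This assumes the cap is applied to each sample before averaging, which the text does not state explicitly; the names RTT_CAP_MS and capped_average are hypothetical.

```python
RTT_CAP_MS = 360  # reporting cap stated in the text


def capped_average(rtts_ms):
    """Average round-trip times after capping each sample (assumed per-sample cap)."""
    capped = [min(rtt, RTT_CAP_MS) for rtt in rtts_ms]
    return sum(capped) / len(capped)


# One slow peer (900ms) is counted as 360ms, so the reported average
# is lower than the true average of 400ms.
print(capped_average([100, 200, 900]))
```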
DNS and CoDNS
These fields report the average lookup times for other CoDeeN node
names using DNS and CoDNS. The failure rates are the percentage of
lookups that take more than 5 seconds, which is the retry value for
the resolver.
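A rough sketch of how such a failure rate could be computed from a list of lookup times. The 5-second threshold comes from the text; the function name and the input format (times in seconds) are assumptions made for illustration.

```python
RETRY_TIMEOUT_S = 5.0  # resolver retry value given in the text


def failure_rate(lookup_times_s):
    """Percentage of lookups that took longer than the resolver retry value."""
    if not lookup_times_s:
        return 0.0
    slow = sum(1 for t in lookup_times_s if t > RETRY_TIMEOUT_S)
    return 100.0 * slow / len(lookup_times_s)


# One of four lookups exceeded 5 seconds.
print(failure_rate([0.1, 0.2, 6.0, 0.3]))
```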
Hourly Rate
The hourly rate numbers are a rough estimate of the number of requests
per hour the node is experiencing. The Total number indicates all of
the requests that this node is receiving. The Good number counts only
those requests that were given service. Many requests are filtered by
the security policies, and these filtered requests account for the
difference between Total and Good. The Inside figure represents how many non-heartbeat
requests were received at this node from users within
PlanetLab-affiliated sites. The "Fwd'd" number indicates how many
requests were forwarded to this node from other nodes. The Reg field
indicates how many requests were for "regular" files, which excludes
queries. The Query figure indicates the number of queries received by
this node, excluding heartbeats. The gap between the Good count and
the sum of Reg/Query is the number of heartbeats received by the node.
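The two differences described above can be worked through with a small example. The function and dictionary key names are hypothetical, and the numbers are made up.

```python
def derived_counts(total, good, reg, query):
    """Derive the filtered and heartbeat counts from the displayed fields."""
    return {
        # Requests rejected by the security policies.
        "filtered": total - good,
        # Served requests that were neither regular files nor queries.
        "heartbeats": good - (reg + query),
    }


print(derived_counts(total=1000, good=800, reg=500, query=200))
```

With these made-up numbers, 200 requests were filtered and 100 of the served requests were heartbeats.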
# Users
The Total count indicates the number of unique IP addresses that have
communicated with this node over the past 24 hours. The Inside value is
how many of those IP addresses belong to PlanetLab-affiliated clients. These
counts currently include the nodes themselves, since they communicate with each other.
Status Page Details
ProcUptm
The node uptime as reported by the /proc filesystem. This value is
normally not used, except to report node uptimes when the proxy is
down.
UsersCnt
How many users have accessed the node directly in the past 24 hours.
Insiders
How many users from within a PlanetLab-affiliated site have accessed
the node within the past 24 hours.
FdTstHst
Each redirector periodically tests the number of available file
descriptors. This value indicates the recent history of failures. A
value of 0x0 indicates success. Any other value indicates that the
node has suffered from file descriptor exhaustion in the past
minute.
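A minimal check of this field might look like the following. It assumes the value is available as a hexadecimal string such as "0x0"; the meaning of individual bits is not described here, so the sketch only distinguishes zero from nonzero. The function name is hypothetical.

```python
def fd_exhausted_recently(fdtsthst):
    """Return True if the FdTstHst value records any recent failure."""
    # int() with base 16 accepts an optional "0x" prefix.
    return int(fdtsthst, 16) != 0


print(fd_exhausted_recently("0x0"))  # no failures recorded
print(fd_exhausted_recently("0x4"))  # at least one failure recorded
```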
ProxUptm, NodeUptm
Proxy and node uptimes in seconds.
LoadAvgs
The system load averages, similar to what is shown in "top".
ReqsHrly
A rough estimate of the number of requests per hour that this node is
seeing.
DNSFails
Failure rate of regular DNS lookups.
DNSTimes
Lookup times for regular DNS lookups.
CoDNSDbg
Detailed breakdown of CoDNS performance.
TimerInt
The minimum, average, and maximum times in milliseconds between
successive 1-second notifications. A jitter of 100ms is expected and acceptable.
The minimum value means very little - the average and maximum are much
more important, and give some insight into how often CoDeeN is
receiving the CPU. These values are recorded every 30 seconds.
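To illustrate how the three TimerInt figures relate to the callback times, here is a sketch that derives them from a list of callback timestamps. The timestamp list, the function names, and the 1100ms check (1000ms period plus the 100ms jitter allowance) are assumptions for illustration, not the CoDeeN implementation.

```python
EXPECTED_MAX_MS = 1100  # 1000ms period plus the 100ms jitter allowance


def timer_intervals(timestamps_ms):
    """Return (minimum, average, maximum) gap between consecutive callbacks."""
    gaps = [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return min(gaps), sum(gaps) / len(gaps), max(gaps)


def exceeds_expected_jitter(max_gap_ms):
    """True when the largest gap is beyond the expected 1100ms."""
    return max_gap_ms > EXPECTED_MAX_MS


# Callbacks at roughly one-second spacing, with one 50ms delay.
min_gap, avg_gap, max_gap = timer_intervals([0, 1000, 2050, 3050])
print(min_gap, round(avg_gap, 1), max_gap, exceeds_expected_jitter(max_gap))
```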
DiskFree
Free disk space in GB.
SysPtCPU
These are the past 6 values (30 seconds each) for the percentage of
CPU time being consumed by the operating system.
Liveness
This row has one character for each CoDeeN node, and shows whether
this node considers that node to be a viable redirector. If the node
is considered unusable, an "X" is shown in its column. Otherwise, a
"." indicates liveness. The reasons for a node being considered
unavailable are multiple. Nodes that have not responded to heartbeats
recently are considered dead. Likewise, nodes that are behind the
others in terms of recent wget tests are also considered not
viable. Finally, nodes that show a system CPU time above 95% are
similarly skipped.
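Reading the Liveness row can be sketched as below. The row is treated as a plain string of "." and "X" characters, one per node, in whatever node order the page uses; the function name is hypothetical.

```python
def unusable_columns(liveness_row):
    """Return the column positions marked "X" (considered unusable)."""
    return [i for i, ch in enumerate(liveness_row) if ch == "X"]


row = "..X.X"
print(unusable_columns(row))                      # columns of skipped nodes
print(len(row) - len(unusable_columns(row)))      # nodes considered viable
```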
MissAcks
CoDeeN redirectors send out one heartbeat per second and record the
acknowledgements. This row contains counts of how many acks have been
missed or otherwise failed out of the last 32 acks. Each count is
shown as a single base-32 digit, with 0 being the lowest value and w
being the highest.
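Decoding one of these characters could be sketched as follows. The text only states that 0 is the lowest value and w the highest; the digit ordering used here (0-9 followed by lowercase a-w, so that w stands for 32) is an assumption, and the names are hypothetical.

```python
# Assumed digit alphabet: 0-9 then a-w, giving the values 0 through 32.
ACK_DIGITS = "0123456789abcdefghijklmnopqrstuvw"


def decode_ack_count(ch):
    """Convert a single MissAcks-style character to an integer count."""
    if len(ch) != 1 or ch not in ACK_DIGITS:
        raise ValueError(f"unexpected digit: {ch!r}")
    return ACK_DIGITS.index(ch)


# Decode a whole row, one character per node.
print([decode_ack_count(ch) for ch in "00a1w"])
```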
LateAcks
Acks that arrive too late to be useful are recorded but otherwise
exempted from the liveness calculation. This value is again a count in
base 32.
NoFdAcks
When a node runs out of file descriptors, it relays this information
in its heartbeat acknowledgements, and this row contains the counts in
base 32 of the number of acknowledgements indicating file descriptor
exhaustion.
VersProb
This row should always have zeroes, indicating that all nodes are
running the same version.
DtSdAcks
Not used.
MaxLoads
This is the maximum of the load values for the past 1, 5, and 15
minutes. The value is rounded up to the next integer and shown
in base 32.
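The encoding just described can be sketched as follows. Two points are assumptions: the digit alphabet (0-9 followed by a-v) and the clamping of loads above 31 to the highest digit, neither of which is specified in the text. The names are hypothetical.

```python
import math

# Assumed base-32 alphabet: 0-9 then a-v, giving the values 0 through 31.
LOAD_DIGITS = "0123456789abcdefghijklmnopqrstuv"


def encode_max_load(one_min, five_min, fifteen_min):
    """Encode the maximum load average as a single base-32 digit."""
    value = math.ceil(max(one_min, five_min, fifteen_min))
    # Assumption: loads too large for one digit are clamped to the top digit.
    return LOAD_DIGITS[min(value, len(LOAD_DIGITS) - 1)]


print(encode_max_load(0.4, 1.2, 0.9))   # maximum 1.2 rounds up to 2
print(encode_max_load(11.5, 3.0, 2.0))  # maximum 11.5 rounds up to 12
```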
SysMxCPU
The maximum system CPU consumption for the past 6 readings is
given. Values below 90% are rounded up to the next multiple of 10% and
shown divided by 10. Values that are 90% or higher are shown starting
with the letter 'a', incremented by one character for each
percent. The letter 'f' represents 95%, and seems to be highly
correlated with strange behavior from the node.
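A sketch of this encoding, following the rules above: a digit for values below 90%, and a letter starting at 'a' for 90% and up. Truncating fractional percentages at or above 90% is an assumption, and the function name is hypothetical.

```python
import math


def encode_sys_max_cpu(readings):
    """Encode the peak of the recent system CPU readings as one character."""
    peak = max(readings)
    if peak < 90:
        # Round up to the next multiple of 10%, then divide by 10.
        return str(math.ceil(peak / 10))
    # 90% -> 'a', 91% -> 'b', ..., 95% -> 'f' (fractions truncated, assumed).
    return chr(ord("a") + int(peak) - 90)


print(encode_sys_max_cpu([12, 35, 20, 8, 5, 31]))   # peak of 35%
print(encode_sys_max_cpu([40, 95, 60, 50, 45, 30]))  # peak of 95%
```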
WgetProx, WgetTarg
In addition to the UDP heartbeats, the nodes also use wget to test the
TCP and proxy connectivity of the nodes. The number of failures is
recorded and shown in these fields. The WgetProx value shows how many
times the wget from this node to the proxy on the other node
failed. The WgetTarg shows how often the proxy on the other node was
unable to fetch a page from a third node. In most cases, these values
should be very close to zero. Values of 1 or 2 are tolerable, and any
value above that probably indicates that the node is having
problems.
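The rule of thumb above for reading these failure counts can be written down as a small helper. The three labels are hypothetical names chosen for this sketch and do not appear on the status page.

```python
def classify_wget_failures(count):
    """Apply the rule of thumb: 0 is ideal, 1 or 2 tolerable, more is suspect."""
    if count == 0:
        return "ok"
    if count <= 2:
        return "tolerable"
    return "probable problem"


for failures in (0, 2, 5):
    print(failures, classify_wget_failures(failures))
```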
UdpHbRtt
The round-trip times between this node and other nodes using the UDP
heartbeat. This is essentially the application-level ping latency, and
is a decaying average. The number is a base 36 digit representing
multiples of 10ms, so the range is 10-360ms. If the latency value
exceeds 360ms, it is capped at that value for reporting purposes.
Internally, the node keeps the exact values.
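One possible way to decode a UdpHbRtt character is sketched below. The exact digit-to-latency mapping is not spelled out here; this sketch assumes the usual base-36 alphabet (0-9, a-z) and maps digit value d to (d + 1) x 10ms, which is one reading that reproduces the stated 10-360ms range. The function name is hypothetical.

```python
def decode_rtt_ms(ch):
    """Convert one base-36 character to an approximate RTT in milliseconds."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    digit = int(ch, 36)  # raises ValueError if ch is not a base-36 digit
    # Assumed mapping: '0' -> 10ms ... 'z' -> 360ms (the reporting cap).
    return (digit + 1) * 10


print([decode_rtt_ms(ch) for ch in "04z"])
```

A value of 360ms should be read as "360ms or more", since larger latencies are capped for reporting.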