CoDNS

Making DNS lookups faster, more reliable, and more predictable

What Is It?


The CoDNS service uses cooperative name lookups to provide a faster, more reliable DNS lookup service. It is a thin wrapper for name lookup that dramatically reduces client-side latency while consuming minimal resources. We have found that nameservers often experience local failures, resulting in clients incurring many seconds of delay, even for cached records. In that case, a typical 50 ms lookup suddenly increases by a factor of 100 or more. CoDNS solves this problem by redirecting the lookup query to a healthy peer node when the local nameserver starts to show failures. This masks the long latency in name lookups caused by local failures and provides a consistently fast, reliable response to virtually all name lookup requests.

Who Should Use CoDNS?


CoDNS benefits anyone who wants a more reliable name lookup service. Specifically, the following applications get the most benefit:

We have had a couple of interesting incidents while operating CoDNS.

Name Lookup Failures


Previous studies offer some evidence about why today's DNS performance is still unsatisfactory. However, all of them focus on problems in the server-side DNS infrastructure, overlooking the possibility of client-side local failures. What we have found is that problems also lie in the LAN environment where clients and their nameservers both reside. Because the cache hit ratio in a local nameserver is high (50-80%), only a small portion of the requests is sent to remote servers. Most other queries are satisfied locally, so even a simple local problem seriously affects client-perceived latency. Because a typical nameserver uses UDP as its default communication method, even a single packet loss or drop within the LAN environment will ultimately trigger a retransmission of the query from the resolver, adding five seconds of delay (the default retransmission timeout of a BIND resolver) to the client. Possible local problems include:

  1. Packet loss in the LAN environment.
  2. Temporary local nameserver overloading.
  3. Cron jobs or other heavy processes running on the same machine.
  4. Maintenance mistakes.

The current local failure rate of PlanetLab nodes can be found here.

How Does It Work?


CoDNS avoids local problems by redirecting queries to a healthy peer node when the nameserver is not responding well. The CoDNS daemon runs on the same machine as the client, and name lookup requests from the client are delivered to the daemon through a loopback connection. CoDNS tries to resolve a name with the local nameserver first, but if it does not receive an answer within a certain timeout, it forwards the query to a responsive peer node and doubles the timeout value. The initial timeout value is dynamically adjusted according to how well the local nameserver has been responding. When the second timeout expires, CoDNS picks another peer node for the name lookup. It repeats this procedure until it gets an answer from either side, and delivers whichever answer arrives first to the user.
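The forwarding schedule described above can be sketched in a few lines of C. This is only an illustration, not the CoDNS implementation: the function name codns_send_schedule is hypothetical, and it assumes that each doubled timeout is measured from the moment the previous query was sent. The initial timeout is passed in as a parameter because the real value is adjusted dynamically.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: compute when each query would be sent,
 * in milliseconds since the lookup started.
 *
 * send_at_ms[0] is the query to the local nameserver; every later
 * entry is a query forwarded to one more peer node. The timeout
 * doubles each time a query is forwarded.
 *
 * Returns the number of entries written.
 */
size_t codns_send_schedule(unsigned initial_timeout_ms,
                           size_t max_queries,
                           unsigned long *send_at_ms)
{
    unsigned long now = 0;                      /* elapsed time so far */
    unsigned long timeout = initial_timeout_ms; /* current wait period */
    size_t i;

    for (i = 0; i < max_queries; i++) {
        send_at_ms[i] = now;  /* send a query at this point in time  */
        now += timeout;       /* wait for an answer until the timeout */
        timeout *= 2;         /* double the timeout for the next peer */
    }
    return i;
}
```

With an assumed initial timeout of 200 ms and four queries, the local query goes out at 0 ms and the peer queries at 200 ms, 600 ms, and 1400 ms. In the real service the procedure stops as soon as any answer arrives.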

CoDNS consists of one nonblocking master process and many blocking slave processes which resolve names using gethostbyname( ). The master process is in charge of the following:

Overhead


CoDNS incurs minimal overhead in terms of resources and the number of remote queries. When the local nameserver is functioning well, no remote queries are sent. Only when the local server shows some sort of problem does CoDNS start to forward queries to remote peer nodes. While the exact figure depends on how the nameservers are configured, in the typical case of functioning nameservers only 1-2% of all requests are sent remotely, while 98-99% of queries are resolved locally. The timeout for remote queries is backed off exponentially for each instance, and each CoDNS node limits the number of remote queries outstanding at any given point, preventing potential attacks.
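The page does not say how the limit on remote queries is enforced. One simple way to cap them, shown here purely as a sketch, is a counter of outstanding remote queries that must be incremented before forwarding and decremented when the query completes. The struct and function names below are hypothetical and are not part of CoDNS.

```c
#include <stdbool.h>

/* Hypothetical budget for remote (peer) queries on one node. */
struct remote_budget {
    unsigned in_flight; /* remote queries currently outstanding */
    unsigned limit;     /* maximum allowed at any given point   */
};

/* Try to reserve a slot for one remote query.
 * Returns false when the node is already at its limit. */
bool remote_budget_acquire(struct remote_budget *b)
{
    if (b->in_flight >= b->limit)
        return false;
    b->in_flight++;
    return true;
}

/* Give the slot back once the remote query is answered or times out. */
void remote_budget_release(struct remote_budget *b)
{
    if (b->in_flight > 0)
        b->in_flight--;
}
```

A caller would forward a query to a peer only when remote_budget_acquire() returns true, so a burst of failing lookups cannot turn into an unbounded stream of remote traffic.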

Before & After


[Graphs: LDNS latency (left), CoDNS latency (right)]
These graphs show the average latency each minute for live traffic through one CoDeeN proxy for one day. CoDNS effectively removes the long latencies induced by the local nameserver that are visible in the left graph.

How To Use CoDNS


1. For PlanetLab users:

Using CoDNS is very simple. CoDNS is now running on all PlanetLab nodes. To benefit from the CoDNS service on a PlanetLab node, all you need to do is include the provided header file and replace gethostbyname( ) with CoDNSGetHostByNameSync( ). Then compile with the provided source file. Currently, only a synchronous (blocking) version is provided, but we are going to add an interface for asynchronous (nonblocking) lookups in the near future.

Sample code :

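#include <stdio.h>
#include <netinet/in.h>   /* struct in_addr */
#include <arpa/inet.h>    /* inet_ntoa() */
#include "codnsresolv.h"

int main(void)
{
  struct in_addr result;
  char *name = "www.princeton.edu";

  /* Blocking lookup through CoDNS; a negative return value means failure. */
  if (CoDNSGetHostByNameSync(name, &result) < 0) {
    fprintf(stderr, "CoDNS failed\n");
  } else {
    /* Print the name with its address in dotted-decimal form. */
    printf("%s : %s\n", name, inet_ntoa(result));
  }
  return 0;
}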

2. For Linux/MacOS users:

Please install CoDNS on your desktop. You can transparently use the CoDNS service without any change to your existing software!

CoDNS 0.52 (11/27/2006) -- Latest version

CoDNS 0.51 (1/31/2006)

Older version: CoDNS 0.50 (11/9/2005)

Telnet Interface To CoDNS


You can test whether CoDNS is running by doing a text query. CoDNS uses TCP port 4119, so by telnetting to port 4119 and typing a domain name, you can see whether CoDNS is working.

Example :

[princeton_codeen@planetlab-1 princeton_codeen]$ telnet localhost 4119
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
www.princeton.edu      <- domain name
128.112.128.15             <- IP address
Connection closed by foreign host.
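The same check can be done from a program. The sketch below mirrors the telnet session: connect to TCP port 4119 on the local machine, send the domain name on one line, and read the reply until the daemon closes the connection. The function names are hypothetical, and terminating the line with CR LF (as telnet does) is an assumption; the exact line format the daemon expects is not documented here.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define CODNS_PORT 4119 /* TCP port of the CoDNS text interface */

/* Build the query line "name\r\n" in buf.
 * Returns its length, or -1 if buf is too small. */
int codns_build_query(const char *name, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s\r\n", name);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}

/* Send name to the local CoDNS daemon and store the first reply
 * line (the IP address as text) in reply. Returns 0 on success,
 * -1 on any failure, including the daemon not running. */
int codns_text_query(const char *name, char *reply, size_t replylen)
{
    char query[300];
    struct sockaddr_in addr;
    ssize_t got;
    size_t total = 0;
    int fd, len;

    if (replylen == 0)
        return -1;
    len = codns_build_query(name, query, sizeof(query));
    if (len < 0)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(CODNS_PORT);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        write(fd, query, (size_t)len) != len) {
        close(fd);
        return -1;
    }

    /* Read until the daemon closes the connection or reply is full. */
    while (total < replylen - 1 &&
           (got = read(fd, reply + total, replylen - 1 - total)) > 0)
        total += (size_t)got;
    close(fd);

    reply[total] = '\0';
    reply[strcspn(reply, "\r\n")] = '\0'; /* keep the first line only */
    return total > 0 ? 0 : -1;
}
```

Calling codns_text_query("www.princeton.edu", buf, sizeof(buf)) on a node where the daemon is running should leave the address text in buf; a return value of -1 suggests the daemon is not reachable.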

Please let us know if you find a node which is not running the CoDNS daemon.

Status


CoDNS is currently regarded as being in beta test. While we strive to ensure availability of the service 24 hours a day, 7 days a week, we may encounter outages when the service is being upgraded. Even then, the provided interface falls back to the existing gethostbyname( ) while CoDNS is temporarily unavailable. CoDNS does involve a number of new components, and while we are confident enough in their development to allow open usage, we may still encounter bugs. We ask for your understanding and, equally importantly, your feedback about the service.

Publications


CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups
KyoungSoo Park, Vivek S. Pai, Larry Peterson and Zhe Wang
In Proceedings of the Sixth Symposium on Operating Systems Design and Implementation (OSDI '04)

CoDNS: Masking DNS Delays via Cooperative Lookups
KyoungSoo Park, Zhe Wang, Vivek S. Pai, and Larry Peterson
Technical Report, superseded by the OSDI paper.

People


KyoungSoo Park
Vivek Pai
Larry Peterson

We may collectively be contacted at princeton_codeen at slices.planet-lab.org