Network protocols — a DNS query

By the end of this chapter you will be able to build and parse a DNS query header and question section using pack and unpack — and take the same approach to any RFC-defined protocol.

A “network protocol” is, in the end, a sequence of byte-exact fields described in English prose. If you can read the field table, you can write the template. This chapter walks one concrete example — the DNS query packet — from the RFC wire diagram to a fully working pair of encode / decode subroutines.

The problem

We want to ask a DNS server for the A record of example.com. The UDP payload we send is a DNS message (RFC 1035), consisting of:

  1. A 12-byte header.

  2. A question section — the domain we are asking about, and the record type we want.

(A real client also parses the answer section of the response. This chapter covers encoding in full and only sketches decoding.)

The header

From RFC 1035 section 4.1.1, the header is six consecutive 16-bit big-endian fields:

Offset  Field    Meaning
0       ID       Arbitrary 16-bit identifier we choose
2       Flags    Opcode, RD bit, response flags, rcode
4       QDCOUNT  Number of entries in question section
6       ANCOUNT  Number of resource records in answer
8       NSCOUNT  Authority records
10      ARCOUNT  Additional records

Six unsigned big-endian shorts means six n directives:

sub dns_header {
    my (%opts) = @_;
    pack "n6",
        $opts{id}      // 0,
        $opts{flags}   // 0,
        $opts{qd}      // 0,
        $opts{an}      // 0,
        $opts{ns}      // 0,
        $opts{ar}      // 0;
}

That is 12 bytes, exactly as the spec requires. n6 is shorthand for n n n n n n.
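To check the layout, it can help to dump the packed header as hex. The following is a small sketch that repeats dns_header so it runs on its own; the ID 0x1234 is an arbitrary value chosen for illustration, and 0x0100 is the flags word with only the RD bit set.

```perl
use strict;
use warnings;

# dns_header as defined above, repeated so this snippet is self-contained.
sub dns_header {
    my (%opts) = @_;
    return pack "n6",
        $opts{id}    // 0,
        $opts{flags} // 0,
        $opts{qd}    // 0,
        $opts{an}    // 0,
        $opts{ns}    // 0,
        $opts{ar}    // 0;
}

# 0x1234 is an arbitrary ID; 0x0100 sets only the RD bit.
my $hdr = dns_header(id => 0x1234, flags => 0x0100, qd => 1);

# H* renders the whole string as hex digits, high nybble first.
printf "%d bytes: %s\n", length($hdr), unpack("H*", $hdr);
# prints: 12 bytes: 123401000001000000000000
```

Reading the dump in groups of four hex digits gives the six fields back in order: 1234, 0100, 0001, then three zero counts.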

The question section

A question is:

  1. A name, encoded as a sequence of length-prefixed labels terminated by a zero-length label.

  2. A 16-bit QTYPE (1 = A record).

  3. A 16-bit QCLASS (1 = IN, Internet).

For example.com the name encodes as:

\x07 e x a m p l e   \x03 c o m   \x00

Each label begins with a length byte; the whole name ends with a zero-length label. Two things to notice: the length is one byte (C), not two; and there is a trailing NUL but it is not exactly what Z produces — it terminates the list of labels, not a single string.

We can build this by hand from a domain name:

sub encode_name {
    my ($name) = @_;
    my $out = "";
    for my $label (split /\./, $name) {
        die "label too long" if length($label) > 63;
        $out .= pack "C/a*", $label;
    }
    $out .= "\0";                      # zero-length terminator
    return $out;
}

The loop body uses a single template, C/a*: a one-byte length, computed automatically by pack, followed by the label bytes (see the grouping-and-counts chapter for the / form). After the loop, a literal "\0" terminates the list.
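A quick way to compare the encoder's output with the wire layout shown earlier is to print one hex pair per byte. This is a minimal sketch; encode_name is repeated in lightly condensed form so the snippet is self-contained.

```perl
use strict;
use warnings;

# encode_name as defined above, lightly condensed.
sub encode_name {
    my ($name) = @_;
    my $out = "";
    for my $label (split /\./, $name) {
        die "label too long" if length($label) > 63;
        $out .= pack "C/a*", $label;   # length byte + label bytes
    }
    return $out . "\0";                # zero-length terminator
}

my $wire = encode_name("example.com");

# (H2)* yields one two-digit hex string per byte.
print join(" ", unpack "(H2)*", $wire), "\n";
# prints: 07 65 78 61 6d 70 6c 65 03 63 6f 6d 00
```

The 07 and 03 are the length bytes written by C/a*, and the final 00 is the terminator appended after the loop.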

With encode_name in place, the whole question section is one concatenation and one pack:

sub dns_question {
    my ($name, $qtype, $qclass) = @_;
    return encode_name($name) . pack "n n", $qtype, $qclass;
}

Assembling the packet

use constant {
    QR_QUERY     => 0,
    OPCODE_QUERY => 0,
    RD           => 1 << 8,         # recursion desired
    TYPE_A       => 1,
    CLASS_IN     => 1,
};

sub dns_query_for_A {
    my ($name) = @_;
    my $id    = int rand 65536;
    my $flags = RD;                  # standard query, recursion desired

    my $pkt = dns_header(
        id    => $id,
        flags => $flags,
        qd    => 1,                  # one question
    );
    $pkt .= dns_question($name, TYPE_A, CLASS_IN);

    return ($id, $pkt);
}

my ($id, $pkt) = dns_query_for_A("example.com");
# send $pkt over a UDP socket to port 53

29 bytes total: 12 header + 13 for the name (\x07example\x03com\x00 is 13) + 2 + 2 for QTYPE and QCLASS. Run length $pkt to confirm.
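The byte count can be checked with a hex dump, and the packet can then be sent with the core IO::Socket::INET module. The sketch below uses condensed copies of the subs above and a fixed ID (0xbeef, arbitrary) so the dump is reproducible. The network part is an assumption about how you might send the query: it only runs when a resolver address is passed as the first command-line argument, and the 2-second timeout and 512-byte receive buffer are illustrative choices.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Condensed copies of the subs above so the snippet runs on its own.
sub dns_header {
    my (%o) = @_;
    return pack "n6", map { $o{$_} // 0 } qw(id flags qd an ns ar);
}

sub encode_name {
    my ($name) = @_;
    return join("", map { pack "C/a*", $_ } split /\./, $name) . "\0";
}

sub dns_question {
    my ($name, $qtype, $qclass) = @_;
    return encode_name($name) . pack("n n", $qtype, $qclass);
}

# Fixed, arbitrary ID instead of a random one; flags = RD only.
my $pkt = dns_header(id => 0xbeef, flags => 1 << 8, qd => 1)
        . dns_question("example.com", 1, 1);

printf "length: %d\n", length($pkt);

# Hex dump, 16 bytes per row, prefixed with the byte offset.
my $offset = 0;
for my $row (unpack "(a16)*", $pkt) {
    printf "%04x  %s\n", $offset, join " ", unpack "(H2)*", $row;
    $offset += length $row;
}

# Optional: nothing touches the network unless a resolver address is given.
if (my $server = shift @ARGV) {
    my $sock = IO::Socket::INET->new(
        PeerAddr => $server,
        PeerPort => 53,
        Proto    => 'udp',
    ) or die "socket: $@";
    defined $sock->send($pkt) or die "send: $!";
    if (IO::Select->new($sock)->can_read(2)) {
        defined $sock->recv(my $reply, 512) or die "recv: $!";
        printf "reply: %d bytes\n", length($reply);
    }
    else {
        print "no reply within 2 seconds\n";
    }
}
```

Without an argument the script prints:

```
length: 29
0000  be ef 01 00 00 01 00 00 00 00 00 00 07 65 78 61
0010  6d 70 6c 65 03 63 6f 6d 00 00 01 00 01
```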

Parsing the response header

The response has the same header shape — only the flag bits change. Decoding is the reverse template:

sub parse_dns_header {
    my ($buf) = @_;
    my ($id, $flags, $qd, $an, $ns, $ar) = unpack "n6", $buf;

    my %hdr = (
        id    => $id,
        qr    => ($flags >> 15) & 1,
        op    => ($flags >> 11) & 0x0f,
        aa    => ($flags >> 10) & 1,
        tc    => ($flags >> 9)  & 1,
        rd    => ($flags >> 8)  & 1,
        ra    => ($flags >> 7)  & 1,
        rcode =>  $flags        & 0x0f,
        qd    => $qd,
        an    => $an,
        ns    => $ns,
        ar    => $ar,
    );
    return \%hdr;
}

n6 pulls the six shorts out in one go; the individual flag bits come from shifts and masks on $flags. This pattern is universal: use pack / unpack for the byte-level layout, and plain Perl for bit-level decoding of the flag fields.
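The parser can be exercised without a network by feeding it a header built by hand. In this sketch the flags word 0x8183 is an illustrative choice: QR (bit 15), RD (bit 8) and RA (bit 7) set, with rcode 3 (name error). parse_dns_header is repeated from above so the snippet runs on its own.

```perl
use strict;
use warnings;

# parse_dns_header as defined above, repeated for self-containment.
sub parse_dns_header {
    my ($buf) = @_;
    my ($id, $flags, $qd, $an, $ns, $ar) = unpack "n6", $buf;
    my %hdr = (
        id    => $id,
        qr    => ($flags >> 15) & 1,
        op    => ($flags >> 11) & 0x0f,
        aa    => ($flags >> 10) & 1,
        tc    => ($flags >> 9)  & 1,
        rd    => ($flags >> 8)  & 1,
        ra    => ($flags >> 7)  & 1,
        rcode =>  $flags        & 0x0f,
        qd => $qd, an => $an, ns => $ns, ar => $ar,
    );
    return \%hdr;
}

# Hand-built response header: arbitrary ID, flags 0x8183, one question.
my $buf = pack "n6", 0xbeef, 0x8183, 1, 0, 0, 0;
my $hdr = parse_dns_header($buf);

# Print the fields in wire order rather than hash order.
printf "%-5s = %d\n", $_, $hdr->{$_}
    for qw(id qr op aa tc rd ra rcode qd an ns ar);
```

For this input the parser reports qr, rd and ra as 1, rcode as 3, and every other flag field as 0.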

Parsing a name

Names in the answer section use the same length-prefix encoding, plus a pointer compression mechanism we will skip. For a fresh name (no compression) the decoder mirrors the encoder:

sub decode_name {
    my ($buf, $offset) = @_;
    my @labels;
    while (1) {
        my $len = unpack "x$offset C", $buf;
        last if $len == 0;
        die "compression pointer" if ($len & 0xc0) == 0xc0;
        die "reserved label type" if $len > 63;   # top bits 01 and 10 are reserved
        my $label = unpack "x${\ ($offset + 1)} a$len", $buf;
        push @labels, $label;
        $offset += 1 + $len;
    }
    return (join(".", @labels), $offset + 1);
}

Two unpack calls per iteration, each with an x prefix to skip to the right byte: x$offset for the length, and one byte further for the label. The name alternates between one-byte lengths and variable-width labels, and the loop keeps reading until it hits a zero-length label. The x$offset a$len trick (compose a template from Perl values, then call unpack) is the standard pattern when the next field’s width depends on the previous field’s value.
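As a check of the decoder, the sketch below runs it against a query packet written out by hand and then reads QTYPE and QCLASS from the returned offset. decode_name is repeated in lightly adapted form: the skip count lives in a named variable, $start, which is not in the original. The ID 0xbeef is arbitrary.

```perl
use strict;
use warnings;

# decode_name from above, with the skip count in a named variable.
sub decode_name {
    my ($buf, $offset) = @_;
    my @labels;
    while (1) {
        my $len = unpack "x$offset C", $buf;     # read the length byte
        last if $len == 0;                       # zero-length label ends the name
        die "compression pointer" if $len >= 0xc0;
        my $start = $offset + 1;
        push @labels, unpack "x$start a$len", $buf;
        $offset += 1 + $len;
    }
    return (join(".", @labels), $offset + 1);    # +1 skips the terminator
}

# A query packet written out by hand: header, name, QTYPE, QCLASS.
my $pkt = pack("n6", 0xbeef, 0x0100, 1, 0, 0, 0)
        . "\x07example\x03com\x00"
        . pack("n n", 1, 1);

# The question section starts right after the 12-byte header.
my ($name, $next) = decode_name($pkt, 12);
my ($qtype, $qclass) = unpack "x$next n n", $pkt;

print "name=$name next=$next qtype=$qtype qclass=$qclass\n";
# prints: name=example.com next=25 qtype=1 qclass=1
```

Returning the offset of the byte after the terminator is what lets the caller continue with the next field without recounting.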

What to carry forward

  • An RFC byte-layout table maps directly to a template string. Big-endian shorts → n, big-endian longs → N, bytes → C.

  • Length-prefixed fields use C/a* / n/a* / … — one expression, no off-by-one counting by hand.

  • Bit-level fields inside a byte get decoded in plain Perl after unpack has handed you the byte or short that contains them.

  • Variable-width data requires staged unpacking. Read a length, then build the template for the data from that length, then call unpack again.
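The length-prefix point carries over to wider prefixes unchanged. As a small illustration, here is a made-up record (not part of DNS) consisting of a one-byte tag followed by a payload with a 16-bit big-endian length prefix; the tag value and payload are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical record layout: 1-byte tag, then a 16-bit length-prefixed payload.
my $rec = pack "C n/a*", 0x2a, "hello";
print unpack("H*", $rec), "\n";
# prints: 2a000568656c6c6f

# unpack reads the length first and uses it as the count for the string.
my ($tag, $payload) = unpack "C n/a*", $rec;
print "tag=$tag payload=$payload\n";
# prints: tag=42 payload=hello
```

The same template string serves both directions, so the encoder and decoder cannot drift apart on where the length lives or how wide it is.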

The next chapter applies the same approach to a real file format — a GIF image header — and introduces the “magic number check” idiom.