Network protocols: a DNS query
By the end of this chapter you will be able to build and parse
a DNS query header and question section using pack and unpack,
and apply the same approach to any RFC-defined protocol.
A “network protocol” is, in the end, a sequence of byte-exact fields described in English prose. If you can read the field table, you can write the template. This chapter walks through one concrete example, the DNS query packet, from the RFC wire diagram to working encode and decode subroutines for the header and question.
The problem
We want to ask a DNS server for the A record of example.com. The
UDP payload we send is a DNS message (RFC 1035), consisting of:
- A 12-byte header.
- A question section: the domain we are asking about, and the record type we want.
(A real client also parses the answer section of the response. This chapter stops at encoding the query and sketching the decoder.)
The header
From RFC 1035 section 4.1.1, the header is six consecutive 16-bit big-endian fields:
| Offset (bytes) | Field | Meaning |
|---|---|---|
| 0 | ID | Arbitrary 16-bit identifier we choose |
| 2 | Flags | QR, Opcode, AA, TC, RD, RA, Z and RCODE bit fields |
| 4 | QDCOUNT | Number of entries in the question section |
| 6 | ANCOUNT | Number of resource records in the answer section |
| 8 | NSCOUNT | Number of resource records in the authority section |
| 10 | ARCOUNT | Number of resource records in the additional section |
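The Offset column maps directly onto unpack's @ directive, which jumps to an absolute byte position before reading. The sketch below assumes a hand-built header with made-up values (ID 0xBEEF, only the RD bit set, one question) and reads individual fields straight from their table offsets:

```perl
use strict;
use warnings;

# Made-up header: ID 0xBEEF, flags 0x0100 (RD set), one question, no records.
my $hdr = pack "n6", 0xBEEF, 0x0100, 1, 0, 0, 0;

# '@N' moves to absolute byte offset N, so each table row becomes '@offset n'.
# Single quotes keep Perl from trying to interpolate '@0' as an array.
my $id      = unpack '@0 n',  $hdr;   # offset 0: ID
my $flags   = unpack '@2 n',  $hdr;   # offset 2: Flags
my $qdcount = unpack '@4 n',  $hdr;   # offset 4: QDCOUNT
my $arcount = unpack '@10 n', $hdr;   # offset 10: ARCOUNT

printf "id=%04x flags=%04x qd=%d ar=%d\n", $id, $flags, $qdcount, $arcount;
# prints id=beef flags=0100 qd=1 ar=0
```

Reading one field by offset is handy for spot checks; for the whole header, a single n6 (as below) is simpler.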
Six unsigned big-endian 16-bit fields mean six n directives:
```perl
sub dns_header {
    my (%opts) = @_;
    pack "n6",
        $opts{id}    // 0,
        $opts{flags} // 0,
        $opts{qd}    // 0,
        $opts{an}    // 0,
        $opts{ns}    // 0,
        $opts{ar}    // 0;
}
```
That is 12 bytes, exactly as the spec requires. n6 is shorthand
for n n n n n n.
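A quick way to confirm the layout is to dump the packed header as hex. This sketch repeats dns_header so it runs standalone and uses arbitrary example values: ID 0x1234, flags 0x0100 (the RD bit), one question.

```perl
use strict;
use warnings;

# Standalone copy of dns_header from the text.
sub dns_header {
    my (%opts) = @_;
    pack "n6",
        $opts{id}    // 0,
        $opts{flags} // 0,
        $opts{qd}    // 0,
        $opts{an}    // 0,
        $opts{ns}    // 0,
        $opts{ar}    // 0;
}

my $hdr = dns_header(id => 0x1234, flags => 0x0100, qd => 1);

print length($hdr), "\n";          # prints 12
print unpack("H*", $hdr), "\n";    # prints 123401000001000000000000
```

Each pair of bytes in the dump is one row of the table, high byte first: 1234, 0100, 0001, then three 0000 counts.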
The question section
A question consists of:
- A name, encoded as a sequence of length-prefixed labels terminated by a zero-length label.
- A 16-bit QTYPE (1 = A record).
- A 16-bit QCLASS (1 = IN, Internet).
For example.com the name encodes as:
\x07 e x a m p l e \x03 c o m \x00
Each label begins with a length byte, and the whole name ends with a
zero-length label. Two things to notice: the length is one byte
(C), not two; and although there is a trailing NUL, it is not
what Z produces. Z terminates a single string, whereas this NUL is
an empty label that terminates the whole list of labels.
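To see why Z alone does not fit, compare a template hard-coded for this one name with what Z* produces for the dotted string. This is a throwaway sketch: the "C a7 C a3 C" template only works for example.com.

```perl
use strict;
use warnings;

# Hard-coded for "example.com": length byte, 7 bytes, length byte, 3 bytes, terminator.
my $wire = pack "C a7 C a3 C", 7, "example", 3, "com", 0;

# Z* null-terminates the dotted string as a whole: the dot stays, no length bytes.
my $asciz = pack "Z*", "example.com";

print length($wire),  "\n";   # prints 13
print length($asciz), "\n";   # prints 12
```

The wire form has no dots at all; the length bytes take their place.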
We can build this by hand from a domain name:
```perl
sub encode_name {
    my ($name) = @_;
    my $out = "";
    for my $label (split /\./, $name) {
        die "label too long" if length($label) > 63;
        $out .= pack "C/a*", $label;
    }
    $out .= "\0";   # zero-length terminator
    return $out;
}
```
The loop body uses a single template, C/a*: a one-byte length (C)
followed by the label bytes (a*), with pack computing the length
automatically (see the grouping-and-counts chapter for the /
form). After the loop, a literal "\0" terminates the list.
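A short check of encode_name, repeated here so the sketch runs on its own: the hex dump should show the 07 and 03 length bytes around the ASCII labels, and a label over the 63-byte limit should make the call die.

```perl
use strict;
use warnings;

# Standalone copy of encode_name from the text.
sub encode_name {
    my ($name) = @_;
    my $out = "";
    for my $label (split /\./, $name) {
        die "label too long" if length($label) > 63;
        $out .= pack "C/a*", $label;   # one length byte, then the label bytes
    }
    $out .= "\0";                      # zero-length terminator
    return $out;
}

my $enc = encode_name("example.com");
print length($enc), "\n";          # prints 13
print unpack("H*", $enc), "\n";    # prints 076578616d706c6503636f6d00

# A 64-byte label exceeds the 63-byte limit, so encode_name dies.
my $ok = eval { encode_name(("a" x 64) . ".com"); 1 };
print "64-byte label rejected\n" unless $ok;
```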
With encode_name in place, the whole question section is one
call, one pack and one concatenation:
```perl
sub dns_question {
    my ($name, $qtype, $qclass) = @_;
    return encode_name($name) . pack("n n", $qtype, $qclass);
}
```
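Dumping the question section shows the name followed directly by the two shorts. This sketch repeats encode_name and dns_question so it runs standalone, using QTYPE 1 (A) and QCLASS 1 (IN) as in the text:

```perl
use strict;
use warnings;

# Standalone copies of encode_name and dns_question from the text.
sub encode_name {
    my ($name) = @_;
    my $out = "";
    for my $label (split /\./, $name) {
        die "label too long" if length($label) > 63;
        $out .= pack "C/a*", $label;
    }
    return $out . "\0";   # zero-length terminator
}

sub dns_question {
    my ($name, $qtype, $qclass) = @_;
    return encode_name($name) . pack("n n", $qtype, $qclass);
}

my $q = dns_question("example.com", 1, 1);
print length($q), "\n";          # prints 17
print unpack("H*", $q), "\n";    # prints 076578616d706c6503636f6d0000010001
```

The last eight hex digits, 0001 and 0001, are QTYPE and QCLASS.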
Assembling the packet
```perl
use constant {
    QR_QUERY     => 0,
    OPCODE_QUERY => 0,
    RD           => 1 << 8,   # recursion desired
    TYPE_A       => 1,
    CLASS_IN     => 1,
};

sub dns_query_for_A {
    my ($name) = @_;
    my $id    = int rand 65536;
    my $flags = RD;           # standard query, recursion desired
    my $pkt   = dns_header(
        id    => $id,
        flags => $flags,
        qd    => 1,           # one question
    );
    $pkt .= dns_question($name, TYPE_A, CLASS_IN);
    return ($id, $pkt);
}

my ($id, $pkt) = dns_query_for_A("example.com");
# send $pkt over a UDP socket to port 53
```
29 bytes total: 12 header + 13 for the name
(\x07example\x03com\x00 is 13) + 2 + 2 for QTYPE and QCLASS. Run
length $pkt to confirm.
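Beyond checking the length, the packet can be taken apart again to confirm every field landed where the RFC says. This sketch repeats the chapter's encoders (trimmed to the constants it uses) so it runs standalone; the random ID is compared against the one dns_query_for_A returned.

```perl
use strict;
use warnings;

use constant {
    RD       => 1 << 8,   # recursion desired
    TYPE_A   => 1,
    CLASS_IN => 1,
};

# Standalone copies of the chapter's encoders.
sub dns_header {
    my (%opts) = @_;
    pack "n6",
        $opts{id}    // 0,
        $opts{flags} // 0,
        $opts{qd}    // 0,
        $opts{an}    // 0,
        $opts{ns}    // 0,
        $opts{ar}    // 0;
}

sub encode_name {
    my ($name) = @_;
    my $out = "";
    for my $label (split /\./, $name) {
        die "label too long" if length($label) > 63;
        $out .= pack "C/a*", $label;
    }
    return $out . "\0";
}

sub dns_question {
    my ($name, $qtype, $qclass) = @_;
    return encode_name($name) . pack("n n", $qtype, $qclass);
}

sub dns_query_for_A {
    my ($name) = @_;
    my $id  = int rand 65536;
    my $pkt = dns_header(id => $id, flags => RD, qd => 1);
    return ($id, $pkt . dns_question($name, TYPE_A, CLASS_IN));
}

my ($id, $pkt) = dns_query_for_A("example.com");

# Header: six big-endian shorts at offsets 0, 2, ..., 10.
my ($got_id, $flags, $qd, $an, $ns, $ar) = unpack "n6", $pkt;
# Name: everything between the 12-byte header and the final 4 bytes.
my $name_bytes = substr($pkt, 12, length($pkt) - 16);
# QTYPE and QCLASS: the last two shorts.
my ($qtype, $qclass) = unpack "n n", substr($pkt, -4);

printf "%d bytes, id %s, flags=%04x, qd=%d, qtype=%d, qclass=%d\n",
    length($pkt), ($got_id == $id ? "matches" : "differs"),
    $flags, $qd, $qtype, $qclass;
# prints 29 bytes, id matches, flags=0100, qd=1, qtype=1, qclass=1
```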
Parsing the response header
The response header has the same layout as the query header; only the values differ (the flag bits and the record counts). Decoding uses the same template in reverse:
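```perl
sub parse_dns_header {
    my ($buf) = @_;
    die "short DNS header\n" if length($buf) < 12;
    my ($id, $flags, $qd, $an, $ns, $ar) = unpack "n6", $buf;
    my %hdr = (
        id    => $id,
        qr    => ($flags >> 15) & 1,     # 0 = query, 1 = response
        op    => ($flags >> 11) & 0x0f,  # opcode
        aa    => ($flags >> 10) & 1,     # authoritative answer
        tc    => ($flags >> 9)  & 1,     # truncated
        rd    => ($flags >> 8)  & 1,     # recursion desired
        ra    => ($flags >> 7)  & 1,     # recursion available
        rcode => $flags & 0x0f,          # response code
        qd    => $qd,
        an    => $an,
        ns    => $ns,
        ar    => $ar,
    );
    return \%hdr;
}
```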
n6 pulls the six shorts out in one go; the individual flag bits
come out of shifts and masks on $flags. The same split carries over
to other protocols: use pack / unpack for the byte-level layout and
plain Perl for bit-level decoding of the flag fields.
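Going the other way, the same shifts can assemble a flags word from named fields. The dns_flags helper below is hypothetical (it is not from RFC 1035 or any CPAN module); it assumes the RFC 1035 field widths (QR 1 bit, Opcode 4, AA, TC, RD and RA 1 bit each, RCODE 4) and leaves the 3-bit Z field zero. A standalone copy of parse_dns_header checks the round trip.

```perl
use strict;
use warnings;

# Hypothetical helper: packs named flag fields into the 16-bit flags word,
# reversing the shifts and masks used by parse_dns_header.
sub dns_flags {
    my (%f) = @_;
    my $flags = ((($f{qr} // 0) & 1)    << 15)
              | ((($f{op} // 0) & 0x0f) << 11)
              | ((($f{aa} // 0) & 1)    << 10)
              | ((($f{tc} // 0) & 1)    << 9)
              | ((($f{rd} // 0) & 1)    << 8)
              | ((($f{ra} // 0) & 1)    << 7)
              |  (($f{rcode} // 0) & 0x0f);   # bits 6..4 (Z) stay zero
    return $flags;
}

# Standalone copy of parse_dns_header from the text.
sub parse_dns_header {
    my ($buf) = @_;
    die "short DNS header\n" if length($buf) < 12;
    my ($id, $flags, $qd, $an, $ns, $ar) = unpack "n6", $buf;
    return {
        id => $id,
        qr => ($flags >> 15) & 1,  op => ($flags >> 11) & 0x0f,
        aa => ($flags >> 10) & 1,  tc => ($flags >> 9) & 1,
        rd => ($flags >> 8) & 1,   ra => ($flags >> 7) & 1,
        rcode => $flags & 0x0f,
        qd => $qd, an => $an, ns => $ns, ar => $ar,
    };
}

# QR, RD and RA set with RCODE 0: a common shape for a recursive answer.
my $flags = dns_flags(qr => 1, rd => 1, ra => 1);
printf "%04x\n", $flags;    # prints 8180

my $hdr = parse_dns_header(pack "n6", 0xABCD, $flags, 1, 1, 0, 0);
print join(",", map { $_ . "=" . $hdr->{$_} } qw(qr op aa tc rd ra rcode)), "\n";
# prints qr=1,op=0,aa=0,tc=0,rd=1,ra=1,rcode=0
```

Masking each input before shifting keeps an out-of-range value from spilling into a neighbouring field.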
Parsing a name
Names in the answer section use the same length-prefix encoding, plus a pointer-based compression mechanism (RFC 1035 section 4.1.4) that we skip here. For an uncompressed name, the decoder mirrors the encoder:
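```perl
sub decode_name {
    my ($buf, $offset) = @_;
    my @labels;
    while (1) {
        my $len = unpack "x$offset C", $buf;        # length byte
        die "truncated name\n" unless defined $len;
        last if $len == 0;                           # zero-length terminator
        die "compression pointer" if $len >= 0xc0;   # not handled here
        my $label = unpack "x${\ ($offset + 1)} a$len", $buf;
        push @labels, $label;
        $offset += 1 + $len;
    }
    # Return the name and the offset just past its terminating zero byte.
    return (join(".", @labels), $offset + 1);
}
```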
Each pass through the loop makes two unpack calls, each with an x
prefix that skips to the right byte: one reads the length byte, the
next reads the label that follows it. The name alternates between
one-byte lengths and variable-width labels, and the loop keeps
reading until it hits a zero-length label. The x$offset a$len trick
(compose a template from Perl values, then call unpack) is the
standard pattern when the next field’s width depends on the previous
field’s value.
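Putting decode_name to work on a whole question: the sketch below builds a query packet by hand (the ID 0x1234 is arbitrary), decodes the name starting right after the 12-byte header, then uses the returned offset to build the next template for QTYPE and QCLASS. decode_name is repeated so the block runs standalone.

```perl
use strict;
use warnings;

# Standalone copy of decode_name from the text.
sub decode_name {
    my ($buf, $offset) = @_;
    my @labels;
    while (1) {
        my $len = unpack "x$offset C", $buf;
        die "truncated name\n" unless defined $len;
        last if $len == 0;
        die "compression pointer" if $len >= 0xc0;
        my $label = unpack "x${\ ($offset + 1)} a$len", $buf;
        push @labels, $label;
        $offset += 1 + $len;
    }
    return (join(".", @labels), $offset + 1);
}

# Header, then example.com, then QTYPE = 1 and QCLASS = 1.
my $pkt = pack("n6", 0x1234, 0x0100, 1, 0, 0, 0)
        . "\x07example\x03com\x00"
        . pack("n n", 1, 1);

# The question starts right after the 12-byte header.
my ($name, $off) = decode_name($pkt, 12);
# Staged unpack: the offset returned by decode_name builds the next template.
my ($qtype, $qclass) = unpack "x$off n n", $pkt;

print "$name type=$qtype class=$qclass next=$off\n";
# prints example.com type=1 class=1 next=25

# A compression pointer (top two bits set) is rejected rather than followed.
my $ok = eval { decode_name("\xc0\x0c", 0); 1 };
print "pointer rejected\n" unless $ok;
```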
What to carry forward
- An RFC byte-layout table maps directly to a template string: big-endian shorts → n, big-endian longs → N, bytes → C.
- Length-prefixed fields use C/a*, n/a*, …: one expression, no off-by-one counting by hand.
- Bit-level fields inside a byte get decoded in plain Perl after unpack has handed you the byte or short that contains them.
- Variable-width data requires staged unpacking. Read a length, build the template for the data from that length, then call unpack again.
The next chapter applies the same approach to a real file format, the GIF image header, and introduces the “magic number check” idiom.