[Pdns-users] Re: Domains with binary (e.g. UTF-8) labels
Stephane Bortzmeyer
bortzmeyer at nic.fr
Sat Dec 16 21:37:05 UTC 2006
On Sat, Dec 16, 2006 at 10:17:23PM +0100,
bert hubert <bert.hubert at netherlabs.nl> wrote
a message of 29 lines which said:
> To encode utf-8 domains so that they work, use 'IDN'.
IDN is mandatory for host names but should not be for domain names
without hosts.
> Read for example paragraph 3.5 of RFC 1035, which contains: "The
> labels must follow the rules for ARPANET host names."
That is section 2.3.1, and it says so only as a *preference*, explicitly
marked as such. RFC 2181 makes it very clear that the DNS is 8-bit clean:

   The DNS itself places only one restriction on the particular labels
   that can be used to identify resource records.  That one restriction
   relates to the length of the label and the full name.  The length of
   any one label is limited to between 1 and 63 octets.  A full domain
   name is limited to 255 octets (including the separators).  The zero
   length full name is defined as representing the root of the DNS tree,
   and is typically written and displayed as ".".  Those restrictions
   aside, any binary string whatever can be used as the label of any
   resource record.  Similarly, any binary string can serve as the value
   of any record that includes a domain name as some or all of its value
   (SOA, NS, MX, PTR, CNAME, and any others that may be added).

   Implementations of the DNS protocols must not place any restrictions
   on the labels that can be used.  In particular, DNS servers must not
   refuse to serve a zone because it contains labels that might not be
   acceptable to some DNS client programs.  A DNS server may be
   configurable to issue warnings when loading, or even to refuse to
   load, a primary zone containing labels that might be considered
   questionable, however this should not happen by default.

IMHO, PowerDNS is deeply wrong here.
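
To make the RFC 2181 rule concrete, here is a minimal Python sketch of
the only check the text above allows a server to apply: label and name
lengths, nothing about the octets themselves. The function name
`name_is_valid` is hypothetical, and counting the 255 octets in wire
format (one length octet per label plus the terminating root octet) is
my assumption about how to read "including the separators".

```python
def name_is_valid(labels):
    """Check a domain name against the RFC 2181 length limits only.

    `labels` is a list of bytes objects, one per label, without the
    root label. Any octet value is accepted inside a label.
    """
    for label in labels:
        # Each label must be between 1 and 63 octets long.
        if not 1 <= len(label) <= 63:
            return False
    # Assumption: measure the full name in wire format, i.e. one
    # length octet per label plus one octet for the root label.
    wire_length = sum(1 + len(label) for label in labels) + 1
    return wire_length <= 255


# A raw UTF-8 label passes, since no content restriction applies.
utf8_ok = name_is_valid(["café".encode("utf-8"), b"example"])
# A 64-octet label fails on length alone.
too_long_label = name_is_valid([b"a" * 64])
```

Note that nothing in the sketch looks at which octets a label contains,
which is the whole point of the quoted paragraph.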
> Even if we would support arbitrary values, things are unlikely to work as
> intended. IDN was invented for a reason.
Not this one. BIND and NSD work fine with 8-bit labels. IDN was
invented for two reasons:
* most domain names contain host names, and host names do indeed have
the restriction (RFC 1123). That is also why all the domain registries
I know of refuse to register non-LDH labels (LDH =
letters/digits/hyphen).
* the most important problem with Unicode in domain names is not
whether 8-bit labels work (they do work with BIND and NSD). It is
the *canonicalization*. ASCII labels have only one canonicalization
rule, and a very simple one ("case does not matter"). For Unicode,
things are harder: you need a much more complicated algorithm for
canonicalization, and the IETF thought it should live only in the
applications, not in the DNS servers.
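
The canonicalization problem can be illustrated with a short Python
sketch. It compares two UTF-8 spellings of the same visible label, one
using a precomposed "é" and one using "e" plus a combining accent. The
helper names `ascii_fold` and `nfc` are hypothetical, and NFC is used
only as an example of Unicode normalization, not as the algorithm IDN
actually specifies.

```python
import unicodedata


def ascii_fold(label: bytes) -> bytes:
    """Apply the only DNS rule: lowercase the ASCII letters A-Z."""
    return bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in label)


def nfc(label: bytes) -> str:
    """Decode a UTF-8 label and apply Unicode NFC normalization."""
    return unicodedata.normalize("NFC", label.decode("utf-8"))


# Two octet strings that display identically as "café".
composed = "caf\u00e9".encode("utf-8")     # precomposed e-acute
decomposed = "cafe\u0301".encode("utf-8")  # "e" + combining acute

# The ASCII rule is enough for ASCII labels...
same_ascii = ascii_fold(b"EXAMPLE") == ascii_fold(b"example")
# ...but it sees the two Unicode spellings as different labels.
same_raw = ascii_fold(composed) == ascii_fold(decomposed)
# Only a Unicode-aware step makes them compare equal.
same_normalized = nfc(composed) == nfc(decomposed)
```

A server that only applies the ASCII rule would serve these as two
distinct names, which is why the extra work has to happen somewhere,
and the choice made was to put it in the applications.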