[Pdns-users] Re: Domains with binary (e.g. UTF-8) labels
Julian Mehnle
julian at mehnle.net
Sun Dec 17 16:21:44 UTC 2006
Ian Tester wrote:
> Julian Mehnle <julian at mehnle.net> wrote:
> > Case insensitivity applies only to ASCII characters. So where's the
> > problem?
>
> We don't know if the data is ASCII. If what Bert stated is true, we
> don't know what the data should be. That's the problem.
But DNS doesn't distinguish between "binary" and "non-binary" (at least not
in the sense I mean it -- I am NOT talking about the very special "binary
labels" defined in RFC 2673, in case anyone thought I might be referring
to that). All domain names are binary; some (most) just happen to contain
only 7-bit bytes.
> Passing around strings of text '8-bit clean' is fine until you have to
> actually understand what the stream of bytes mean i.e manipulate it or
> make decisions based on it. That's where character encodings come in.
> You can't simply say: This byte is 0x42, which is upper-case 'B', so
> I'll lower-case it to 'b'. That byte might be part of a double-byte
> character in UTF-16, or any other multi-byte encoding.
No, not "any" other multi-byte encoding. UTF-8, for example, clearly
separates ASCII characters from non-ASCII characters: every byte of a
multi-byte UTF-8 sequence has the high bit set, so a byte in the 7-bit
range always stands for an ASCII character.
You are correct about certain other encodings such as UTF-16, though. But
that shouldn't be PDNS's problem. As I said, domain owners cannot
(and will not) assume that their "desired" wire encoding will be
recognized by anyone.
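To make the difference concrete, here is a minimal Python sketch. The helper name fold_ascii and the sample labels are made up for illustration; nothing here is PDNS code. It lower-cases only the octets 0x41..0x5A and shows that this leaves a UTF-8 label's non-ASCII character intact, while the same operation on a UTF-16 label changes the character.

```python
def fold_ascii(data: bytes) -> bytes:
    """Lower-case octets 0x41..0x5A ('A'..'Z'); leave all other octets alone."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


# UTF-8: the u-umlaut is encoded as 0xC3 0xBC, both with the high bit set,
# so ASCII folding only ever touches the real ASCII letter 'B'.
utf8_label = "B\u00fccher".encode("utf-8")
print(fold_ascii(utf8_label).decode("utf-8"))

# UTF-16 (big-endian): U+4242 is encoded as the two octets 0x42 0x42,
# which look exactly like ASCII "BB". Folding them yields 0x62 0x62,
# which is a different character (U+6262).
utf16_label = "\u4242".encode("utf-16-be")
print(fold_ascii(utf16_label).decode("utf-16-be") == "\u4242")
```

The second print shows False: the folded UTF-16 octets no longer decode to the original character, which is the risk Ian describes, and it is limited to encodings that reuse 7-bit octets inside multi-byte characters.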
> We need to know what encoding a string uses, otherwise we risk making a
> real mess.
I don't think we risk making "a real mess". If you as the DNS admin know
what you're doing, there is no problem.
It's the same with BIND. If you configure a domain with a UTF-16 byte
sequence containing 7-bit bytes, you risk it getting confused with a
different binary domain whose bytes differ only in ASCII case. That's
life. But that doesn't mean that binary domain names should not be
supported by BIND or PDNS.
BTW, a friend of mine just pointed out RFC 4343[1], "DNS Case Insensitivity
Clarification", to me. It is highly relevant to this discussion and I
recommend it to anyone participating. Among other things, it says:
| 3. Name Lookup, Label Types, and CLASS
|
| According to the original DNS design decision, comparisons on name
| lookup for DNS queries should be case insensitive [STD13]. That is
| to say, a lookup string octet with a value in the inclusive range
| from 0x41 to 0x5A, the uppercase ASCII letters, MUST match the
| identical value and also match the corresponding value in the
| inclusive range from 0x61 to 0x7A, the lowercase ASCII letters. A
| lookup string octet with a lowercase ASCII letter value MUST
| similarly match the identical value and also match the corresponding
| value in the uppercase ASCII letter range.
|
| [...]
|
| One way to implement this rule would be to subtract 0x20 from all
| octets in the inclusive range from 0x61 to 0x7A before comparing
| octets. Such an operation is commonly known as "case folding", but
| implementation via case folding is not required. Note that the DNS
| case insensitivity does NOT correspond to the case folding specified
| in [ISO-8859-1] or [ISO-8859-2]. For example, the octets 0xDD (\221)
| and 0xFD (\253) do NOT match, although in other contexts, where they
| are interpreted as the upper- and lower-case version of "Y" with an
| acute accent, they might.
(It has more interesting stuff to say.)
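The rule quoted above is small enough to sketch in a few lines of Python. This follows the "subtract 0x20" variant the RFC mentions as one possible implementation; the function names dns_fold and dns_labels_equal are made up for illustration and are not taken from PDNS or BIND.

```python
def dns_fold(label: bytes) -> bytes:
    """Subtract 0x20 from octets in 0x61..0x7A; leave all others untouched."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in label)


def dns_labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two labels octet by octet, ignoring ASCII case only."""
    return dns_fold(a) == dns_fold(b)


# ASCII letters match regardless of case.
print(dns_labels_equal(b"Example", b"eXAMPLE"))

# 0xDD and 0xFD are NOT equal for DNS purposes ...
print(dns_labels_equal(b"\xdd", b"\xfd"))

# ... although an ISO-8859-1 aware lower-casing would treat them as the
# same letter ("Y" with an acute accent).
print(b"\xdd".decode("latin-1").lower() == b"\xfd".decode("latin-1"))
```

The comparison never needs to know which character encoding, if any, the label's owner had in mind; it only looks at octet values.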
All that I'm suggesting in ticket 115 is a way to store binary domains in,
and retrieve them from, a PDNS backend database. If anyone can come up
with an approach that is more general or otherwise better, I'm all ears.
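For illustration only, here is one way binary labels could be round-tripped through a plain text column. It uses the decimal \DDD escapes of the master file format, the same notation the RFC excerpt uses for \221 and \253. This is an assumption made for the example and not necessarily what ticket 115 specifies; the function names escape_label and unescape_label are hypothetical.

```python
def escape_label(label: bytes) -> str:
    """Render a raw label as printable ASCII using \\DDD decimal escapes."""
    out = []
    for b in label:
        if b in b".\\":
            # A literal dot or backslash inside a label gets a backslash.
            out.append("\\" + chr(b))
        elif 0x21 <= b <= 0x7E:
            # Printable ASCII is stored as-is.
            out.append(chr(b))
        else:
            # Everything else becomes a three-digit decimal escape.
            out.append("\\%03d" % b)
    return "".join(out)


def unescape_label(text: str) -> bytes:
    """Inverse of escape_label; assumes well-formed input."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
            continue
        digits = text[i + 1:i + 4]
        if len(digits) == 3 and digits.isdigit():
            out.append(int(digits))      # \DDD escape
            i += 4
        else:
            out.append(ord(text[i + 1]))  # \. or \\ escape
            i += 2
    return bytes(out)


raw = "b\u00fccher".encode("utf-8")
stored = escape_label(raw)
print(stored)
print(unescape_label(stored) == raw)
```

The stored form contains only 7-bit printable characters, so it survives any backend regardless of the database's own character set, and the original octets come back unchanged. Error handling for malformed input (a trailing backslash, an escape above 255) is left out of the sketch.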
References:
1. http://www.rfc-editor.org/rfc/rfc4343.txt