[Pdns-users] Re: [Pdns-dev] Re: Domains with binary (e.g. UTF-8) labels
Dean Anderson
dean at av8.com
Wed Dec 20 01:20:41 CET 2006
On Sun, 17 Dec 2006, bert hubert wrote:
> On Sun, Dec 17, 2006 at 01:07:33PM +0000, Julian Mehnle wrote:
>
> > Please stop patronizing me. I know what UTF-8 is. If the database
> (...)
> > then why did you have me file it? (And have you actually read my ticket?
>
> Come back when you've learned to work with the open source community. I'd
> love to help you, but this is not going anywhere.
Yikes. Julian has a point. If you're __planning__ to blow him off, why
have him file a ticket? That isn't the "open source" community way of
doing things; that sounds like what certain closed-source commercial
companies tell their customer support staff to say when people report
troublesome product issues that just can't be fixed.
On the other hand, I think the whole UTF-8 issue is a complete
misunderstanding. While UTF-8 encodes ASCII characters in one byte,
using the same bit patterns as ASCII itself, it needs more bytes to
represent Hebrew, Latin, etc., and these languages would make no sense
in DNS and can't be one-byte case-insensitive. Only the byte codes for
ASCII can be case-insensitive; that is, only the ASCII subset of UTF-8
can be case-insensitive.
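DNS case-insensitivity is defined only for the ASCII letters A-Z and a-z (RFC 4343). A minimal sketch of that rule, with hypothetical helper names, shows why byte-by-byte upper/lower folding says nothing about a multibyte character: the bytes of "Ä" and "ä" in UTF-8 are simply left alone.

```python
def ascii_fold(label: bytes) -> bytes:
    """Lower-case only the ASCII letters A-Z (0x41-0x5A); every other byte,
    including each byte of a multibyte UTF-8 sequence, is left unchanged."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def labels_equal(a: bytes, b: bytes) -> bool:
    """Compare two DNS labels case-insensitively, folding ASCII letters only."""
    return ascii_fold(a) == ascii_fold(b)


print(labels_equal(b"Example", b"eXAMPLE"))                    # True
print(labels_equal("Ä".encode("utf-8"), "ä".encode("utf-8")))  # False
```

Python's built-in `bytes.lower()` applies the same ASCII-only rule; the explicit loop just makes the 0x41-0x5A range visible.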
Further, dotted DNS names containing such characters can't be reliably
translated to and from wire format. For example, suppose you get a DNS
record with a UTF-8 name in, say, Hebrew. When an ordinary resolver,
DNS cache, etc. translates this name from wire format to dotted format,
the multibyte characters are misinterpreted as ASCII. If the ASCII byte
code for '.' (hex 2E) turns up among the bytes of a multibyte character,
the byte-by-byte translation of the wire-format sequence of labels is no
longer what was intended, because there is an extra dot in the name.
You also can't get a meaningful result from byte-by-byte upper/lower-case
testing of a multibyte character. Obviously, then, DNS can't support the
full UTF-8 character set unless it gives up the dotted ASCII format
frequently used by DNS caches. I don't think anyone has ever advocated
full UTF-8 support in discussions on namedroppers. If they did, I
completely missed it, as I would have said something.
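The extra-dot problem can be sketched with a hypothetical wire-format name whose first label is the three bytes `a`, 0x2E, `b`. This assumes an uncompressed name (no compression pointers) and a converter that maps each byte to one character without escaping; the function names are illustrative only.

```python
def parse_wire_name(wire: bytes) -> list[bytes]:
    """Split an uncompressed wire-format name into its labels.
    Each label is a length byte followed by that many bytes; a zero
    length byte (the root label) ends the name."""
    labels = []
    i = 0
    while True:
        length = wire[i]
        if length == 0:
            return labels
        labels.append(wire[i + 1 : i + 1 + length])
        i += 1 + length


def naive_to_dotted(labels: list[bytes]) -> str:
    """Join labels with '.', decoding byte by byte (Latin-1 maps each byte
    to one character) and WITHOUT escaping 0x2E bytes inside a label."""
    return ".".join(label.decode("latin-1") for label in labels) + "."


wire = b"\x03a.b\x07example\x00"           # two labels: b"a.b" and b"example"
labels = parse_wire_name(wire)
dotted = naive_to_dotted(labels)
print(labels)                               # [b'a.b', b'example']
print(dotted)                               # a.b.example.
print(dotted.rstrip(".").split("."))        # ['a', 'b', 'example']  (three labels)
```

Reading the dotted form back yields three labels where the wire format had two. RFC 1035's master-file format writes such a byte as `\.`; a converter that skips that escaping produces exactly this ambiguity.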
While DNS is 8-bit clean, it can't support multibyte characters. It
might partly support EBCDIC, because the byte 0x2E (ASCII ".") is a
control character in EBCDIC, so you probably won't see it in an EBCDIC
DNS label. DNS generally can't support character sets that use a
different code for ".", or, more specifically, sets that embed the byte
code for ASCII "." in DNS labels.
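A quick way to check whether a character set embeds the ASCII "." byte is to encode some text and look for 0x2E. This sketch assumes Python's `cp037` codec as one representative EBCDIC code page (other EBCDIC variants may differ), and `embeds_ascii_dot` is a hypothetical helper name.

```python
ASCII_DOT = 0x2E  # the byte an ordinary resolver treats as a label separator


def embeds_ascii_dot(text: str, encoding: str) -> bool:
    """Return True if encoding `text` produces the byte 0x2E anywhere."""
    return ASCII_DOT in text.encode(encoding)


# In ASCII, "." *is* 0x2E, so every dot in the text becomes a 0x2E byte.
print(embeds_ascii_dot("www.example.com", "ascii"))   # True
# In EBCDIC code page 037, "." is 0x4B and 0x2E is the ACK control character.
print(embeds_ascii_dot("www.example.com", "cp037"))   # False
print("a.b".encode("cp037"))                          # b'\x81K\x82'
print(repr(bytes([ASCII_DOT]).decode("cp037")))       # '\x06'
```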
Since DNS is therefore limited to the single-byte subset of UTF-8, we
are just arguing over the name of that subset. This subset is exactly
the same as ASCII. It makes no difference whether you say "the one-byte
subset of UTF-8" or simply "ASCII". So the dispute is really a tempest
in a teapot over two labels for the same thing.
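The claim that the one-byte subset of UTF-8 is exactly ASCII can be checked over all 256 byte values. A minimal sketch (helper names are hypothetical): bytes 0x00-0x7F decode identically under both codecs, and no byte from 0x80-0xFF is a complete UTF-8 character by itself.

```python
def is_one_byte_utf8(label: bytes) -> bool:
    """True if every byte is a complete one-byte UTF-8 character (0x00-0x7F)."""
    return all(b < 0x80 for b in label)


def decodes_alone(b: int) -> bool:
    """True if the single byte `b` is valid UTF-8 on its own."""
    try:
        bytes([b]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Every byte value 0x00-0x7F decodes to the same character as UTF-8 and as ASCII...
same = all(bytes([b]).decode("utf-8") == bytes([b]).decode("ascii") for b in range(0x80))
# ...while no byte value 0x80-0xFF is a valid UTF-8 character on its own.
none_alone = not any(decodes_alone(b) for b in range(0x80, 0x100))
print(same, none_alone)  # True True
```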
Supposing Julian means supporting the "one-byte subset of UTF-8", the
answer is "we already do" [there is no database issue with the one-byte
subset of UTF-8].
But if Julian wants multibyte UTF-8 characters in DNS, it won't work; it
won't ever work. Database representations are only part (though an
important part) of the issue.
Is that clear?
--Dean
--
Av8 Internet Prepared to pay a premium for better service?
www.av8.net faster, more reliable, better service
617 344 9000
More information about the Pdns-dev mailing list