[Pdns-users] gmysql: Is latin1 really necessary? What are the consequences of using UTF-8?
Michael Loftis
mloftis at wgops.com
Fri Oct 30 14:16:59 UTC 2020
I was hoping someone who knew more about the PDNS authoritative server
itself would chime in...

For the MySQL server and client, if the character set used by
libmysqlclient and the one on the server-side tables etc. match, it
doesn't matter except for server-side sorts (collations). If it is
latin1 all the way through, the data is effectively treated as binary;
this avoids the performance penalty of character set conversions.

DNS itself is more difficult... labels are ASCII. PowerDNS internally,
for example, has an upper-case conversion routine that blatantly
assumes ASCII/latin1.
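To make the ASCII/latin1 assumption concrete, here is a minimal Python
sketch. `ascii_upper` and `latin1_round_trips` are hypothetical helpers
written for illustration, not PowerDNS code, and Python's "latin-1"
codec (ISO 8859-1) stands in for MySQL's latin1, which is actually
closer to Windows cp1252, so treat it as an analogy. It shows why
latin1 behaves like binary (every byte value maps to a character and
back) and why an ASCII-only upper-case routine leaves the UTF-8 bytes
of a character like "é" untouched.

```python
def ascii_upper(label: bytes) -> bytes:
    """Upper-case ASCII a-z only; every other byte is passed through as-is.

    Hypothetical helper for illustration, not PowerDNS source code.
    """
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in label)


def latin1_round_trips(raw: bytes) -> bool:
    """Return True if raw bytes survive a latin-1 decode/encode unchanged."""
    return raw.decode("latin-1").encode("latin-1") == raw


if __name__ == "__main__":
    print(ascii_upper(b"www.example.com"))  # b'WWW.EXAMPLE.COM'

    # UTF-8 "é" is the two bytes C3 A9; neither is ASCII a-z, so it is
    # left alone and does not become "É" (C3 89).
    print(ascii_upper("café".encode("utf-8")))  # b'CAF\xc3\xa9'

    # Every byte value 0x00-0xFF maps to a latin-1 character and back,
    # which is why latin1 end to end behaves like a binary pass-through.
    print(latin1_round_trips(bytes(range(256))))  # True

    # UTF-8 has no such property: a lone 0xFF byte is not valid UTF-8.
    try:
        bytes([0xFF]).decode("utf-8")
    except UnicodeDecodeError:
        print("0xFF is not valid UTF-8")
```

Python's own bytes.upper() is also ASCII-only and gives the same
result. Folding only A-Z is consistent with RFC 4343, which limits DNS
case-insensitivity to ASCII letters.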
When PowerDNS checks for spaces, it's checking for 0x20, not for any
UTF-8 sequence that might represent whitespace. Content/RDATA, type,
etc. are all binary, so how the content is interpreted is defined by
the record type. Where non-ASCII data is expected, PowerDNS itself
translates it according to the relevant RFCs, which often have their
own way to represent non-ASCII data (TXT records, for example).

I don't believe PowerDNS Auth itself pays any attention to what the
underlying database clients do with regard to character sets and
fields, as long as the data is as expected when it comes out of the
backend. So if it's backed by a TEXT field, it doesn't actually care,
at least in the MySQL case.
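A short Python sketch can illustrate two of the points above. The
helper `split_fields` is hypothetical and only mimics "split on 0x20";
it is not how PowerDNS actually parses content. It shows that a check
for the byte 0x20 does not recognise other Unicode whitespace once it
is UTF-8 encoded, and that for the pure-ASCII content typical of DNS
records, latin1 and UTF-8 produce byte-for-byte identical data, so the
column's character set does not change what the backend hands back.

```python
from typing import List


def split_fields(content: bytes) -> List[bytes]:
    """Split record content on the ASCII space byte 0x20 only.

    Hypothetical helper for illustration, not PowerDNS source code.
    """
    return [field for field in content.split(b" ") if field]


if __name__ == "__main__":
    # SOA-style content with illustrative values.
    soa = "ns1.example.com. hostmaster.example.com. 1 10800 3600 604800 3600"

    # Pure ASCII text encodes to identical bytes in latin1 and UTF-8.
    print(soa.encode("latin-1") == soa.encode("utf-8"))  # True
    print(len(split_fields(soa.encode("utf-8"))))  # 7

    # U+00A0 (no-break space) is Unicode whitespace, but in UTF-8 it is the
    # bytes C2 A0, so a 0x20-only split does not break on it.
    print(split_fields("10\u00a0mail.example.com.".encode("utf-8")))
    # [b'10\xc2\xa0mail.example.com.']

    # Non-ASCII text is where the two encodings diverge.
    print("café".encode("latin-1"), "café".encode("utf-8"))
    # b'caf\xe9' b'caf\xc3\xa9'
```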

So... yeah, you could, but... why? What problem are you trying to solve?
What advantage are you looking for?

On Fri, Oct 30, 2020 at 6:44 AM Nicholas Williams via Pdns-users
<pdns-users at mailman.powerdns.com> wrote:
>
> Nobody has any thoughts here?
>
> Thanks,
>
> Nick
>
> > On Oct 25, 2020, at 11:51 AM, Nicholas Williams <nicholas at nicholaswilliams.net> wrote:
> >
> > In the past 4-5 years, I’ve gotten into the habit of defaulting all MySQL tables to this:
> >
> > DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_520_ci
> >
> > Looking at the latest PowerDNS schema (I’m about to start up a second environment), I noticed that the entire schema has this:
> >
> > CHARACTER SET 'latin1'
> >
> > I did some searching through the archives, but couldn’t readily find an answer about this: Is there a specific reason why LATIN-1 was chosen and must be used? What are the consequences of using UTF-8 instead of LATIN-1?
> >
> > One consequence that I know of is that `records.content` can’t be VARCHAR(64000) and also be UTF-8, so it must either be made explicitly LATIN-1, or it must be shortened to VARCHAR(16383), or it must be converted to a TEXT column. Are there any negative consequences of making it a TEXT column?
> >
> > Thanks,
> >
> > Nick
>
> _______________________________________________
> Pdns-users mailing list
> Pdns-users at mailman.powerdns.com
> https://mailman.powerdns.com/mailman/listinfo/pdns-users
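The VARCHAR(64000) vs. VARCHAR(16383) limit in the quoted question
comes down to row-size arithmetic. A minimal sketch, assuming MySQL's
documented 65,535-byte maximum row size, a 2-byte length prefix for any
VARCHAR whose maximum byte length exceeds 255, and utf8mb4 reserving 4
bytes per character; the helper names are hypothetical. It ignores the
NULL-flag byte and the other columns of the records table (name, type,
ttl, and so on), which share the same budget, so the usable length in
the real table is lower than this single-column ceiling. TEXT columns
sidestep the problem because they count only 9 to 12 bytes toward the
row size.

```python
# Row-size arithmetic for the records.content question. Figures follow the
# MySQL documentation; helper names are hypothetical.
ROW_LIMIT_BYTES = 65_535  # maximum row size, shared by ALL columns in a table


def varchar_bytes(max_chars: int, bytes_per_char: int) -> int:
    """Worst-case bytes a VARCHAR(max_chars) column counts against the row."""
    data = max_chars * bytes_per_char
    length_prefix = 1 if data <= 255 else 2  # 2-byte prefix above 255 bytes
    return data + length_prefix


def max_single_varchar(bytes_per_char: int) -> int:
    """Largest VARCHAR length if it were the only column in the row.

    Ignores the NULL-flag byte and other columns, so real tables allow less.
    """
    return (ROW_LIMIT_BYTES - 2) // bytes_per_char


if __name__ == "__main__":
    print(varchar_bytes(64000, 1))  # 64002: fits as latin1 (1 byte/char)
    print(varchar_bytes(64000, 4))  # 256002: far too big as utf8mb4
    print(max_single_varchar(4))    # 16383
```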
--
"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler