[Pdns-users] PowerDNS authoritative server random timeouts

Netsons - Federico Chiacchiaretta f.chiacchiaretta at netsons.com
Tue Sep 17 14:09:32 UTC 2019


Hi,
we have a PowerDNS cluster of authoritative servers running on 4 nodes:

OS: CentOS 7.6.1810 (fully updated)
Version: pdns-4.1.13-1pdns.el7.x86_64
Backend: mysql - MariaDB-server-10.1.41-1.el7.centos.x86_64

Backend is configured with 1 master and 3 slaves.

We run recurring checks (every 30s) to verify that the DNS server is
working, and these checks randomly time out.
Checks are performed from both:

* an external tool (Pingdom) with a timeout of 30s
* a bash script on each node, which runs dig against the public IP
address of that node (default timeout of 5 seconds).
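
For anyone who wants to reproduce the per-node check with timing
information, here is a minimal sketch (not our actual script) of a probe
that sends one query over UDP or TCP and reports how long it took.
Everything in it is an assumption made for illustration: the function
names build_query and probe, the fixed query ID, IPv4 only, port 53, and
treating any non-empty reply as success. Running UDP and TCP probes
separately should make it easier to see which transport is timing out.

```python
import socket
import struct
import time


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 1 = A, class IN, RD not set)."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server, name, use_tcp=False, timeout=5.0):
    """Send one query to server:53 and return (ok, elapsed_seconds).

    Any non-empty reply counts as success; the answer is not validated.
    """
    packet = build_query(name)
    kind = socket.SOCK_STREAM if use_tcp else socket.SOCK_DGRAM
    start = time.monotonic()
    try:
        with socket.socket(socket.AF_INET, kind) as sock:
            sock.settimeout(timeout)
            sock.connect((server, 53))
            if use_tcp:
                # DNS over TCP prefixes each message with a 2-byte length
                sock.sendall(struct.pack("!H", len(packet)) + packet)
            else:
                sock.send(packet)
            reply = sock.recv(4096)
        ok = len(reply) > 0
    except OSError:  # includes socket timeouts and refused connections
        ok = False
    return ok, time.monotonic() - start


# Hypothetical usage (address and name are placeholders):
#   ok, elapsed = probe("192.0.2.1", "example.com")
#   ok, elapsed = probe("192.0.2.1", "example.com", use_tcp=True)
```

Logging the wall-clock time next to each result would allow a failed
probe to be compared against the pdns.service log for the same second.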

When a timeout occurs, it affects only one check mechanism (Pingdom
or script), never both simultaneously.

Output from our script is simply:

";; connection timed out; no servers could be reached"

The logs from pdns.service show many of these messages:

set 17 06:00:00 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data
set 17 06:00:14 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data
set 17 06:00:29 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data
set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data
set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data
set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data
set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data
set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data
set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
Thread died because of network error: Timeout reading data

but these messages do not coincide with the timeouts in our checks
(though I'd like to understand why they get logged).
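
To compare the two more systematically, the log messages can be counted
per second and set against the times at which the checks failed. The
following is a rough sketch only: the regular expression assumes the
journal format shown above (a three-letter month abbreviation, here the
Italian "set", then day, time, host and pdns_server[pid]), and the names
LINE_RE and count_per_second are made up for this example. It matches
only the first part of each message, so it also works on lines wrapped
the way they are in this mail.

```python
import re
from collections import Counter

# Captures the "mon day HH:MM:SS" timestamp of each TCP thread message.
LINE_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ pdns_server\[\d+\]: TCP Connection",
    re.MULTILINE,
)


def count_per_second(log_text):
    """Return a Counter mapping each timestamp to its number of messages."""
    return Counter(LINE_RE.findall(log_text))


# Hypothetical usage, with the journal saved to a file beforehand:
#   with open("pdns.log") as fh:
#       for stamp, n in sorted(count_per_second(fh.read()).items()):
#           print(stamp, n)
```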

Do you have any hints on what I can check to troubleshoot the issue
further?

Thanks.

Best,

-- 
Federico Chiacchiaretta
System Administrator
Netsons S.r.l. - https://www.netsons.com


