[Pdns-users] PowerDNS authoritative server random timeouts

Klaus Darilion klaus.mailinglists at pernau.at
Thu Sep 26 05:23:23 UTC 2019


I think you should first find out whether the problem is in PowerDNS, 
in the network, or somewhere in between.

If this happens regularly, just use tcpdump to capture all DNS traffic 
to a file (rotate the files, keep only X of them, and choose X so that 
you do not fill up your hard disk).

Or even simpler: capture only the loopback traffic (that is, your own 
check script) by running tcpdump with -i lo.
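
A minimal sketch of such a rotating capture, written as a small Python
helper that only builds the tcpdump command line. The output path, file
size and file count are placeholder values of mine, not anything from
your setup. The -C and -W options together make tcpdump write a ring of
files and overwrite the oldest one, so disk usage stays bounded at
roughly file_size_mb * file_count megabytes.

```python
import shlex


def tcpdump_ring_command(interface="lo", out_file="/var/tmp/dns.pcap",
                         file_size_mb=100, file_count=10):
    """Build a tcpdump argv that captures DNS traffic into a ring of files.

    All default values are illustrative assumptions; adjust them to the
    disk space you can spare.
    """
    return [
        "tcpdump",
        "-n",                     # do not resolve addresses to names
        "-i", interface,          # "lo" for the local check, "any" for all traffic
        "-C", str(file_size_mb),  # start a new file after about this many million bytes
        "-W", str(file_count),    # keep at most this many files, then reuse the oldest
        "-w", out_file,           # write raw packets for later analysis
        "port", "53",             # DNS over both UDP and TCP
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed and started by hand as root.
    print(shlex.join(tcpdump_ring_command()))
```

The helper deliberately does not start tcpdump itself; the printed
command can be checked first and then run as root on one of the nodes.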

Make sure you really see the requests to PDNS, but no answers. Or maybe 
there are answers, but they arrive much too late.

The problem may be that PDNS reads from the socket too slowly. Then the 
socket receive buffer fills up and packets are dropped (tools like 
netstat can report this).
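
As a rough sketch of how to watch for this on Linux: the kernel exposes
the UDP counters in /proc/net/snmp (the same numbers that netstat -su
prints), and a growing RcvbufErrors value means datagrams were dropped
because a socket receive buffer was full. The function names below are
my own, and the counters are system-wide, not per process, so on a busy
host they do not prove that PDNS is the affected socket.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp content as a dict of ints."""
    # The file holds two "Udp:" lines: a header with names, then the values.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp section found")
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def read_udp_counters(path="/proc/net/snmp"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as fh:
        return parse_udp_counters(fh.read())


def counter_delta(before, after, names=("InErrors", "RcvbufErrors")):
    """Return how much each named counter grew between two samples."""
    return {name: after[name] - before[name] for name in names}
```

Taking one sample with read_udp_counters(), waiting across a failed
check, taking a second sample and passing both to counter_delta() shows
whether drops happened in that interval.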

Also monitor the PDNS statistics. For example, read:
https://blog.powerdns.com/2014/12/11/powerdns-graphing-as-a-service/

Then watch the number of outstanding queries, perhaps sending the 
statistics every second.
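
If you do not want to set up graphing right away, a small poller can
give a first impression. This is only a sketch: it assumes that
"pdns_control list" prints the statistics as a comma-separated list of
name=value pairs, and that qsize-q (packets waiting for the database
backend) is a reasonable stand-in for "outstanding queries". Please
check both against the documentation for your version.

```python
import subprocess


def parse_pdns_list(output):
    """Parse 'name=value,name=value,...' into a dict of integer counters."""
    stats = {}
    for item in output.strip().split(","):
        name, sep, value = item.partition("=")
        # Skip empty trailing items and anything that is not an integer.
        if sep and value.strip().lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def sample_pdns_stats():
    """Run pdns_control once and return the parsed counters.

    Needs access to the PDNS control socket, so usually root.
    """
    result = subprocess.run(["pdns_control", "list"],
                            capture_output=True, text=True, check=True)
    return parse_pdns_list(result.stdout)
```

Calling sample_pdns_stats() once per second and logging qsize-q next to
a timestamp makes it possible to compare the values with the moments
when the check script timed out.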

regards
Klaus

On 17.09.2019 at 16:09, Netsons - Federico Chiacchiaretta wrote:
> Hi,
> we have a PowerDNS cluster of authoritative servers running on 4 nodes:
> 
> OS: CentOS 7.6.1810 (fully updated)
> Version: pdns-4.1.13-1pdns.el7.x86_64
> Backend: mysql - MariaDB-server-10.1.41-1.el7.centos.x86_64
> 
> Backend is configured with 1 master and 3 slaves.
> 
> We perform recurring checks (every 30s) to check if DNS server is
> working, and these checks randomly time out.
> Checks are performed from both:
> 
> * an external tool (Pingdom) with a timeout of 30s
> * a bash script on each node, which performs a dig on the public IP
> address of that node (default time out of 5 seconds).
> 
> When a timeout occurs, it occurs only on one check mechanism (pingdom
> or script), never on both simultaneously.
> 
> Output from our script is simply:
> 
> ";; connection timed out; no servers could be reached"
> 
> Logs from pdns.service report a lot of these messages:
> 
> set 17 06:00:00 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> set 17 06:00:14 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> set 17 06:00:29 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> set 17 06:00:34 dns4.netsons.net pdns_server[10277]: TCP Connection
> Thread died because of network error: Timeout reading data
> 
> but these messages do not match timeout on our checks (though I'd like
> to understand why they get logged).
> 
> Do you have any hint about what I can check to further troubleshoot the
> issue?
> 
> Thanks.
> 
> Best,
> 


