[Pdns-users] Slow query and SERVERFAIL from local pdns_recursor

Otto Moerbeek otto at drijf.net
Fri Sep 11 07:22:04 UTC 2020


On Thu, Sep 10, 2020 at 03:40:54PM +0200, Christian Degenkolb via Pdns-users wrote:

> Hi Thomas,
> 
> what is a reasonably low value for udp-truncation-threshold? I tried with
> 900 and 600 (as low as half the default value) but found no improvements.

Try edns-outgoing-bufsize, that is the one that influences traffic
between the recursor and the authoritative servers.
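
A minimal sketch of how that setting could be put into the config file. The
helper name set_recursor_option, the Debian config path and the value 1232
are assumptions for illustration only, not tested recommendations for this
server:

```shell
#!/bin/sh
# Sketch: set KEY=VALUE in a recursor.conf-style file, replacing any existing
# line for KEY or appending a new one. The function name is hypothetical.
set_recursor_option() {
    conf=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing "KEY=..." line, then append the new one.
    grep -v "^${key}=" "$conf" > "$tmp" 2>/dev/null || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through cat so the original file keeps its owner and mode.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example (assumed Debian path and an illustrative value, run as root):
# set_recursor_option /etc/powerdns/recursor.conf edns-outgoing-bufsize 1232
```

The recursor has to be restarted afterwards for the new value to take effect.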

> 
> Also I don't think this is a vmware.com problem since I have the same
> problem with multiple domains.

Yes, there are clear indications your connectivity is hampered somewhere.
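
To judge whether a settings change actually helps, the lost and SERVFAIL
counts from dnsperf runs like the ones quoted below can be compared run by
run. A small sketch, assuming the "Statistics:" layout shown in the quoted
output; the function name dnsperf_summary is made up for this example:

```shell
#!/bin/sh
# Sketch: read dnsperf output on stdin and print the sent, lost and SERVFAIL
# counts on one line. Assumes the statistics layout of dnsperf 2.3.4 as
# quoted in this thread.
dnsperf_summary() {
    awk '
        /Queries sent:/   { sent = $3 }
        /Queries lost:/   { lost = $3 }
        /Response codes:/ {
            # The count follows the word SERVFAIL on the response code line.
            for (i = 1; i < NF; i++)
                if ($i == "SERVFAIL") servfail = $(i + 1)
        }
        END { printf "sent=%d lost=%d servfail=%d\n", sent, lost, servfail }
    '
}

# Example:
# rec_control wipe-cache $
# ./dnsperf -d queryfile_top500_clean | dnsperf_summary
```

A run without any SERVFAIL answers prints servfail=0, since the counter is
simply never set in that case.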

	-Otto

> 
> To illustrate I found the tool dnsperf from
> https://www.dns-oarc.net/tools/dnsperf and created a queryfile with the list
> of 500 domains from here https://moz.com/top500 see
> https://paste.ubuntu.com/p/DxGBqRvngv/
> 
> If I call the tool against my local resolver on a clean cache (even with
> udp-truncation-threshold=600) I get the following output.
> 
> # rec_control wipe-cache $
> wiped 4154 records, 8 negative records, 500 packets
> # ./dnsperf -d queryfile_top500_clean
> DNS Performance Testing Tool
> Version 2.3.4
> 
> [Status] Command line: dnsperf -d queryfile_top500_clean
> [Status] Sending queries (to 127.0.0.1)
> [Status] Started at: Thu Sep 10 15:29:26 2020
> [Status] Stopping after 1 run through file
> 
> <snip multiple lines like "[Timeout] Query timed out: msg id 0" and
> "Warning: received a response with an unexpected (maybe timed out) id: 162">
> 
> [Status] Testing complete (end of file)
> 
> Statistics:
> 
>   Queries sent:         500
>   Queries completed:    278 (55.60%)
>   Queries lost:         222 (44.40%)
> 
>   Response codes:       NOERROR 209 (75.18%), SERVFAIL 69 (24.82%)
>   Average packet size:  request 29, response 56
>   Run time (s):         16.455935
>   Queries per second:   16.893601
> 
>   Average Latency (s):  1.313376 (min 0.000543, max 4.491949)
>   Latency StdDev (s):   1.446709
> 
> # ./dnsperf -d queryfile_top500_clean
> DNS Performance Testing Tool
> Version 2.3.4
> 
> [Status] Command line: dnsperf -d queryfile_top500_clean
> [Status] Sending queries (to 127.0.0.1)
> [Status] Started at: Thu Sep 10 15:29:49 2020
> [Status] Stopping after 1 run through file
> [Status] Testing complete (end of file)
> 
> Statistics:
> 
>   Queries sent:         500
>   Queries completed:    500 (100.00%)
>   Queries lost:         0 (0.00%)
> 
>   Response codes:       NOERROR 281 (56.20%), SERVFAIL 219 (43.80%)
>   Average packet size:  request 29, response 50
>   Run time (s):         4.571526
>   Queries per second:   109.372669
> 
>   Average Latency (s):  0.015253 (min 0.000054, max 4.556146)
>   Latency StdDev (s):   0.244755
> 
> As I see it, far too many queries are lost on an empty cache, and the
> SERVFAIL rate is far too high for this kind of domain, even on retries.
> The SERVFAIL rate stays high on subsequent runs.
> 
> Whereas if I run it against 1.1.1.1 (or the hoster DNS server) I get the
> following output.
> 
> # ./dnsperf -d queryfile_top500_clean -s 1.1.1.1
> DNS Performance Testing Tool
> Version 2.3.4
> 
> [Status] Command line: dnsperf -d queryfile_top500_clean -s 1.1.1.1
> [Status] Sending queries (to 1.1.1.1)
> [Status] Started at: Thu Sep 10 15:33:24 2020
> [Status] Stopping after 1 run through file
> [Status] Testing complete (end of file)
> 
> Statistics:
> 
>   Queries sent:         500
>   Queries completed:    500 (100.00%)
>   Queries lost:         0 (0.00%)
> 
>   Response codes:       NOERROR 499 (99.80%), SERVFAIL 1 (0.20%)
>   Average packet size:  request 29, response 77
>   Run time (s):         0.882704
>   Queries per second:   566.441299
> 
>   Average Latency (s):  0.013521 (min 0.005065, max 0.863349)
>   Latency StdDev (s):   0.054510
> 
> A near perfect score.
> 
> Doesn't this mean the problem lies within the local resolver since dnsperf
> would make the same requests the local resolver would make to the external
> DNS server?
> Or at least there does not exist an uplink problem but something local to my
> server?
> 
> regards
> Chris
> 
> 
> 
> 
> 
> On 2020-09-09 10:05, Thomas Mieslinger via Pdns-users wrote:
> > Hi Christian,
> > 
> > Hetzner might filter IP fragments. Please check whether your situation gets
> > better if you set udp-truncation-threshold to a reasonably low value.
> > 
> > By default pdns-recursor does DNSSEC. I suggest setting
> > +dnssec on your dig queries.
> > 
> > A possible workaround for the vmware.com problems is to add a negative
> > trust anchor for vmware.com. in pdns config.
> > 
> > Cheers Thomas
> > 
> > On 9/8/20 2:16 PM, Christian Degenkolb via Pdns-users wrote:
> > > Hi,
> > > 
> > > I set the trace=yes option in the recursor config and redid the tests
> > > for pubs.vmware.com.
> > > 
> > > The log can be found here https://paste.debian.net/hidden/07526601/
> > > 
> > > I found two timeouts in the logs
> > > 
> > > Line 41:
> > > Sep  8 10:21:54 rho pdns_recursor[25208]: [3] pubs.vmware.com:
> > > Resolved
> > > 'vmware.com' NS ns01.vmwdns.com to: 45.54.11.1
> > > Sep  8 10:21:54 rho pdns_recursor[25208]: [3] pubs.vmware.com:
> > > Trying IP
> > > 45.54.11.1:53, asking 'pubs.vmware.com|A'
> > > Sep  8 10:21:56 rho pdns_recursor[25208]: [3] pubs.vmware.com: timeout
> > > resolving after 1501.63msec
> > > Sep  8 10:21:56 rho pdns_recursor[25208]: [3] pubs.vmware.com:
> > > Trying to
> > > resolve NS 'ns04.vmwdns.com' (2/8)
> > > 
> > > But a request to 45.54.11.1 for pubs.vmware.com comes back within
> > > 11 msec.
> > > 
> > > $ dig -t A @45.54.11.1 pubs.vmware.com
> > > 
> > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> -t A @45.54.11.1
> > > pubs.vmware.com
> > > ; (1 server found)
> > > ;; global options: +cmd
> > > ;; Got answer:
> > > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24122
> > > ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
> > > ;; WARNING: recursion requested but not available
> > > 
> > > ;; OPT PSEUDOSECTION:
> > > ; EDNS: version: 0, flags:; udp: 4096
> > > ;; QUESTION SECTION:
> > > ;pubs.vmware.com.               IN      A
> > > 
> > > ;; ANSWER SECTION:
> > > pubs.vmware.com.        30      IN      CNAME   pubs.vmware.com.ds.edgekey.net.
> > > 
> > > ;; Query time: 11 msec
> > > ;; SERVER: 45.54.11.1#53(45.54.11.1)
> > > ;; WHEN: Tue Sep 08 13:29:57 CEST 2020
> > > ;; MSG SIZE  rcvd: 88
> > > 
> > > and a second timeout in line 159:
> > > 
> > > Sep  8 10:21:56 rho pdns_recursor[25208]: [3]
> > > e751.dscx.akamaiedge.net:
> > > Trying IP 2.16.106.23:53, asking 'e751.dscx.akamaiedge.net|A'
> > > Sep  8 10:21:57 rho pdns_recursor[25208]: [3]
> > > e751.dscx.akamaiedge.net:
> > > timeout resolving after 1501.74msec
> > > Sep  8 10:21:57 rho pdns_recursor[25208]: [3]
> > > e751.dscx.akamaiedge.net:
> > > Trying to resolve NS 'n3dscx.akamaiedge.net' (2/8)
> > > 
> > > Same picture here with a very good response time.
> > > 
> > > $ dig -t A @2.16.106.23 e751.dscx.akamaiedge.net
> > > 
> > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> -t A @2.16.106.23
> > > e751.dscx.akamaiedge.net
> > > ; (1 server found)
> > > ;; global options: +cmd
> > > ;; Got answer:
> > > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7947
> > > ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
> > > ;; WARNING: recursion requested but not available
> > > 
> > > ;; OPT PSEUDOSECTION:
> > > ; EDNS: version: 0, flags:; udp: 4096
> > > ;; QUESTION SECTION:
> > > ;e751.dscx.akamaiedge.net.      IN      A
> > > 
> > > ;; ANSWER SECTION:
> > > e751.dscx.akamaiedge.net. 20    IN      A       104.111.214.47
> > > 
> > > ;; Query time: 5 msec
> > > ;; SERVER: 2.16.106.23#53(2.16.106.23)
> > > ;; WHEN: Tue Sep 08 13:31:32 CEST 2020
> > > ;; MSG SIZE  rcvd: 69
> > > 
> > > 
> > > To check that this is not a vmware.com problem I tested some more and
> > > got the same timeouts.
> > > 
> > > 
> > > One more example for
> > > 
> > > $dig nameservers.dnscheck.co @127.0.0.1
> > > 
> > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> nameservers.dnscheck.co
> > > @127.0.0.1
> > > ;; global options: +cmd
> > > ;; Got answer:
> > > ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 23852
> > > ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
> > > 
> > > ;; OPT PSEUDOSECTION:
> > > ; EDNS: version: 0, flags:; udp: 4096
> > > ;; QUESTION SECTION:
> > > ;nameservers.dnscheck.co.       IN      A
> > > 
> > > ;; Query time: 3005 msec
> > > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > > ;; WHEN: Tue Sep 08 12:15:29 CEST 2020
> > > ;; MSG SIZE  rcvd: 52
> > > 
> > > can be found here https://paste.debian.net/hidden/b48a78a2/.
> > > 
> > > This time there are multiple timeouts regarding the root name servers,
> > > for example g.root-servers.net
> > > 
> > > Sep  8 12:15:21 rho pdns_recursor[25208]: [50]
> > > nameservers.dnscheck.co:
> > > Resolved '.' NS g.root-servers.net to: 192.112.36.4
> > > Sep  8 12:15:21 rho pdns_recursor[25208]: [50]
> > > nameservers.dnscheck.co:
> > > Trying IP 192.112.36.4:53, asking 'nameservers.dnscheck.co|A'
> > > Sep  8 12:15:22 rho pdns_recursor[25208]: [50]
> > > nameservers.dnscheck.co:
> > > timeout resolving after 1501.63msec
> > > Sep  8 12:15:22 rho pdns_recursor[25208]: [50]
> > > nameservers.dnscheck.co:
> > > Trying to resolve NS 'j.root-servers.net' (2/13)
> > > 
> > > Where a direct request via dig works like a charm.
> > > 
> > > $ dig -t A @192.112.36.4 nameservers.dnscheck.co
> > > 
> > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> -t A @192.112.36.4
> > > nameservers.dnscheck.co
> > > ; (1 server found)
> > > ;; global options: +cmd
> > > ;; Got answer:
> > > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18641
> > > ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 13
> > > ;; WARNING: recursion requested but not available
> > > 
> > > ;; OPT PSEUDOSECTION:
> > > ; EDNS: version: 0, flags:; udp: 4096
> > > ; COOKIE: ce9eaf15bb34977b41354b5f5f576c3841785bfba5901e93 (good)
> > > ;; QUESTION SECTION:
> > > ;nameservers.dnscheck.co.       IN      A
> > > 
> > > ;; AUTHORITY SECTION:
> > > co.                     172800  IN      NS      ns5.cctld.co.
> > > co.                     172800  IN      NS      ns1.cctld.co.
> > > co.                     172800  IN      NS      ns6.cctld.co.
> > > co.                     172800  IN      NS      ns4.cctld.co.
> > > co.                     172800  IN      NS      ns3.cctld.co.
> > > co.                     172800  IN      NS      ns2.cctld.co.
> > > 
> > > ;; ADDITIONAL SECTION:
> > > ns1.cctld.co.           172800  IN      A       156.154.100.25
> > > ns2.cctld.co.           172800  IN      A       156.154.101.25
> > > ns3.cctld.co.           172800  IN      A       156.154.102.25
> > > ns4.cctld.co.           172800  IN      A       156.154.103.25
> > > ns5.cctld.co.           172800  IN      A       156.154.104.25
> > > ns6.cctld.co.           172800  IN      A       156.154.105.25
> > > ns1.cctld.co.           172800  IN      AAAA    2001:502:2eda::21
> > > ns2.cctld.co.           172800  IN      AAAA    2001:502:ad09::21
> > > ns3.cctld.co.           172800  IN      AAAA    2610:a1:1009::21
> > > ns4.cctld.co.           172800  IN      AAAA    2610:a1:1010::21
> > > ns5.cctld.co.           172800  IN      AAAA    2610:a1:1011::21
> > > ns6.cctld.co.           172800  IN      AAAA    2610:a1:1012::21
> > > 
> > > ;; Query time: 16 msec
> > > ;; SERVER: 192.112.36.4#53(192.112.36.4)
> > > ;; WHEN: Tue Sep 08 13:34:20 CEST 2020
> > > ;; MSG SIZE  rcvd: 458
> > > 
> > > 
> > > Additionally I get the resolved IPs in the trace logs (line 328,
> > > apparently from the second worker thread) but not in the dig output.
> > > 
> > > Sep  8 12:15:33 rho pdns_recursor[25208]: [51]
> > > nameservers.dnscheck.co:
> > > answer is in: resolved to '52.48.61.155|A'
> > > Sep  8 12:15:33 rho pdns_recursor[25208]: [51]
> > > nameservers.dnscheck.co:
> > > answer is in: resolved to '104.236.169.228|A'
> > > Sep  8 12:15:33 rho pdns_recursor[25208]: [51]
> > > nameservers.dnscheck.co:
> > > answer is in: resolved to '104.131.72.189|A'
> > > 
> > > Is this a dig timeout? Or do I only get the response from the first
> > > worker thread?
> > > 
> > > And now I'm more confused than before. The connection from and to the
> > > server (SSH, etc.) is rock solid.
> > > An iperf test shows the full gigabit connection is available.
> > > The server is more or less idle and has 8 cores and 32GB RAM; it is
> > > mostly a Docker host with some 20-30 containers (nextcloud, mailcow, ...)
> > > running for personal usage by me and my family.
> > > 
> > > How can I check for problems with a large number of small connections?
> > > But this shouldn't be that much for a single local recursor, should
> > > it?
> > > 
> > > Also I don't see any network related messages in the kernel log or
> > > anywhere else.
> > > I'm not aware of any rate limits for the uplink to the provider.
> > > 
> > > regards
> > > Chris
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > On 2020-09-08 09:33, Otto Moerbeek wrote:
> > > > On Tue, Sep 08, 2020 at 09:22:31AM +0200, Christian Degenkolb wrote:
> > > > 
> > > > > (sent again; the first answer was not cc'd to the ML)
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > sorry for not sending any configs. pdns_recursor runs more or less
> > > > > with the
> > > > > vanilla config with the following changes:
> > > > > 
> > > > > forward-zones-recurse=zen.spamhaus.org=1.1.1.1;1.0.0.1 (that's why I
> > > > > wanted to use the local recursor; as mentioned, the server is located
> > > > > in the Hetzner IP range, which apparently is blocked for the Spamhaus
> > > > > DNSBL)
> > > > > loglevel=6
> > > > > log-common-errors=yes
> > > > > quiet=no
> > > > > root-nx-trust=no (found this as a solution for the SERVFAIL, but it
> > > > > did not work)
> > > > > 
> > > > > and
> > > > > # rec_control set-carbon-server 37.252.122.50 rho-test (for
> > > > > the graphs)
> > > > > 
> > > > > 
> > > > > A trace for the same resolves from my last mail:
> > > > > 
> > > > >  $ time dig +trace pubs.vmware.com @127.0.0.1
> > > > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> +trace pubs.vmware.com
> > > > > @127.0.0.1
> > > > > ;; global options: +cmd
> > > > > .                       86118   IN      NS      d.root-servers.net.
> > > > > .                       86118   IN      NS      c.root-servers.net.
> > > > > .                       86118   IN      NS      l.root-servers.net.
> > > > > .                       86118   IN      NS      b.root-servers.net.
> > > > > .                       86118   IN      NS      f.root-servers.net.
> > > > > .                       86118   IN      NS      m.root-servers.net.
> > > > > .                       86118   IN      NS      e.root-servers.net.
> > > > > .                       86118   IN      NS      a.root-servers.net.
> > > > > .                       86118   IN      NS      i.root-servers.net.
> > > > > .                       86118   IN      NS      k.root-servers.net.
> > > > > .                       86118   IN      NS      g.root-servers.net.
> > > > > .                       86118   IN      NS      h.root-servers.net.
> > > > > .                       86118   IN      NS      j.root-servers.net.
> > > > > .                       86118   IN      RRSIG   NS 8 0 518400
> > > > > 20200921050000
> > > > > 20200908040000 46594 .
> > > > > wgnBz8tKA9hjwIxmMQgTVwnZaiUpAB9a1+oC5T/syHzqNj1e5qhApLQN
> > > > > NLok43hu5Ykt8RFe/IiDZuYxIdyyzItwk
> > > > > 4QN8xNgsQsfhVfBbZ26bWRz
> > > > > fskquwnFn6Gmvq2qI6o42tsBxXUw09X4sNlNYI2zHB3sKaaMu0AbN9WI
> > > > > Pe14jpX/PwaP3m78+XqMy9CiKmuDon6g3BuyecPhCZL5Pa8ZPC7nrKfV
> > > > > pfyNSiPoBODsJE96UHGlOCJTFcbu/6Ia4ek3AGOJf+WC84HPrxLT
> > > > > riyk
> > > > > XHfbPl7EjTbFSPgT8D7jGBfVCTQU3JSfynv29VFAHWZu1gm5VJWNQGaw
> > > > > u5gatA==
> > > > > ;; Received 540 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
> > > > > 
> > > > > com.                    172800  IN      NS      a.gtld-servers.net.
> > > > > com.                    172800  IN      NS      b.gtld-servers.net.
> > > > > com.                    172800  IN      NS      c.gtld-servers.net.
> > > > > com.                    172800  IN      NS      d.gtld-servers.net.
> > > > > com.                    172800  IN      NS      e.gtld-servers.net.
> > > > > com.                    172800  IN      NS      f.gtld-servers.net.
> > > > > com.                    172800  IN      NS      g.gtld-servers.net.
> > > > > com.                    172800  IN      NS      h.gtld-servers.net.
> > > > > com.                    172800  IN      NS      i.gtld-servers.net.
> > > > > com.                    172800  IN      NS      j.gtld-servers.net.
> > > > > com.                    172800  IN      NS      k.gtld-servers.net.
> > > > > com.                    172800  IN      NS      l.gtld-servers.net.
> > > > > com.                    172800  IN      NS      m.gtld-servers.net.
> > > > > com.                    86400   IN      DS      30909 8 2
> > > > > E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
> > > > > com.                    86400   IN      RRSIG   DS 8 1 86400
> > > > > 20200921050000
> > > > > 20200908040000 46594 .
> > > > > zz85z6R/YUHxyW+ywA6zrgiYILjPo0i248M3wU+2XCRCneBH6yknQfjM
> > > > > LIcbo3vADVUlkJd0l4W2TLd7NPgC255hr2
> > > > > +ALojzzHa07jyFmE203Kdw
> > > > > ma7XL0C55TdFrCEMhARkZf4EncfJH9JH+fdWRWdMr0EQZd1A+FzMYemO
> > > > > o7/L/8ZYq4FOt0vz+zheAJNDveGii+QpXAoDyw4xt3HMUVM+40Z/VgD1
> > > > > tk9Y3K9e2wwRNISeHdlq21JFVA2SY/gDgPCzBtM1r9Yz7oFZ2ld5W
> > > > > AD0 P84GPEUMgUceAGofwxlV9+dSawhunskb+yVrpdjpizLageyJRWEu/F9A
> > > > > zDXxew==
> > > > > ;; Received 1175 bytes from
> > > > > 198.97.190.53#53(h.root-servers.net) in 5 ms
> > > > > 
> > > > > vmware.com.             172800  IN      NS      dns1.p05.nsone.net.
> > > > > vmware.com.             172800  IN      NS      dns2.p05.nsone.net.
> > > > > vmware.com.             172800  IN      NS      dns3.p05.nsone.net.
> > > > > vmware.com.             172800  IN      NS      dns4.p05.nsone.net.
> > > > > vmware.com.             172800  IN      NS      ns01.vmwdns.com.
> > > > > vmware.com.             172800  IN      NS      ns02.vmwdns.com.
> > > > > vmware.com.             172800  IN      NS      ns03.vmwdns.com.
> > > > > vmware.com.             172800  IN      NS      ns04.vmwdns.com.
> > > > > vmware.com.             86400   IN      DS      48553 13 2
> > > > > AA2C697F3990472642AF01509A18224828E403CA8608EC75D5C83002 CE21847E
> > > > > vmware.com.             86400   IN      RRSIG   DS 8 2 86400
> > > > > 20200915062203
> > > > > 20200908051203 24966 com.
> > > > > FA2xsJKvT2LLn5UEy7hAE7PaYmds7FBkQB0SGhm8riwJRKnxbHAY0tvv
> > > > > I1T/k0EzXJ4wy1J5qzNLMjhzFgPxEQB
> > > > > 6BwBfJm8qo8Cnzxm4YC5Ko1/9
> > > > > pDWooVBHoFfMmJgu14Dk+u1AcHobxH9pPs7az16cLK/3YeaFW3dCrIVQ
> > > > > NK2fZc0d/pc7CY0Zl1LjYQdTq+MsZiL2kbepEHD6A/4J6g==
> > > > > ;; Received 523 bytes from 2001:503:eea3::30#53(g.gtld-servers.net)
> > > > > in 6 ms
> > > > > 
> > > > > pubs.vmware.com.        30      IN      CNAME
> > > > > pubs.vmware.com.ds.edgekey.net.
> > > > > pubs.vmware.com.        30      IN      RRSIG   CNAME 13 3 30
> > > > > 20200909071011
> > > > > 20200907071011 12752 vmware.com.
> > > > > yTxj4OFvCx3flxtOFAFdkwAOpOAVNibgseFi5U5ekzYbdATw98xZqrDT
> > > > > tYs/n46iHFiLN4ql4Y3MS6U
> > > > > 16Qr6DQ==
> > > > > ;; Received 194 bytes from 45.54.11.1#53(ns01.vmwdns.com) in 11 ms
> > > > > 
> > > > > real    0m32.149s
> > > > > user    0m0.012s
> > > > > sys     0m0.012s
> > > > > 
> > > > > But this looks normal to me. I don't know why the trace only
> > > > > shows 5,
> > > > > 6 and
> > > > > 11 ms but takes up to 32 seconds to finish.
> > > > 
> > > > Well, that is suspect, but see below.
> > > > 
> > > > > 
> > > > > Regarding your questions for the ipv6 connectivity. How can
> > > > > I test this?
> > > > 
> > > > Run pdns_recursor with the --trace option (or trace=yes in the config
> > > > file), do some queries and look at the results in the log file. Now
> > > > the recursor logs a lot in trace mode, so take your time trying to
> > > > understand what is going on. Members of this list can likely help if
> > > > you do not spot anything.
> > > > 
> > > >     -Otto
> > > > 
> > > > > 
> > > > > I did a
> > > > > 
> > > > > $ dig ipv6.google.com @127.0.0.1
> > > > > 
> > > > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> ipv6.google.com
> > > > > @127.0.0.1
> > > > > ;; global options: +cmd
> > > > > ;; Got answer:
> > > > > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9226
> > > > > ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
> > > > > 
> > > > > ;; OPT PSEUDOSECTION:
> > > > > ; EDNS: version: 0, flags:; udp: 4096
> > > > > ;; QUESTION SECTION:
> > > > > ;ipv6.google.com.               IN      A
> > > > > 
> > > > > ;; ANSWER SECTION:
> > > > > ipv6.google.com.        86400   IN      CNAME   ipv6.l.google.com.
> > > > > 
> > > > > ;; AUTHORITY SECTION:
> > > > > l.google.com.           60      IN      SOA     ns1.google.com. dns-admin.google.com. 330353109 900 900 1800 60
> > > > > 
> > > > > ;; Query time: 3087 msec
> > > > > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > > > > ;; WHEN: Tue Sep 08 09:12:50 CEST 2020
> > > > > ;; MSG SIZE  rcvd: 115
> > > > > 
> > > > > and
> > > > > 
> > > > > $ ping6 ipv6.google.com
> > > > > PING ipv6.google.com(fra16s13-in-x0e.1e100.net
> > > > > (2a00:1450:4001:819::200e))
> > > > > 56 data bytes
> > > > > 64 bytes from fra16s13-in-x0e.1e100.net (2a00:1450:4001:819::200e):
> > > > > icmp_seq=1 ttl=118 time=5.11 ms
> > > > > 64 bytes from fra16s13-in-x0e.1e100.net (2a00:1450:4001:819::200e):
> > > > > icmp_seq=2 ttl=118 time=5.08 ms
> > > > > 64 bytes from fra16s13-in-x0e.1e100.net (2a00:1450:4001:819::200e):
> > > > > icmp_seq=3 ttl=118 time=5.12 ms
> > > > > 64 bytes from fra16s13-in-x0e.1e100.net (2a00:1450:4001:819::200e):
> > > > > icmp_seq=4 ttl=118 time=5.13 ms
> > > > > 64 bytes from fra16s13-in-x0e.1e100.net (2a00:1450:4001:819::200e):
> > > > > icmp_seq=5 ttl=118 time=5.09 ms
> > > > > 64 bytes from fra16s13-in-x0e.1e100.net (2a00:1450:4001:819::200e):
> > > > > icmp_seq=6 ttl=118 time=5.08 ms
> > > > > 64 bytes from fra16s13-in-x0e.1e100.net (2a00:1450:4001:819::200e):
> > > > > icmp_seq=7 ttl=118 time=5.08 ms
> > > > > ^C
> > > > > --- ipv6.google.com ping statistics ---
> > > > > 7 packets transmitted, 7 received, 0% packet loss, time 24ms
> > > > > rtt min/avg/max/mdev = 5.075/5.096/5.133/0.043 ms
> > > > > 
> > > > > and it looks good.
> > > > > 
> > > > > regards
> > > > > Chris
> > > > > 
> > > > > 
> > > > > On 2020-09-04 15:05, Otto Moerbeek wrote:
> > > > > > On Wed, Sep 02, 2020 at 09:44:37AM +0200, Christian Degenkolb via
> > > > > > Pdns-users wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I hope somebody on the ML can help me figure out what I'm doing wrong.
> > > > > > > I have a local pdns_recursor (version 4.1.11-1+deb10u1 from Debian 10)
> > > > > > > running and added it at the top of my /etc/resolv.conf as 127.0.0.1.
> > > > > > >
> > > > > > > However, I see some strange SERVFAIL resolves happening and, all in
> > > > > > > all, a slow DNS system.
> > > > > > >
> > > > > > > For example see the following two consecutive resolves and a direct
> > > > > > > request
> > > > > > > to the NS.
> > > > > > > The first one takes nearly 3 seconds vs 11 ms from the same system
> > > > > > > if I
> > > > > > > query the NS directly.
> > > > > > >
> > > > > > > $ dig pubs.vmware.com @127.0.0.1
> > > > > > >
> > > > > > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> pubs.vmware.com
> > > > > > > @127.0.0.1
> > > > > > > ;; global options: +cmd
> > > > > > > ;; Got answer:
> > > > > > > ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4929
> > > > > > > ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
> > > > > > >
> > > > > > > ;; OPT PSEUDOSECTION:
> > > > > > > ; EDNS: version: 0, flags:; udp: 4096
> > > > > > > ;; QUESTION SECTION:
> > > > > > > ;pubs.vmware.com.               IN      A
> > > > > > >
> > > > > > > ;; ANSWER SECTION:
> > > > > > > pubs.vmware.com.        30      IN      CNAME   pubs.vmware.com.ds.edgekey.net.
> > > > > > > pubs.vmware.com.ds.edgekey.net. 10 IN   CNAME   e751.dscx.akamaiedge.net.
> > > > > > >
> > > > > > > ;; Query time: 3009 msec
> > > > > > > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > > > > > > ;; WHEN: Wed Sep 02 09:19:04 CEST 2020
> > > > > > > ;; MSG SIZE  rcvd: 123
> > > > > > >
> > > > > > > $ dig pubs.vmware.com @127.0.0.1
> > > > > > >
> > > > > > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> pubs.vmware.com
> > > > > > > @127.0.0.1
> > > > > > > ;; global options: +cmd
> > > > > > > ;; Got answer:
> > > > > > > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1345
> > > > > > > ;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
> > > > > > >
> > > > > > > ;; OPT PSEUDOSECTION:
> > > > > > > ; EDNS: version: 0, flags:; udp: 4096
> > > > > > > ;; QUESTION SECTION:
> > > > > > > ;pubs.vmware.com.               IN      A
> > > > > > >
> > > > > > > ;; ANSWER SECTION:
> > > > > > > pubs.vmware.com.        18      IN      CNAME   pubs.vmware.com.ds.edgekey.net.
> > > > > > > pubs.vmware.com.ds.edgekey.net. 4 IN    CNAME   e751.dscx.akamaiedge.net.
> > > > > > > e751.dscx.akamaiedge.net. 16    IN      A       104.111.214.47
> > > > > > >
> > > > > > > ;; Query time: 0 msec
> > > > > > > ;; SERVER: 127.0.0.1#53(127.0.0.1)
> > > > > > > ;; WHEN: Wed Sep 02 09:19:08 CEST 2020
> > > > > > > ;; MSG SIZE  rcvd: 139
> > > > > > >
> > > > > > > $ dig pubs.vmware.com @ns03.vmwdns.com
> > > > > > >
> > > > > > > ; <<>> DiG 9.11.5-P4-5.1+deb10u2-Debian <<>> pubs.vmware.com
> > > > > > > @ns03.vmwdns.com
> > > > > > > ;; global options: +cmd
> > > > > > > ;; Got answer:
> > > > > > > ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5509
> > > > > > > ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
> > > > > > > ;; WARNING: recursion requested but not available
> > > > > > >
> > > > > > > ;; OPT PSEUDOSECTION:
> > > > > > > ; EDNS: version: 0, flags:; udp: 4096
> > > > > > > ;; QUESTION SECTION:
> > > > > > > ;pubs.vmware.com.               IN      A
> > > > > > >
> > > > > > > ;; ANSWER SECTION:
> > > > > > > pubs.vmware.com.        30      IN      CNAME   pubs.vmware.com.ds.edgekey.net.
> > > > > > >
> > > > > > > ;; Query time: 11 msec
> > > > > > > ;; SERVER: 45.54.11.129#53(45.54.11.129)
> > > > > > > ;; WHEN: Wed Sep 02 09:34:42 CEST 2020
> > > > > > > ;; MSG SIZE  rcvd: 88
> > > > > > >
> > > > > > > Also, I have a number of SERVFAIL entries in /var/log/syslog
> > > > > > > (pdns_recursor is currently running with loglevel=6).
> > > > > > > For example:
> > > > > > >
> > > > > > > Sep  2 08:45:35 rho pdns_recursor[19311]: Sending SERVFAIL to
> > > > > > > 127.0.0.1 during resolve of 'pubs.vmware.com' because: Too much time
> > > > > > > waiting for pubs.vmware.com.ds.edgekey.net|A, timeouts: 5,
> > > > > > > throttles: 1, queries: 6, 7991msec
> > > > > > >
> > > > > > > # grep 'Too much time waiting for' /var/log/syslog | wc -l
> > > > > > > 184
> > > > > > >
> > > > > > > As per
> > > > > > > https://blog.powerdns.com/2014/12/11/powerdns-graphing-as-a-service/
> > > > > > > I send the metrics to
> > > > > > > https://metronome1.powerdns.com/?server=pdns.rho-test.recursor&beginTime=-172800
> > > > > > >
> > > > > > > Does anybody have an idea what's wrong? This seems way too slow for
> > > > > > > DNS, and the SERVFAIL shouldn't happen this often.
> > > > > > > The server in question is running in a DC of the German hoster
> > > > > > > hetzner.de.
> > > > > > > Besides the strange DNS behaviour, I don't have any problems with the
> > > > > > > reliability of the network connection.
> > > > > > >
> > > > > > > thanks
> > > > > > > Chris
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Pdns-users mailing list
> > > > > > > Pdns-users at mailman.powerdns.com
> > > > > > > https://mailman.powerdns.com/mailman/listinfo/pdns-users
> > > > > >
> > > > > > You did not share any config or traces, so it's hard to tell. A wild
> > > > > > guess: It might be you enabled IPV6 but your IPV6 connectivity is bad.
> > > > > >
> > > > > >     -Otto