[Pdns-users] PowerDNS Recursor server stopped resolving about half of all domains last night; I built a new server and it's doing the same thing

Nicholas Williams nicholas at nicholaswilliams.net
Sun Dec 29 01:47:06 UTC 2024


Hello,

I have an existing PowerDNS Recursor 4.0.4 server running on Debian 8 "Jessie" (I know, I know, out of date ... I'm getting to that). It handles all DNS requests for my home lab network. It has a fairly simple config and has worked without interruption for years at a time. It is also configured to validate DNSSEC, and it does so successfully.

Last night, shortly after midnight, it stopped resolving about half of all domains worldwide, returning `SERVFAIL` for them. Sometimes it will resolve the primary domain (such as `athenahealth.com`) but not a subdomain (such as `20785-1.portal.athenahealth.com`). Sometimes it will not resolve the primary domain at all (such as `serverfault.com` or `askubuntu.com`). I haven't been able to find any pattern, and nothing I've changed in my config (including turning DNSSEC completely off) fixes the problem.

My next thought was that I needed to upgrade PowerDNS Recursor, but I couldn't because of how old my DNS server was. So, I built out a brand new server running PowerDNS Recursor 5.1.3 on Ubuntu 24.04.1. Again, the config is simple. Here's the primary file:

```
$ cat /etc/powerdns/recursor.conf 
dnssec:
  # validation: process # default
  trustanchorfile: /usr/share/dns/root.key
recursor:
  hint_file: /usr/share/dns/root.hints
  include_dir: /etc/powerdns/recursor.d
#incoming:
 # listen:
 # - 127.0.0.1 # default
#outgoing:
 # source_address:
 # - 0.0.0.0 # default
```

And here's a file in `recursor.d`:

```
$ cat /etc/powerdns/recursor.d/me.yml 
dnssec:
  validation: off # validate
#  log_bogus: true
incoming:
  listen:
    - 10.20.30.76:53
logging:
  common_errors: true
  facility: 1
  loglevel: 6
  quiet: true
  trace: fail
recursor:
  auth_zones:
    - zone: my-domain-1.com
      file: /etc/powerdns/my-domain-1.com.zone
  forward_zones:
    - zone: my-domain-2.com
      forwarders:
        - 10.20.31.2
  setgid: pdns
  setuid: pdns
  socket_dir: /var/run
  write_pid: true
webservice:
  address: 10.20.30.76
  allow_from:
    - 10.20.30.0/24
    - 172.24.52.0/24
  api_key: loremipsum
  password: foobarbazqux
  port: 8080
```

This config is identical to my old PowerDNS Recursor config except that DNSSEC is disabled to try to get it to work. If I manually `dig` (I love `dig`) `askubuntu.com` from the root up, I easily find an answer:

```
# Using i.root-servers.net, which is 192.36.148.17
$ dig @192.36.148.17 com NS

; <<>> DiG 9.10.6 <<>> @192.36.148.17 com NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2217
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 21

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;com.				IN	NS

;; ANSWER SECTION:
com.			136670	IN	NS	d.gtld-servers.net.
com.			136670	IN	NS	c.gtld-servers.net.
com.			136670	IN	NS	k.gtld-servers.net.
com.			136670	IN	NS	f.gtld-servers.net.
com.			136670	IN	NS	i.gtld-servers.net.
com.			136670	IN	NS	b.gtld-servers.net.
com.			136670	IN	NS	l.gtld-servers.net.
com.			136670	IN	NS	a.gtld-servers.net.
com.			136670	IN	NS	e.gtld-servers.net.
com.			136670	IN	NS	m.gtld-servers.net.
com.			136670	IN	NS	j.gtld-servers.net.
com.			136670	IN	NS	h.gtld-servers.net.
com.			136670	IN	NS	g.gtld-servers.net.

;; ADDITIONAL SECTION:
b.gtld-servers.net.	43604	IN	A	192.33.14.30
b.gtld-servers.net.	71837	IN	AAAA	2001:503:231d::2:30
l.gtld-servers.net.	44115	IN	A	192.41.162.30
l.gtld-servers.net.	74612	IN	AAAA	2001:500:d937::30
a.gtld-servers.net.	59944	IN	A	192.5.6.30
a.gtld-servers.net.	52029	IN	AAAA	2001:503:a83e::2:30
e.gtld-servers.net.	11582	IN	A	192.12.94.30
e.gtld-servers.net.	63219	IN	AAAA	2001:502:1ca1::30
m.gtld-servers.net.	27782	IN	A	192.55.83.30
m.gtld-servers.net.	50020	IN	AAAA	2001:501:b1f9::30
j.gtld-servers.net.	39663	IN	A	192.48.79.30
h.gtld-servers.net.	79936	IN	A	192.54.112.30
g.gtld-servers.net.	57527	IN	A	192.42.93.30
g.gtld-servers.net.	63219	IN	AAAA	2001:503:eea3::30
d.gtld-servers.net.	44435	IN	A	192.31.80.30
d.gtld-servers.net.	10633	IN	AAAA	2001:500:856e::30
c.gtld-servers.net.	50185	IN	A	192.26.92.30
k.gtld-servers.net.	32146	IN	A	192.52.178.30
i.gtld-servers.net.	48002	IN	A	192.43.172.30
i.gtld-servers.net.	27967	IN	AAAA	2001:503:39c1::30

$ dig @192.33.14.30 askubuntu.com NS

; <<>> DiG 9.10.6 <<>> @192.33.14.30 askubuntu.com NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46168
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 13

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;askubuntu.com.			IN	NS

;; ANSWER SECTION:
askubuntu.com.		86400	IN	NS	sureena.ns.cloudflare.com.
askubuntu.com.		86400	IN	NS	damian.ns.cloudflare.com.

;; ADDITIONAL SECTION:
damian.ns.cloudflare.com. 48087	IN	A	172.64.35.50
damian.ns.cloudflare.com. 48087	IN	A	162.159.44.50
damian.ns.cloudflare.com. 48087	IN	A	108.162.195.50
damian.ns.cloudflare.com. 13178	IN	AAAA	2803:f800:50::6ca2:c332
damian.ns.cloudflare.com. 13178	IN	AAAA	2606:4700:58::a29f:2c32
damian.ns.cloudflare.com. 13178	IN	AAAA	2a06:98c1:50::ac40:2332
sureena.ns.cloudflare.com. 38809 IN	A	108.162.194.126
sureena.ns.cloudflare.com. 38809 IN	A	172.64.34.126
sureena.ns.cloudflare.com. 38809 IN	A	162.159.38.126
sureena.ns.cloudflare.com. 32427 IN	AAAA	2a06:98c1:50::ac40:227e
sureena.ns.cloudflare.com. 32427 IN	AAAA	2803:f800:50::6ca2:c27e
sureena.ns.cloudflare.com. 32427 IN	AAAA	2606:4700:50::a29f:267e

$ dig @172.64.35.50 askubuntu.com A

; <<>> DiG 9.10.6 <<>> @172.64.35.50 askubuntu.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35705
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;askubuntu.com.			IN	A

;; ANSWER SECTION:
askubuntu.com.		300	IN	A	172.64.150.156
askubuntu.com.		300	IN	A	104.18.37.100
```

Perfect. But if I ask either my existing PowerDNS Recursor 4.0.4 server or my new PowerDNS Recursor 5.1.3 server, I get `SERVFAIL`:

```
$ dig @10.20.30.76 askubuntu.com A

; <<>> DiG 9.10.6 <<>> @10.20.30.76 askubuntu.com A
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58213
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; OPT=15: 00 16 64 65 6c 65 67 61 74 69 6f 6e 20 63 6f 6d ("..delegation com")
;; QUESTION SECTION:
;askubuntu.com.			IN	A
```

The `OPT=15` line with some kind of signature plus `delegation com` is interesting. It doesn't appear for every domain that fails to resolve, so it might be a red herring. It also changes: running that same query again returned `OPT=15: 00 16 64 65 6c 65 67 61 74 69 6f 6e 20 61 73 6b 75 62 75 6e 74 75 2e 63 6f 6d ("..delegation askubuntu.com")`.
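
For what it's worth, EDNS option code 15 is the Extended DNS Error option from RFC 8914: a two-byte info code followed by optional UTF-8 extra text. Here is a minimal Python sketch that decodes the two payloads `dig` printed. The name `decode_ede` is made up for illustration, and the lookup table covers only a handful of the registered info codes, so treat it as a rough aid rather than a complete decoder.

```python
# Decode an EDNS Extended DNS Error (option code 15, RFC 8914) payload.
# Layout: 2-byte big-endian INFO-CODE, then optional UTF-8 EXTRA-TEXT.

# Partial table of RFC 8914 info codes (illustrative subset only);
# anything not listed here is reported as "unknown".
EDE_CODES = {
    0: "Other",
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    9: "DNSKEY Missing",
    22: "No Reachable Authority",
    23: "Network Error",
}


def decode_ede(hex_bytes: str) -> tuple[int, str, str]:
    """Return (info_code, meaning, extra_text) for a space-separated hex dump."""
    raw = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces between bytes
    if len(raw) < 2:
        raise ValueError("EDE payload needs at least a 2-byte info code")
    info_code = int.from_bytes(raw[:2], "big")
    extra_text = raw[2:].decode("utf-8", errors="replace")
    return info_code, EDE_CODES.get(info_code, "unknown"), extra_text


if __name__ == "__main__":
    # The two payloads dig showed for askubuntu.com.
    for dump in (
        "00 16 64 65 6c 65 67 61 74 69 6f 6e 20 63 6f 6d",
        "00 16 64 65 6c 65 67 61 74 69 6f 6e 20 61 73 6b 75 62 75 6e 74 75 2e 63 6f 6d",
    ):
        print(decode_ede(dump))
```

If that reading is right, the leading `00 16` is not a signature but info code 22 ("No Reachable Authority"), and the rest is free-form text from the resolver naming the delegation it was working on.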

Here is the PowerDNS Recursor 5.1.3 fail trace for a failed lookup of `askubuntu.com`: https://gist.github.com/beamerblvd/d8fa24bdf1037e2a670f8e331b7e4905

FWIW, I'm on Comcast Business Class with a 5-address static IP delegation.

What am I doing wrong?

Thanks,

Nick


