[dnsdist] DoH issues after 1.8.3 -> 1.9.0 upgrade

Otto Moerbeek otto at drijf.net
Sun Mar 17 18:12:41 UTC 2024


On Sun, Mar 17, 2024 at 06:41:13PM +0100, Christoph via dnsdist wrote:

> Hi,
> 
> In February we upgraded our test DoH/DoT server from 1.8.3 to 1.9.0
> and did not notice any problems, so we upgraded our production server
> from 1.8.3 to 1.9.0 yesterday.
> 
> Immediately after the upgrade our monitoring reported our DoH service as
> unavailable (HTTP 400), but we were unable to reproduce it using Firefox.
> 
> A closer look confirmed that there is an issue: we see about 50%
> fewer DoH requests in our Grafana graphs showing DoH request rates.
> 
> Having a look at the request rates per HTTP method suggests that we "lose"
> almost all GET requests but also a significant fraction of POST DoH
> requests.
> 
> sum by (method) (irate(dnsdist_frontend_doh_http_method_queries{job="$job"}[$__rate_interval]))
> 
> After looking at the TLS versions graph I noticed a clear correlation
> but then I realized that all our DoH requests are TLS version 1.3
> because we set minTLSVersion='tls1.3' - so this might be irrelevant.
> 
> irate(dnsdist_frontend_tlsqueries{job="$job"}[$__rate_interval])
> 
> 2024-03-16 20:57:59 dnsdist upgraded: 1.8.3_1 -> 1.9.0
> 2024-03-16 20:59 monitoring says DoH is down (HTTP 400 - Bad Request)
> monitoring requests this: https://doh.applied-privacy.net/query?dns=l1sBAAABAAAAAAAAA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE
> Mar 17 02:40:45 bender-dpriv1 kernel: pid 77544 (dnsdist), jid 0, uid 208: exited on signal 11
> -> also interesting but likely unrelated?
> 
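To check what the monitoring probe actually sends, the "dns" parameter of
the URL above can be decoded offline. The following is only a minimal
sketch, assuming the RFC 8484 GET encoding (base64url without padding,
carrying a DNS wire-format query); the function name decode_doh_get_param
is made up for illustration and is not part of dnsdist or any library.

```python
import base64
import struct


def decode_doh_get_param(dns_param):
    """Decode the base64url 'dns' parameter of a DoH GET request (RFC 8484)."""
    # RFC 8484 strips the trailing '=' padding; restore it before decoding.
    padded = dns_param + "=" * (-len(dns_param) % 4)
    wire = base64.urlsafe_b64decode(padded)

    # Fixed 12-byte DNS header: ID, flags, and four section counts.
    ident, flags, qdcount, _an, _ns, _ar = struct.unpack("!6H", wire[:12])

    # Walk the length-prefixed labels of the first question name.
    # (Assumption: no compression pointers, as is usual in a question.)
    labels = []
    offset = 12
    while wire[offset] != 0:
        length = wire[offset]
        labels.append(wire[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    offset += 1  # skip the terminating root label

    qtype, qclass = struct.unpack("!2H", wire[offset:offset + 4])
    return {
        "id": ident,
        "rd": bool(flags & 0x0100),  # recursion desired bit
        "qdcount": qdcount,
        "qname": ".".join(labels),
        "qtype": qtype,
        "qclass": qclass,
        "length": len(wire),
    }


if __name__ == "__main__":
    # Parameter copied from the monitoring URL quoted above.
    param = "l1sBAAABAAAAAAAAA3d3dw1rbm90LXJlc29sdmVyAmN6AAAcAAE"
    print(decode_doh_get_param(param))
```

For the monitoring URL this gives a single question for
www.knot-resolver.cz, type 28 (AAAA), class 1 (IN), with the RD flag set,
which suggests the payload itself is an ordinary DNS query.
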
> Today we downgraded to 1.8.3, and everything went back to normal.
> 
> Is anyone else observing similar issues on dnsdist 1.9.0?
> 
> DoT does not appear to be affected.
> 
> best regards,
> Christoph
> 
> OS: FreeBSD 13.2
> dnsdist installed via pkg
> 
> our dnsdist config:
> 
> newServer({address="109.70.100.136", maxInFlight=1000, sockets=32, name="clamps"})
> newServer({address="109.70.100.140", maxInFlight=1000, sockets=32, name="roberto"})
> --newServer({address="109.70.100.133", sockets=4, name="titanius-dpriv1"})
> setServerPolicy(leastOutstanding)
> 
> addTLSLocal("0.0.0.0",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
> {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256',
> minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
> addTLSLocal("[::]",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
> {ciphers='ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256',
> minTLSVersion='tls1.2', tcpFastOpenQueueSize=1000, maxInFlight=1000 })
> 
> addDOHLocal("0.0.0.0:444",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
> "/query", {minTLSVersion='tls1.3', serverTokens='doh',
> tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
> addDOHLocal("[::]:444",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.crt",
> "/usr/local/etc/ssl/lego/certificates/doh.applied-privacy.net.key",
> "/query", {minTLSVersion='tls1.3', serverTokens='doh',
> tcpFastOpenQueueSize=1000, tcpListenQueueSize=4096 })
> 
> setACL({'0.0.0.0/0', '::/0'})
> controlSocket('127.0.0.1:5199')
> setConsoleACL('127.0.0.1/8')
> 
> setKey(....)
> 
> pc = newPacketCache(50000, {maxTTL=86400, minTTL=3, temporaryFailureTTL=60,
> staleTTL=60, dontAge=false})
> getPool(""):setCache(pc)
> 
> webserver("127.0.0.1:8083")
> setWebserverConfig({...})
> setVerboseHealthChecks(true)
> addAction(QTypeRule(65535), RCodeAction(DNSRCode.NOTIMP))


This might be related: https://github.com/PowerDNS/pdns/issues/13850
(the fix has not been backported yet).

	-Otto


