[dnsdist] dnsdist 1.4 and Debian Buster
lists+pdns at gbe0.com
Thu Aug 8 10:04:32 UTC 2019
On 8/08/2019 4:46 pm, Remi Gacogne wrote:
> That's actually one of the most readable configuration I have seen in a
> while, don't worry ;-)
Good to know, still got some work to do on it to make it more friendly.
> That's very weird, I don't see anything unusual in your configuration,
> the backtrace seems to indicate that all threads are working as
> expected, and I even see some UDP queries being received and forwarded
> in the strace (albeit very few, you can spot them easily by looking for
> "recvmsg resumed" with grep).
That is strange. When the issue occurs it will receive minimal traffic
except from the health checking service that controls the IPs being
announced with BGP.
I just noticed there is even more strange behavior. I restarted the
dnsdist instance and sent traffic for it to reproduce the issue. While
it was working I made an 'ANY' query for google.com. Once the issue
occurred I could still send that query and get an answer (both with UDP
and TCP). Queries for things that were not in the cache I guess is what
> Would you mind providing a 'lsof -n -p <pid of dnsdist>' while it's
The lsof output is available here:
> Would you by any chance be able to do a strace when it's stuck,
> while at the same time sending a few UDP queries to it, ideally with an
> easily recognizable qname like "why-is-dnsdist-not-responding.to.this." ?
The strace output is available here:
During the strace I performed 4 requests (in order):
- UDP A request for why-is-dnsdist-not-responding.to.this. (not working)
- TCP A request for why-is-dnsdist-not-responding.to.this. (working)
- UDP ANY request for google.com (working)
- UDP A request for google.com (not working)
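For anyone who wants to repeat the same test, the four queries above could be sent with dig roughly as follows. This is only a sketch and not necessarily the exact commands used: the address 127.0.0.1 and port 53 are placeholders for the affected dnsdist instance, and the helper names dig_cmd and print_test_queries are made up for illustration. UDP is forced with +notcp because, as far as I know, dig may default to TCP for ANY queries.

```shell
#!/bin/sh
# Placeholder address and port of the dnsdist instance under test (assumptions).
DNSDIST_ADDR="${DNSDIST_ADDR:-127.0.0.1}"
DNSDIST_PORT="${DNSDIST_PORT:-53}"

# Print one dig command line for the given transport, query type and name.
dig_cmd() {
    transport="$1"; qtype="$2"; qname="$3"
    if [ "$transport" = "tcp" ]; then flag="+tcp"; else flag="+notcp"; fi
    # +tries=1 and +time=2 limit each query to one short attempt, so an
    # unanswered query appears only once in the strace.
    echo "dig @$DNSDIST_ADDR -p $DNSDIST_PORT $flag +tries=1 +time=2 $qname $qtype"
}

# The four queries, in the order listed above.
print_test_queries() {
    dig_cmd udp A   why-is-dnsdist-not-responding.to.this.
    dig_cmd tcp A   why-is-dnsdist-not-responding.to.this.
    dig_cmd udp ANY google.com
    dig_cmd udp A   google.com
}
```

Calling print_test_queries only prints the commands; piping its output to sh (print_test_queries | sh) would actually send the queries while the strace is running.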
> Do you collect some metrics via prometheus? I don't see a carbon export,
> you might want to send some metrics to our public metronome server 
> for a while, just from one box, we might some spot something there.
I'll configure a carbon export to the public metronome server shortly.
> Also, apart from Debian being upgraded from Stretch to Buster and
> dnsdist from 1.3.x to 1.4.0-beta2, did anything else change in your setup?
To be clear, I actually installed a new copy of Debian; I didn't upgrade
the existing Stretch install.
The dnsdist configuration changed slightly:
- I originally wrote a Lua function for load balancing. Now I am using
poolAvailable with rules so I can use a built-in method.
- The rules were tidied up a bit; previously each dnsdist instance had
leftover rules that were no longer required
- The cache sizes were adjusted