[dnsdist] dnsdist 1.4 and Debian Buster

Chris lists+pdns at gbe0.com
Thu Aug 8 10:04:32 UTC 2019


Hi Remi,

On 8/08/2019 4:46 pm, Remi Gacogne wrote:
> That's actually one of the most readable configuration I have seen in a
> while, don't worry ;-)

Good to know. I still have some work to do on it to make it more
friendly, though.

> That's very weird, I don't see anything unusual in your configuration,
> the backtrace seems to indicate that all threads are working as
> expected, and I even see some UDP queries being received and forwarded
> in the strace (albeit very few, you can spot them easily by looking for
> "recvmsg resumed" with grep).

That is strange. When the issue occurs, the instance receives minimal
traffic apart from the health checking service that controls the IPs
being announced with BGP.

I just noticed some even stranger behavior. I restarted the dnsdist
instance and sent traffic to it to reproduce the issue. While it was
working I made an 'ANY' query for google.com. Once the issue occurred
I could still send that query and get an answer (both with UDP and
TCP). My guess is that only queries for names that were not in the
cache stopped working.

> Would you mind providing a 'lsof -n -p <pid of dnsdist>' while it's
> stuck?

The lsof output is available here:

https://gbe0.com/dnsdist/dnsdist_lsof.txt.gz

> Would you by any chance be able to do a strace when it's stuck,
> while at the same time sending a few UDP queries to it, ideally with an
> easily recognizable qname like "why-is-dnsdist-not-responding.to.this." ?

The strace output is available here:

https://gbe0.com/dnsdist/dnsdist_strace2.txt.gz
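
For picking the UDP receives out of a trace like this one, a filter
along the following lines may help. It is only a sketch: the function
names are made up for illustration, and it assumes the trace is a
gzipped text file in which the lines of interest contain the string
"recvmsg resumed", as mentioned above.

```python
import gzip


def udp_receives(lines, marker="recvmsg resumed"):
    """Return the trace lines that contain the marker string."""
    return [line.rstrip("\n") for line in lines if marker in line]


def udp_receives_in_file(path):
    """Apply udp_receives() to a gzipped strace text file (hypothetical helper)."""
    # errors="replace" keeps going if the trace contains odd bytes.
    with gzip.open(path, "rt", errors="replace") as fh:
        return udp_receives(fh)
```

Calling udp_receives_in_file() on a local copy of the trace would
return the matching lines, which can then be counted or searched for
the test qname.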

During the strace I performed 4 requests (in order):

- UDP A request for why-is-dnsdist-not-responding.to.this. (not working)
- TCP A request for why-is-dnsdist-not-responding.to.this. (working)
- UDP ANY request for google.com (working)
- UDP A request for google.com (not working)
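
The four requests above could be reproduced with something like the
following sketch. It is not the tool that was used for the test; it
hand-builds a minimal DNS query with the Python standard library, and
the function names and the server address in the usage comment
(192.0.2.53, from the documentation range) are placeholders. It
assumes an IPv4 server for UDP and does not parse the response.

```python
import socket
import struct

QTYPES = {"A": 1, "ANY": 255}  # record type codes from RFC 1035

# The four test requests: (qname, qtype, use_tcp)
CHECKS = [
    ("why-is-dnsdist-not-responding.to.this.", "A", False),
    ("why-is-dnsdist-not-responding.to.this.", "A", True),
    ("google.com.", "ANY", False),
    ("google.com.", "A", False),
]


def build_query(qname, qtype, query_id=0x1234):
    """Build a minimal DNS query packet with the RD flag set."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [part for part in qname.rstrip(".").split(".") if part]
    name = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + name + struct.pack("!HH", QTYPES[qtype], 1)  # class IN


def recv_exact(sock, count):
    """Read exactly count bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        data += chunk
    return data


def send_query(server, qname, qtype, use_tcp=False, port=53, timeout=2.0):
    """Return the raw response bytes, or None on timeout or socket error."""
    packet = build_query(qname, qtype)
    try:
        if use_tcp:
            with socket.create_connection((server, port), timeout=timeout) as s:
                # DNS over TCP prefixes each message with a 2-byte length.
                s.sendall(struct.pack("!H", len(packet)) + packet)
                (length,) = struct.unpack("!H", recv_exact(s, 2))
                return recv_exact(s, length)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(packet, (server, port))
            return s.recvfrom(4096)[0]
    except OSError:  # includes timeouts
        return None


def run_checks(server):
    """Send the four test requests in order and print whether each got a reply."""
    for qname, qtype, use_tcp in CHECKS:
        reply = send_query(server, qname, qtype, use_tcp)
        proto = "TCP" if use_tcp else "UDP"
        status = "no answer" if reply is None else f"{len(reply)} bytes"
        print(f"{proto} {qtype} {qname}: {status}")

# Example (placeholder address): run_checks("192.0.2.53")
```

Running this while the strace is active keeps the order and the
qnames identical between attempts, which makes the queries easier to
find in the trace.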

> Do you collect some metrics via prometheus? I don't see a carbon export,
> you might want to send some metrics to our public metronome server [1]
> for a while, just from one box, we might some spot something there.

I'll set up metrics export to the public metronome server shortly.

> Also, apart from Debian being upgraded from Stretch to Buster and
> dnsdist from 1.3.x to 1.4.0-beta2, did anything else change in your setup?

To be clear, I actually installed a fresh copy of Debian; I didn't
upgrade the existing Stretch install.

The dnsdist configuration changed slightly:

- I originally wrote a Lua function for load balancing. Now I am using
poolAvailable with rules so I can use a built-in method.
- The rules were tidied up a bit; previously each dnsdist instance had
leftover rules that were no longer required.
- The cache sizes were adjusted.

Thanks

