[dnsdist] wrandom with downed servers?

Aaron de Bruyn aaron at heyaaron.com
Thu Jun 2 23:41:21 UTC 2022


I am using dnsdist to help route internal DNS traffic at a client site.
Each site has one local Windows DNS server that serves an internal domain
(i.e. 'somecustomer.local') and a VPN link to all the other sites (each
with their own Windows DNS server).

Every router has an anycast IP (10.100.100.100) that runs dnsdist and
everything throughout the network is pointed to it.

The goal was to make sure we can still serve 'somecustomer.local' (the
blasted Windows domain) should the local Windows Server go down, and also
continue to resolve external domains without accidentally sending queries
for somecustomer.local to the external resolvers.

Our config is pretty straightforward (slight redaction and notes):

—
setLocal('10.100.100.100:53')
webserver('0.0.0.0:8053')
setWebserverConfig({password='—redacted—', apiKey='—redacted—', acl='10.0.0.0/8, 127.0.0.1/32'})
setACL({'10.0.0.0/8', '127.0.0.0/8'})
newServer ({address='1.0.0.1', name='cloudflare1', pool='external', qps=50, weight=1})
newServer ({address='1.1.1.1', name='cloudflare2', pool='external', qps=50, weight=1})

-- The local DC has the highest priority in the somecustomer pool because the VPN is slow.
newServer ({address='ip.of.local.windows.dns', name='local-windows-dns', pool='somecustomer', qps=500, weight=2147483647})

-- The following line gets repeated multiple times (once for each remote site).
-- A remote DC has the lowest priority in the somecustomer pool because the VPN is slow.
newServer ({address='ip.of.remote.windows.dns', name='chicago-dns', pool='somecustomer', qps=500, weight=1})
newServer ({address='ip.of.remote.windows.dns', name='portland-dns', pool='somecustomer', qps=500, weight=1})
...etc...

setServerPolicy(wrandom)

-- If anyone queries us for the internal domain, send them to the 'somecustomer' pool.
addAction({'somecustomer.local'}, PoolAction('somecustomer'))
-- If the local Windows DNS server queries us, use the 'external' pool.
addAction({'ip.of.local.windows.dns'}, PoolAction('external'))
-- If the other rules didn't match and it's coming from our internal IP block,
-- send it to the external DNS servers.
addAction({'10.0.0.0/8'}, PoolAction('external'))
—

This appeared to do exactly what we wanted. Queries for somecustomer.local
were routed to the local DNS server, and everything else was sent on to
Cloudflare.

During testing we took down the local DNS server. All queries for the
internal domain started timing out.
The dnsdist web interface showed the local Windows DNS server as being
"down", but it was still routing queries to it.

Does wrandom ignore a server being down and just pay attention to weight?
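
For reference, the arithmetic behind the weights in the config above can be
sketched as follows. This is only an illustration of an ideal weighted-random
pick, not a description of how dnsdist implements wrandom. The number of
remote sites (remoteSites) is a hypothetical value, since the config elides
the full list. With at least one remote server in the pool, the summed
weights exceed the signed 32-bit maximum; whether that matters to wrandom is
not something established here and would need checking against the dnsdist
documentation.

```lua
-- Weights copied from the config above; the remote site count is an assumption.
local localWeight = 2147483647
local remoteWeight = 1
local remoteSites = 2  -- hypothetical, e.g. chicago-dns and portland-dns

local total = localWeight + remoteWeight * remoteSites
local int32Max = 2147483647

-- Share of queries an ideal weighted-random pick would send to the local server.
print(string.format('local share: %.10f', localWeight / total))
-- Whether the summed weights still fit in a signed 32-bit integer.
print('total fits in int32: ' .. tostring(total <= int32Max))
```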

As a test, we tried switching to roundrobin.

When the local Windows DNS server was turned off, queries were still
completed by 'remote' Windows DNS servers...but when the local Windows DNS
server was working (and showed 'up' in the web interface), dnsdist continued
to roundrobin queries to non-local Windows DNS servers, because roundrobin
appears to respect up/down status, but not weight.

Do I need to write my own policy in Lua in order to pay attention to *both*
the up/down status *and* the weight, or am I missing something?
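
A minimal sketch of what such a custom policy might look like, registered
with setServerPolicyLua. It assumes each server object exposes an isUp()
method and a numeric weight field; both should be verified against the
dnsdist documentation for the version in use. The names pickHeaviestUp and
heaviestUp are made up for illustration. Unlike wrandom, this is a strict
preference: the heaviest server that is up receives every query.

```lua
-- Sketch only: return the "up" server with the largest weight, or nil.
-- Assumes server:isUp() and server.weight exist on dnsdist server objects.
local function pickHeaviestUp(servers)
  local best = nil
  for _, server in ipairs(servers) do
    if server:isUp() and (best == nil or server.weight > best.weight) then
      best = server
    end
  end
  return best
end

-- Policy entry point: called with the candidate servers and the query.
function heaviestUp(servers, dq)
  local chosen = pickHeaviestUp(servers)
  if chosen == nil then
    -- Nothing is marked up; fall back to the first server instead of nil.
    return servers[1]
  end
  return chosen
end

setServerPolicyLua('heaviestUp', heaviestUp)
```

Set this way, the policy would apply to every pool, so the 'external' pool
would also stop balancing between its two servers (ties go to whichever
server is seen first). The dnsdist documentation also describes a built-in
firstAvailable policy that walks servers by their 'order' parameter, which
may cover the same need without custom code.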

Thanks,

-A