<div dir="ltr"><div dir="ltr"><div>Otto,</div><div> It took me a while to come back to this, but I made the changes you suggested shortly after your last reply.<br></div><div>- I reverted max-negative-ttl to the default. Performance seems markedly improved.<br></div><div>- I removed the Lua script, so no queries are dropped any more, and many server clients seem much happier.<br></div><div>- I've begun collecting the metrics available through the API and graphing them to watch for trends.<br></div><div><br></div><div>I mostly just wanted to say thank you for the support, and I will start a new thread should I need further assistance in the future.</div><div><br></div><div>Sincerely,</div><div>Scotsie<br></div><div><br>
</div></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Sat, Apr 19, 2025 at 3:29 AM Otto Moerbeek <<a href="mailto:otto@drijf.net">otto@drijf.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Remarks inline.<br>
<br>
On Fri, Apr 18, 2025 at 07:04:18PM -0400, Scott Crace wrote:<br>
<br>
> Otto,<br>
> Thanks for your assistance. Since these were set up with private IPs, I wasn't<br>
> sure how useful the config would be; however, I have included it below.<br>
> <br>
> # rec_control dump-throttlemap -<br>
> ; throttle map dump follows<br>
> ; remote IP qname qtype count ttd reason<br>
> 10.0.196.197 0.10.in-addr.arpa A 2 2025-04-18T18:44:22 RCodeRefused<br>
> 10.0.196.197 10.10.in-addr.arpa A 3 2025-04-18T18:44:25 RCodeRefused<br>
> 10.0.196.197 255.10.in-addr.arpa A 1 2025-04-18T18:44:23 RCodeRefused<br>
> 10.0.62.244 0.10.in-addr.arpa A 2 2025-04-18T18:44:22 RCodeRefused<br>
> 10.0.62.244 10.10.in-addr.arpa A 3 2025-04-18T18:44:25 RCodeRefused<br>
> 10.0.62.244 255.10.in-addr.arpa A 2 2025-04-18T18:44:23 RCodeRefused<br>
> dump-throttlemap: dumped 6 records<br>
<br>
Looking at your config below, you are forwarding to servers that do not<br>
want to answer those queries. Make sure you either do not forward them or<br>
change the auths to respond properly. "Refused" means the auth does<br>
not have the particular zone. An auth responding Refused to a lot of<br>
queries will be throttled for those specific queries.<br>
<br>
> <br>
> # rec_control dump-failedservers -<br>
> I removed entries with a count of 1 or 2 for brevity, since this email is<br>
> already a long read.<br>
> ; failed servers dump follows<br>
> ; remote IP count timestamp<br>
> 203.119.25.5 8 2025-04-18T18:43:44<br>
> 203.119.26.5 8 2025-04-18T18:43:42<br>
> 203.119.27.5 8 2025-04-18T18:43:41<br>
> 203.119.28.5 8 2025-04-18T18:43:39<br>
> 203.119.29.5 8 2025-04-18T18:43:45<br>
> 200.189.41.10 7 2025-04-18T18:42:46<br>
> 200.219.148.10 6 2025-04-18T18:39:47<br>
> 200.219.154.10 6 2025-04-18T18:42:43<br>
> 200.219.159.10 7 2025-04-18T18:42:45<br>
> 200.192.233.10 7 2025-04-18T18:42:40<br>
> 200.229.248.10 4 2025-04-18T18:42:42<br>
> 203.119.95.53 3 2025-04-18T18:39:30<br>
> 203.119.86.101 1229 2025-04-18T18:40:03<br>
> 35.173.255.124 4895 2025-04-18T18:36:21<br>
> dump-failedservers: dumped 43 records<br>
<br>
Depending on how long your recursor has been running, some of these counts<br>
are pretty high. This *might* indicate connectivity issues, but it is no<br>
definite conclusion; some network troubleshooting might be in order,<br>
especially as 203.119.86.101 is <a href="http://ns3.apnic.net" rel="noreferrer" target="_blank">ns3.apnic.net</a>, which *should* be a<br>
server that's reachable and responding properly. 35.173.255.124 looks<br>
like a random AWS IP.<br>
<br>
> <br>
> <br>
> Config(s)<br>
> <br>
> Please note that one of the forwarded zones is 'split-brained' from a<br>
> legacy setup. The zone consists of a private Active Directory environment<br>
> and a separately maintained public zone. The configuration forwards to the<br>
> private AD servers, and I believe the Lua script drops queries that have no<br>
> match in that zone. The public zone is being slowly phased out.<br>
> <br>
> While reviewing the previous server configs, I found a comment about this<br>
> value but no context for the specific reasoning. This may explain the<br>
> values you noted, but I would like to understand the implications of<br>
> removing it. It doesn't seem like something that should have been enabled.<br>
> # <a href="https://github.com/PowerDNS/pdns/issues/6186" rel="noreferrer" target="_blank">https://github.com/PowerDNS/pdns/issues/6186</a><br>
> max-negative-ttl=0<br>
<br>
That is indeed potentially killing performance. Better leave it at the<br>
default, unless you have very specific reasons to change it. In<br>
practice any DNS server spends quite a lot of its time answering<br>
negatively. Not caching negative answers causes quite a lot of work,<br>
since the recursor will need to contact auths for each client query<br>
that leads to a negative answer, again and again.<br>
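<br>
A minimal sketch of what "leave it at the default" means for the recordcache section of the config quoted below, assuming the other values there stay as they are: the max_negative_ttl line is simply dropped, so the recursor again caps cached negative answers at its built-in default of 3600 seconds (one hour) instead of not caching them at all.<br>
<br>
```yaml<br>
recordcache:<br>
  max_entries: 1000000<br>
  max_ttl: 86400<br>
  # max_negative_ttl removed: the built-in default (3600 seconds) applies again<br>
```<br>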
<br>
A common reason to dislike negative caching is (for a name in a locally<br>
managed zone):<br>
<br>
1. Query rec for a name and see that it does not exist (NXDOMAIN or NODATA answer)<br>
2. Modify the auth zone so the name exists<br>
3. Query again and see that it still does not exist because of negative<br>
caching in rec.<br>
<br>
The answer to this is not to "disable negative caching". The proper<br>
answer is: avoid the initial query, have some patience, or flush the<br>
rec cache for that name using rec_control or by sending rec a NOTIFY<br>
(NOTIFY handling in rec is a relatively new feature and must be explicitly<br>
allowed, see<br>
<a href="https://docs.powerdns.com/recursor/yamlsettings.html#incoming-allow-notify-from" rel="noreferrer" target="_blank">https://docs.powerdns.com/recursor/yamlsettings.html#incoming-allow-notify-from</a>).<br>
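<br>
A hedged sketch of the NOTIFY route, assuming the AD servers that momentumbusiness.com is forwarded to (10.255.255.76 and 10.1.3.228 in the config quoted below) would be set up to send NOTIFYs to rec; these keys would be merged into the existing incoming section rather than added as a second one. The manual alternative is rec_control wipe-cache momentumbusiness.com$, where the trailing $ wipes the name and everything below it.<br>
<br>
```yaml<br>
incoming:<br>
  # assumption: only these forwarders are trusted to send NOTIFY to rec<br>
  allow_notify_from:<br>
  - 10.255.255.76<br>
  - 10.1.3.228<br>
  # NOTIFYs for this zone are allowed to wipe its cached entries<br>
  allow_notify_for:<br>
  - momentumbusiness.com<br>
```<br>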
<br>
> <br>
> /etc/pdns-recursor/recursor.conf<br>
> <br>
> ---<br>
> dnssec:<br>
>   validation: validate<br>
> incoming:<br>
>   allow_from:<br>
>   - 127.0.0.1/8<br>
>   - 10.0.0.0/8<br>
>   - 172.16.0.0/12<br>
>   - 192.168.0.0/16<br>
>   - 'fd00::/8'<br>
>   - '2607:B600::/32'<br>
>   listen:<br>
>   - 0.0.0.0<br>
>   max_tcp_clients: 128<br>
>   max_tcp_per_client: 0<br>
>   max_tcp_queries_per_connection: 0<br>
>   port: 53<br>
>   tcp_timeout: 2<br>
> outgoing:<br>
>   dont_query: []<br>
>   max_qperq: 50<br>
>   network_timeout: 1500<br>
> packetcache:<br>
>   max_entries: 1000000<br>
> recordcache:<br>
>   max_entries: 1000000<br>
>   max_negative_ttl: 0<br>
>   max_ttl: 86400<br>
> recursor:<br>
>   daemon: false<br>
>   forward_zones:<br>
>   - zone: momentumbusiness.com<br>
>     recurse: false<br>
>     forwarders:<br>
>     - 10.255.255.76<br>
>     - 10.1.3.228<br>
>   - zone: 10.in-addr.arpa<br>
>     recurse: false<br>
>     forwarders:<br>
>     - 10.0.196.197<br>
>     - 10.0.62.244<br>
>   - zone: 168.192.in-addr.arpa<br>
>     recurse: false<br>
>     forwarders:<br>
>     - 10.0.196.197<br>
>     - 10.0.62.244<br>
>   - zone: 16.172.in-addr.arpa<br>
>     recurse: false<br>
>     forwarders:<br>
>     - 10.0.196.197<br>
>     - 10.0.62.244<br>
>   lua_dns_script: /etc/pdns-recursor/momentumbusiness_com.lua<br>
>   max_recursion_depth: 40<br>
>   max_total_msec: 7000<br>
>   minimum_ttl_override: 1<br>
>   server_id: nsres01.momentumtelecom.com<br>
>   setgid: pdns-recursor<br>
>   setuid: pdns-recursor<br>
> webservice:<br>
>   address: 0.0.0.0<br>
>   allow_from:<br>
>   - 192.168.9.164<br>
>   - 192.168.21.134<br>
>   - 192.168.20.0/24<br>
>   api_key: <sanitized><br>
>   port: 8080<br>
>   webserver: true<br>
> logging:<br>
>   loglevel: 3<br>
> ...<br>
> <br>
> /etc/pdns-recursor/momentumbusiness_com.lua<br>
> pdnslog("Lua NXDomain filter for momentumbusiness.com loading...", pdns.loglevels.Notice)<br>
> nxdomainsuffix=newDN("momentumbusiness.com")<br>
> function nxdomain(dq)<br>
>   if dq.qname:isPartOf(nxdomainsuffix) then<br>
>     dq.appliedPolicy.policyKind = pdns.policykinds.Drop<br>
>     return true<br>
>   end<br>
>   return false<br>
> end<br>
<br>
I do wonder what the purpose of this special NXDOMAIN handling is. A<br>
drop is not nice to clients, as the query will time out from their<br>
perspective. Maybe use pdns.policykinds.NODATA, or just leave the special<br>
handling out?<br>
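<br>
For illustration only, leaving the special handling out amounts to dropping the lua_dns_script line from the recursor section quoted above, assuming nothing else in the configuration depends on that script; the forwarders' NXDOMAIN answers for momentumbusiness.com then reach clients as-is instead of the clients timing out.<br>
<br>
```yaml<br>
recursor:<br>
  daemon: false<br>
  # forward_zones and server_id: unchanged from the configuration quoted above (omitted here)<br>
  # lua_dns_script removed: NXDOMAIN answers are no longer turned into drops<br>
  max_recursion_depth: 40<br>
  max_total_msec: 7000<br>
  minimum_ttl_override: 1<br>
  setgid: pdns-recursor<br>
  setuid: pdns-recursor<br>
```<br>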
<br>
> <br>
> On Fri, Apr 18, 2025 at 9:39 AM Otto Moerbeek <<a href="mailto:otto@drijf.net" target="_blank">otto@drijf.net</a>> wrote:<br>
> <br>
> > On Fri, Apr 18, 2025 at 08:28:48AM -0400, Scott Crace via Pdns-users wrote:<br>
> ><br>
> > Hi,<br>
> ><br>
> > Please include your config. That said:<br>
> ><br>
> > You seem to have a pretty low cache hit ratio and a high number of outgoing<br>
> > queries. How is your cache configured?<br>
> ><br>
> > Also some throttling is going on. I suspect rec has trouble contacting<br>
> > one or more auths or forwarders. The throttling tables can be viewed<br>
> > using<br>
> ><br>
> > rec_control dump-throttlemap -<br>
> > rec_control dump-failedservers -<br>
> ><br>
> > Also, what happens *during* the trace can be very relevant. If one<br>
> > auth (or forwarder) does not respond, rec will turn to another one,<br>
> > but only after a timeout, which is 1500 ms by default.<br>
> ><br>
> > -Otto<br>
> ><br>
> > > Hello all,<br>
> > > Long-time lurker on the mailing list, and I would like some performance<br>
> > > and/or tuning advice.<br>
> > > We've been using pdns-recursor as internal recursive nameservers for<br>
> > > quite some time now.<br>
> > > The original implementer of pdns departed, and I was recently tasked with<br>
> > > replacing or upgrading all of the servers with newer RHEL9 versions. I<br>
> > > opted to build fresh and migrate the configuration to the latest 5.2<br>
> > > release.<br>
> > ><br>
> > > I'm hearing occasional complaints about odd issues and/or clients cycling<br>
> > > through their DNS servers rapidly (timeouts?). Manual DNS testing works,<br>
> > > but I am reading through the metrics and performance documentation. I am<br>
> > > hoping someone with a more experienced eye could take a look at a sampling<br>
> > > of the periodic statistics report (below) and provide some insight or<br>
> > > prioritization on any urgent issues I should focus on studying first.<br>
> > ><br>
> > > My observations:<br>
> > > * I do note that the performance documentation talks about the impact of<br>
> > > firewalld/stateful firewalls, but the legacy servers were using the<br>
> > > same basic setup. If the firewall is the problem, is there a way to<br>
> > > validate this (other than stopping firewalld and waiting)?<br>
> > > * The "worker" threads seem evenly distributed to my novice eye, and our<br>
> > > qps (queries per second) rate is low, as I would expect, since the name<br>
> > > servers are internal-only resources.<br>
> > > * I ran a few pcaps and rec_control trace-regex for specific domain items<br>
> > > being reported as problematic. Everything seemed to be working, with the<br>
> > > trace-regex always showing "Step3 Final resolve: No Error/6 or 8".<br>
> > ><br>
> > > Thank you in advance for your time and consideration.<br>
> > ><br>
> > > Sincerely,<br>
> > > Scotsie<br>
> > ><br>
> > > ```<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" cache-entries="23666" negcache-entries="497" questions="6831695" record-cache-acquired="286931329" record-cache-contended="64414" record-cache-contended-perc="0.02" record-cache-hitratio-perc="0.87"<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" packetcache-acquired="16887684" packetcache-contended="1019" packetcache-contended-perc="0.01" packetcache-entries="7112" packetcache-hitratio-perc="37.75"<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" edns-entries="38" failed-host-entries="50" non-resolving-nameserver-entries="0" nsspeed-entries="968" saved-parent-ns-sets-entries="65" throttle-entries="8"<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" concurrent-queries="1" dot-outqueries="0" idle-tcpout-connections="0" outgoing-timeouts="36594" outqueries="14668546" outqueries-per-query-perc="214.71" tcp-outqueries="3131" throttled-queries-perc="1.90"<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic statistics report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" taskqueue-expired="0" taskqueue-pushed="540" taskqueue-size="0"<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" count="3470098" thread="0" tname="worker"<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.170" count="3360836" thread="1" tname="worker"<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Queries handled by thread" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.171" count="764" thread="2" tname="tcpworker"<br>
> > > Apr 17 16:07:28 nsrecdns01-1 pdns-recursor[1092]: msg="Periodic QPS report" subsystem="stats" level="0" prio="Info" tid="0" ts="1744920448.171" averagedOver="1800" qps="117"<br>
> > > ```<br>
> ><br>
> > > _______________________________________________<br>
> > > Pdns-users mailing list<br>
> > > <a href="mailto:Pdns-users@mailman.powerdns.com" target="_blank">Pdns-users@mailman.powerdns.com</a><br>
> > > <a href="https://mailman.powerdns.com/mailman/listinfo/pdns-users" rel="noreferrer" target="_blank">https://mailman.powerdns.com/mailman/listinfo/pdns-users</a><br>
> ><br>
> ><br>
</blockquote></div></div>