<div dir="ltr">Hi Everyone,<div><br></div><div>I'm having some trouble with my recursor seemingly getting overloaded and returning SERVFAIL responses for names it should otherwise resolve successfully. </div><div><br>
</div><div>In a nutshell, I have a DNS recursor that I provide to customers on my network; it only accepts queries from CIDR ranges I control. However, some of my customers seem to be inadvertently running open recursors on the Internet by using Windows AD, and when their machines can't resolve a bogus name, they forward the query along to my recursors. </div>
<div><br></div><div>The names being looked up are complete garbage (e.g., I'm watching tcpdump right now and see requests for <a href="http://hckcj6aq71ae0f6c.net">hckcj6aq71ae0f6c.net</a>, <a href="http://hckq6cj1682cz7hph3b98fypjw3lzoy.com">hckq6cj1682cz7hph3b98fypjw3lzoy.com</a>, etc.). Each time one of these gets looked up, it takes my recursor approximately 2 seconds to figure out that it's not a real domain. After that it's cached, but since there is so much junk coming in, my thread count shoots up to 1000 and plateaus there. I see about 3000 requests per second on average in aggregate, but it can shoot up to 6000. </div>
<div><br></div><div>When I use dnstop I can pretty quickly establish which customers are giving me trouble, and I can even see a pattern of certain bogus domains being hit more frequently than others. If I put in forward entries on my server for some of the most frequently hit bogus domains, things do calm down. I suspect that's because the recursor is no longer reaching out to the TLD DNS servers to look up the bogus entry and instead returns a SERVFAIL more or less immediately.</div>
<div><br></div><div>I understand the proper approach is to tell the customers to stop allowing DNS recursion on the public internet, and I'm working on that. However, I have thousands of customer machines and it's likely that this will crop up again. So my questions are:</div>
<div><br></div><div>(1) Do you suspect this is a DNS amplification attack where my customers' machines are getting abused? Or some other kind of attack (e.g., DNS cache poisoning)?</div><div>(2) I've considered using iptables to limit the query rate allowed per customer, but the documentation says I should be wary of using iptables since the volume of traffic could quickly overwhelm it. I noticed there is a throttle mechanism mentioned in the documentation, but I can't determine whether that's something I can configure or if it's just built-in logic. </div>
<div>(3) In general, what would you recommend to be proactive with something like this? I'm thinking about writing some code to run dnstop, look for customers who seem to be misconfigured, and then put in ACLs on my network appliances to block their traffic to my recursors until they remedy their machines. However, this seems heavy-handed.</div>
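<div><br></div><div>For (3), here is a minimal sketch of the kind of check I have in mind. It assumes I have already parsed a capture into (client IP, query name) pairs; that parsing step is not shown. The function names, the entropy heuristic, and every threshold are my own guesses rather than anything taken from dnstop or the recursor, and the heuristic is crude enough that a long legitimate hostname could trip it.</div>

```python
import math
from collections import Counter


def shannon_entropy(label):
    """Return the Shannon entropy of a string, in bits per character."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(qname, min_len=12, min_entropy=3.3):
    """Crude guess at whether the leftmost label is machine-generated junk.

    min_len and min_entropy are assumed thresholds, not tuned values.
    """
    label = qname.rstrip(".").split(".")[0].lower()
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy


def suspicious_clients(records, min_queries=100, min_ratio=0.5):
    """Return clients whose queries are mostly random-looking names.

    records: iterable of (client_ip, qname) pairs (hypothetical input format).
    A client is reported if it sent at least min_queries queries and at
    least min_ratio of them look random. Worst offenders come first.
    """
    totals = Counter()
    junk = Counter()
    for client, qname in records:
        totals[client] += 1
        if looks_random(qname):
            junk[client] += 1
    flagged = [
        client
        for client, n in totals.items()
        if n >= min_queries and junk[client] / n >= min_ratio
    ]
    return sorted(flagged, key=lambda client: junk[client], reverse=True)
```

<div>The idea would be to run this over a few minutes of traffic and review the resulting list by hand before pushing any ACL, so that a single noisy but legitimate customer doesn't get blocked automatically.</div>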
<div><br></div><div>Thanks for your help, I appreciate it!</div><div><br></div><div>-Russ</div><div><br></div><div><br></div></div>