> Hello, and good day!
> This is Andrey Sedletsky, writing on behalf of PJSC MGTS (Moscow City
> Telephone Network).
> We are using your recursive DNS server (the open-source PowerDNS 
> Recursor) and we have a few questions for you.
> One of our clients reported that the domain name "cm.taxi" cannot be 
> resolved.
> The request trace on the server shows that PowerDNS does not accept 
> the response from the authoritative server because the AA 
> (Authoritative Answer) flag is not set.
> Sep 04 01:47:38 a975-icache02 pdns_recursor[2575]: Removing record 
> 'cm.taxi|A|' in the answer section without the AA bit set 
> received from cm.taxi
> Sep 04 01:47:38 a975-icache02 pdns_recursor[2575]: Removing record 
> 'cm.taxi|A|' in the answer section without the AA bit set 
> received from cm.taxi
> The full log is in the attachment, along with a dump file 
> illustrating the problem.
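
For anyone inspecting the dump file, the flag in question is a single bit in the second 16-bit word of the DNS header (mask 0x0400 per RFC 1035). The following is only a minimal sketch of how that bit can be checked on a raw DNS message. The `make_header` helper and the sample flag values are hypothetical and exist purely for demonstration; this is not how the Recursor itself implements the check.

```python
import struct

# Authoritative Answer bit in the DNS header flags word (RFC 1035, section 4.1.1)
AA_MASK = 0x0400


def has_aa_bit(message: bytes) -> bool:
    """Return True if the AA flag is set in a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes; message is too short")
    # The header starts with two big-endian 16-bit words: ID, then flags.
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & AA_MASK)


def make_header(flags: int) -> bytes:
    """Hypothetical helper: build a bare 12-byte DNS header for the demo."""
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    return struct.pack("!HHHHHH", 0x1234, flags, 1, 1, 0, 0)


# 0x8180 = QR + RD + RA set, AA clear; 0x8580 is the same with AA set.
non_authoritative = make_header(0x8180)
authoritative = make_header(0x8580)

print(has_aa_bit(non_authoritative))
print(has_aa_bit(authoritative))
```

A response like the first sample, sent by the server the Recursor considers authoritative for the zone, is the kind of answer the log lines above describe as being dropped.
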
> So, our first question: is this normal behavior for PowerDNS 
> Recursor, and can it be changed (in general or for specific zones)?
> Also, not long ago we had an issue after restarting the 
> pdns-recursor process. After the restart (around 11 am), the number of 
> SERVFAIL responses to clients began to increase.
> The load on the server at that moment was about 300 thousand requests 
> per second.
> By the evening (about 22:00), the share of SERVFAIL responses 
> approached 30 percent of the total number of requests,
> and the call center began to receive a large number of complaints from 
> subscribers who could not resolve domain names.
> By this time, the load had grown to 400 thousand requests per second 
> (the usual value for that time of day).
> Switching to a backup server with a similar configuration (hardware 
> and software) did not solve the problem; it reproduced on the 
> backup server too. Restarting did not help either.
> In the end, the problem was solved by reducing the max-threads 
> parameter from 16 to 8.
> This raises a number of questions.
> What could be the reason for this behavior (until the problem 
> occurred, the server had been working normally for several months at 
> the same load and with the same configuration)?
> What tests should be performed to identify bottlenecks in the system 
> and the pdns-recursor itself?
> What metrics should we monitor to prevent such situations from 
> occurring?
> The attachment also contains a screenshot illustrating the 
> situation at that time.
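
On the monitoring question, one figure worth tracking is the SERVFAIL share described above (the roughly 30 percent seen in the evening). The sketch below computes it from two counter snapshots. It assumes the snapshots are the name/value text printed by `rec_control get-all` and that the counters `questions` and `servfail-answers` are present in it; the sample numbers are invented for illustration and are not taken from the attached data.

```python
def parse_counters(text: str) -> dict[str, int]:
    """Parse 'name value' lines (whitespace separated) into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly a name plus a non-negative integer.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def servfail_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """Share of SERVFAIL answers among the questions received between snapshots."""
    questions = after["questions"] - before["questions"]
    servfails = after["servfail-answers"] - before["servfail-answers"]
    if questions <= 0:
        # No traffic, or counters were reset by a restart: nothing to report.
        return 0.0
    return servfails / questions


# Invented sample snapshots, taken some interval apart.
snapshot_before = "questions\t1000000\nservfail-answers\t20000\n"
snapshot_after = "questions\t1400000\nservfail-answers\t140000\n"

ratio = servfail_ratio(parse_counters(snapshot_before),
                       parse_counters(snapshot_after))
print(f"SERVFAIL share in interval: {ratio:.1%}")
```

Working from the difference between two snapshots rather than the raw totals matters here, because the counters are cumulative and a slow rise after a restart would otherwise be diluted by earlier traffic.
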
> One last question.
> Our company would like to have commercial support for your product. Is 
> this possible and, if so, what do we need to do?
Below is the link to the attachments:
> Additional information:
> >rec_control version
> 4.3.6
> > less /etc/oracle-release
> Oracle Linux Server release 8.4
> >2 CPUs (28 cores, 56 threads)
> >128 GB RAM
> PDNS was installed from EPEL Repo
> grep -i process recursor.conf
> # dnssec        DNSSEC mode: off/process-no-validate 
> (default)/process/log-fail/validate
> # dnssec=process-no-validate
> Best Regards,
> Andrey

