<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 09/12/2020 07:30, Kiran Kumar via

      Pdns-users wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:114596641.4129201.1607499049680@mail.yahoo.com">

      <div dir="ltr" data-setdir="false">How do we minimize

        answers-slow, We are running on <span>CentOS Linux release

          7.9.2009 (Core)</span></div>

      <div dir="ltr" data-setdir="false">on VM with 4VCPUs and 16GB

        RAM. </div>

      <div dir="ltr" data-setdir="false"><br>

      </div>

      <div dir="ltr" data-setdir="false">

        <div>

          <div>rec_control get-all | grep answer</div>

          <div><b>answers-slow    80903</b></div>

          <div>answers0-1      598471</div>

          <div>answers1-10     1057756</div>

          <div>answers10-100   2342082</div>

          <div>answers100-1000 1341675</div>

        </div>

      </div>

    </blockquote>

    <p>For explanation see:

      <a class="moz-txt-link-freetext" href="https://docs.powerdns.com/recursor/metrics.html#gathered-information">https://docs.powerdns.com/recursor/metrics.html#gathered-information</a></p>

    <p>answers-slow counts the queries answered after more than 1 second,
      and in your case that is about 1.5% of answers. However, you haven't
      shown packetcache-hits, so the fraction of client queries affected
      is likely to be far lower than that.</p>
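    <p>For reference, this is the arithmetic behind the 1.5% figure, as a
      minimal Python sketch using only the counters you pasted. The
      function name is just something I made up for illustration.</p>
<pre>
```python
# Counters copied from the rec_control output quoted above.
counters = {
    "answers-slow": 80903,
    "answers0-1": 598471,
    "answers1-10": 1057756,
    "answers10-100": 2342082,
    "answers100-1000": 1341675,
}

def slow_fraction(stats):
    """Share of all answers* counters that falls in answers-slow."""
    answered = sum(v for k, v in stats.items() if k.startswith("answers"))
    return stats["answers-slow"] / answered

# Print the share as a percentage with two decimals.
print(f"{slow_fraction(counters):.2%}")
```
</pre>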

    <p>In resolving a given query, the recursor is going to have to

      contact one or more authoritative nameservers on the Internet. 

      These are some reasons why it might take more than 1 second to get

      the final answer:<br>

    </p>

    <p>- the answer is not already in cache (obviously) - this happens
      more frequently if the authoritative server sets a low TTL
      for that domain; AND<br>
      - the first authoritative server tried is down (or there is a
      transient network problem reaching that server), so pdns times out
      and tries another one; OR<br>
      - multiple authoritative servers need to be contacted, with a
      large round-trip time to each; OR<br>
      - the client is querying for a domain which is completely lame /
      broken, so pdns cannot find any answer.<br>

      <br>

      This doesn't necessarily indicate a problem with your own pdns

      server at all.  It could just as well be problems with some

      authoritative domains on the Internet. Heaven knows there are

      plenty of broken domains out there :-)</p>
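    <p>To make the cases above concrete, here is a rough Python sketch of
      how the delays add up on a cache miss. It is only a toy model, not how
      pdns schedules its queries internally: the function name, the example
      round-trip times and the 1500 ms per-query timeout are all assumptions
      for illustration, so check the timeout configured on your own
      recursor.</p>
<pre>
```python
SLOW_THRESHOLD_MS = 1000  # answers-slow counts answers taking more than 1 second

def resolution_time_ms(steps, timeout_ms=1500):
    """Total time for a cache miss, given one entry per upstream query.

    Each step is the round-trip time in ms of a query that was answered,
    or None for a query that got no reply and had to time out.
    timeout_ms is an assumed per-query timeout, not a measured value.
    """
    return sum(timeout_ms if rtt is None else rtt for rtt in steps)

# Hypothetical scenarios matching the reasons listed above.
examples = {
    "one nearby server": [20],
    "first server down, second answers": [None, 40],
    "four distant servers in a row": [300, 300, 300, 300],
}

for label, steps in examples.items():
    total = resolution_time_ms(steps)
    verdict = "slow" if total > SLOW_THRESHOLD_MS else "ok"
    print(f"{label}: {total} ms ({verdict})")
```
</pre>
    <p>In this model a single timeout, or a handful of long round trips, is
      already enough to push an answer past the 1 second mark, without
      anything being wrong on the recursor itself.</p>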

    <p>It could, however, be made worse by packet loss or congestion on

      your network or your network's upstream link.  If your recursor is

      on a private IP address behind a NAT, it would be better to put it

      on a public IP address, so that it doesn't have to generate NAT

      state for every outbound query it makes.  If your uplink is

      congested, which will cause latency and packet loss, then there's

      not much you can do short of buying more bandwidth.</p>

    <p>It could be made worse by excessive load on your server causing

      it to fall behind or drop queries, or insufficient RAM causing it

      to kick out cache entries prematurely, so you should also use a

      suitable tool to monitor your server resource utilisation (<a

        moz-do-not-send="true" href="https://github.com/netdata/netdata">netdata</a>

      is very good for this: it monitors at 1-second resolution by
      default, which lets you see short bursts of activity).  However, your
      server may be completely fine.<br>

    </p>

    <p>For comparison, here's the tiny cache on my home network:<br>

    </p>

    <p>root@cache1:~# rec_control get-all | egrep

      '^(answers|packetcache-hits|over-capacity-drops|policy-drops)'<br>

      answers-slow    348<br>

      answers0-1    6118<br>

      answers1-10    7149<br>

      answers10-100    9074<br>

      answers100-1000    4695<br>

      over-capacity-drops    0<br>

      packetcache-hits    1983665<br>

      policy-drops    0<br>

    </p>

    <p>and here's a production DNS cache in a data centre:</p>

    <p>root@wrn-dns1:~# rec_control get-all | egrep

      '^(answers|packetcache-hits|over-capacity-drops|policy-drops)'<br>

      answers-slow    1710185<br>

      answers0-1    40045388<br>

      answers1-10    132638392<br>

      answers10-100    101328465<br>

      answers100-1000    11033827<br>

      over-capacity-drops    0<br>

      packetcache-hits    8907014600<br>

      policy-drops    0<br>

    </p>

    <p>The fraction of answers-slow out of all the answers* counters
      (about 1.3% at home and 0.6% in production) is not hugely
      different from the 1.5% you see. Also notice that packetcache-hits is
      far higher than all of those counters put together.<br>

    </p>
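    <p>If you want to run the same comparison on your own output, something
      like the following Python sketch would do it. The helper names are
      hypothetical, and it assumes (as I did above) that packet cache hits
      are not already counted in the answers* buckets. The sample text is
      the production output from this mail.</p>
<pre>
```python
def parse_counters(text):
    """Turn 'name value' lines from rec_control get-all into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats

def slow_share(stats):
    """Return (slow share of answers*, slow share once cache hits are added).

    Assumes packetcache-hits is not already included in the answers* counters.
    """
    answered = sum(v for k, v in stats.items() if k.startswith("answers"))
    slow = stats["answers-slow"]
    return slow / answered, slow / (answered + stats.get("packetcache-hits", 0))

production = parse_counters("""
answers-slow    1710185
answers0-1    40045388
answers1-10    132638392
answers10-100    101328465
answers100-1000    11033827
over-capacity-drops    0
packetcache-hits    8907014600
policy-drops    0
""")

of_answers, of_all = slow_share(production)
print(f"{of_answers:.2%} of answers, {of_all:.3%} including packet cache hits")
```
</pre>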

    <p>Regards,</p>

    <p>Brian.<br>

    </p>

  </body>

</html>