[Pdns-users] Cache Problems with upgrade to Recursor 3.3
Kenneth Marshall
ktm at rice.edu
Wed Dec 1 19:08:30 UTC 2010
On Wed, Dec 01, 2010 at 12:40:40PM -0600, Jeremy Utley wrote:
> Good afternoon,
>
> We've been working on upgrading our recursors from pdns-recursor-3.1.7.1-1
> to pdns-recursor-3.3-1, and have seen some oddities I wanted to ask the
> list about. First, a basic rundown of our environment:
>
> Our existing production servers are running pdns-recursor-3.1.7.1-1
> installed via RPMs downloaded from your website. The recursor itself is
> run within a Xen PV virtual machine on a CentOS 5.5 base. To ensure we
> utilize all 4 cores of the processors in those machines, 2 instances of the
> recursor are launched simultaneously, listening on different IP addresses,
> and we utilize the fork option. We have a total of 6 machines configured
> this way, behind a Foundry load balancer which handles sharing the load
> between them. This implementation has been in place for about a year with
> no issues. We also use Cacti graphs for collecting performance data, by
> extending SNMP with output from the rec_control command.
>
> The new test server is pdns-recursor-3.3-1 installed via RPM downloaded
> from your website, and also running within a Xen PV virtual machine on a
> CentOS 5.5 base. Rather than launching multiple instances, we are
> launching 4 recursor threads (machines have 4 CPU cores). Most other
> settings are configured identically between old and new servers. This test
> server was added to the load balancer on Monday afternoon, taking a
> fraction of the traffic that would have gone to the 6 old machines.
>
> The problem I'm seeing is the caching does not seem to be working properly,
> which is causing a performance hit. To document this effect, the following
> graph images were taken a little while ago from our Cacti installation:
>
> http://www.jutley.org/DNS
>
> Looking at the 4th graph down, which is the cache statistics on the old
> version recursor, you will see that around 90% of all questions are cache
> hits, with around 10% as cache misses. And, looking at the third graph
> (showing how fast queries are answered), you'll see that over 90% of all
> queries are answered in less than 1 ms.
>
> However, looking at the bottom graph, which is the cache statistics on the
> new recursor, the statistics are totally different. Only 1.1% of the total
> questions are cache hits, while 6.8% are cache misses, which to me makes no
> sense, since a question *HAS* to be either a cache hit or cache miss. And,
> looking at the 7th graph (answer speed on the new recursor version), most
> queries are taking more than 10ms to answer.
>
> Just as additional info, the data collected by cacti to generate these
> graphs comes from the following command:
>
> /usr/bin/rec_control get questions cache-entries cache-hits cache-misses
> concurrent-queries resource-limits unauthorized-tcp unauthorized-udp
> spoof-prevents answers-slow client-parse-errors answers0-1 answers1-10
> answers10-100 answers100-1000 qa-latency
>
> Am I misinterpreting this, or is there definitely something going on?
>
> Thanks for your time,
>
> Jeremy
Hi Jeremy,
You are not including the statistics for packetcache-hits/misses. If
a query hits there, the recursor will not check the cache at all, so
the query shows up in neither cache-hits nor cache-misses. I would bet
that your packetcache-hits are pretty substantial. Ours are almost 3X
the cache-hits.
Cheers,
Ken
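
To check this against the graphs, a minimal sketch in Python of how the counters could be compared once packetcache-hits and packetcache-misses are added to the rec_control command above. It assumes rec_control prints one integer per line, in the order the names were requested; the helper names and the sample numbers are purely illustrative, not taken from a real server.

```python
# Counter names in the order they would be passed to `rec_control get`.
NAMES = ["questions", "cache-hits", "cache-misses",
         "packetcache-hits", "packetcache-misses"]


def parse_rec_control(output, names=NAMES):
    """Pair each requested counter name with its value.

    Assumes one integer per line, in request order (hypothetical helper).
    """
    values = [int(token) for token in output.split()]
    if len(values) != len(names):
        raise ValueError("expected %d values, got %d" % (len(names), len(values)))
    return dict(zip(names, values))


def shares_of_questions(stats):
    """Express every counter as a fraction of total questions."""
    questions = stats["questions"]
    if questions == 0:
        return {}
    return {name: value / questions
            for name, value in stats.items() if name != "questions"}


# Made-up sample output, chosen only to mirror the 1.1% / 6.8% pattern.
sample = "1000\n11\n68\n921\n79\n"
for name, share in shares_of_questions(parse_rec_control(sample)).items():
    print("%-20s %5.1f%%" % (name, share * 100))
```

If the packetcache-hits share accounts for most of the questions that are neither cache-hits nor cache-misses, the low cache-hits figure on the 3.3 graph is expected rather than a sign of a broken cache.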