[Pdns-users] Updated rrd scripts for pdns_recursor with support for --fork

Daniel Selans dan.s at hostdime.com
Mon Sep 3 12:36:48 UTC 2007


Hello list,

After successfully deploying pdns_recursor for a DNS cluster, I
wanted to get some performance statistics per node. All of the
nodes in the cluster run the recursor with the --fork option, as all
of them are multiprocessor machines.

While looking through the scripts in the 'rrd' directory that ships
with the pdns_recursor source, it became apparent that they cannot
create graphs when the recursor is run with the --fork option,
because each process then creates its own control socket, with a
filename of the form 'pdns_recursor.controlsocket.$PID'.
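
To illustrate the naming scheme, here is a minimal sketch of how the PID
could be recovered from such a socket path. The helper name 'socket_pid'
is hypothetical and not part of the shipped scripts, and /var/run is only
assumed as the socket directory (it is the default used in 'update').

```bash
# Hypothetical helper: print the PID suffix of a control socket path,
# i.e. everything after the last dot in the path.
socket_pid() {
    printf '%s\n' "${1##*.}"
}

# Assumed socket directory; adjust to wherever your sockets live.
for sock in /var/run/pdns_recursor.controlsocket.*; do
    [ -e "$sock" ] || continue   # glob matched nothing
    echo "socket $sock belongs to PID $(socket_pid "$sock")"
done
```

Taking the text after the last dot keeps working when the socket
directory itself contains dots, which a fixed field number would not.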

For this reason, I've modified all the existing scripts to support
multiple control sockets, and thus graphing of stats for both
pdns_recursor processes.

I apologize for the length of this email. If you do not wish to
copy/paste, feel free to download the updated 'rrd' directory from
http://www.skey.org/rrd-fork.tar

Many thanks to Ahu for writing up the original scripts :-)

The contents of the scripts in the updated 'rrd' directory follow:

------------------
'create' script:
------------------
#!/bin/bash
#
# Added --fork compatibility to support multiple control sockets
#
# Original by ahu @ pdns-dev team
# Modified by Daniel Selans <dan.s at hostdime.com>
# 09.02.07
#

UPDATE_INTERVAL=60
CHKFORK=`ps -C pdns_recursor --no-headers | wc -l`

# Quick sanity check
if [ "$CHKFORK" -eq 0 ]
then
         echo "pdns_recursor is not running!"
         exit 1
fi

COUNT=0

until [ $CHKFORK = $COUNT ]
do
         if [ $CHKFORK = 2 ]
         then
                 RRDFILEEND="$COUNT.rrd"
         else
                 RRDFILEEND="rrd"
         fi

         echo "rrdtool create pdns_recursor.$RRDFILEEND"
         rrdtool create pdns_recursor.$RRDFILEEND -s $UPDATE_INTERVAL \
         DS:questions:COUNTER:600:0:100000 \
         DS:tcp-questions:COUNTER:600:0:100000 \
         DS:cache-entries:GAUGE:600:0:U \
         DS:throttle-entries:GAUGE:600:0:U \
         DS:concurrent-queries:GAUGE:600:0:50000 \
         DS:noerror-answers:COUNTER:600:0:100000 \
         DS:nxdomain-answers:COUNTER:600:0:100000 \
         DS:servfail-answers:COUNTER:600:0:100000 \
         DS:tcp-outqueries:COUNTER:600:0:100000 \
         DS:outgoing-timeouts:COUNTER:600:0:100000 \
         DS:throttled-out:COUNTER:600:0:100000 \
         DS:nsspeeds-entries:GAUGE:600:0:U \
         DS:negcache-entries:GAUGE:600:0:U \
         DS:all-outqueries:COUNTER:600:0:100000 \
         DS:cache-hits:COUNTER:600:0:100000 \
         DS:cache-misses:COUNTER:600:0:100000 \
         DS:answers0-1:COUNTER:600:0:100000 \
         DS:answers1-10:COUNTER:600:0:100000 \
         DS:answers10-100:COUNTER:600:0:100000 \
         DS:answers100-1000:COUNTER:600:0:100000 \
         DS:answers-slow:COUNTER:600:0:100000 \
         DS:qa-latency:GAUGE:600:0:10000000 \
         DS:user-msec:COUNTER:600:0:2000 \
         DS:client-parse-errors:COUNTER:600:0:1000000 \
         DS:server-parse-errors:COUNTER:600:0:1000000 \
         DS:unauthorized-udp:COUNTER:600:0:1000000 \
         DS:unauthorized-tcp:COUNTER:600:0:1000000 \
         DS:sys-msec:COUNTER:600:0:2000 \
         RRA:AVERAGE:0.5:1:9600  \
         RRA:AVERAGE:0.5:4:9600  \
         RRA:AVERAGE:0.5:24:6000 \
         RRA:MAX:0.5:1:9600  \
         RRA:MAX:0.5:4:9600      \
         RRA:MAX:0.5:24:6000
         COUNT=$(($COUNT + 1))
done

-------------------
'update' script:
-------------------

#!/bin/bash
#
# Added --fork compatibility to support multiple control sockets
#
# Original by ahu @ pdns-dev team
# Modified by Daniel Selans <dan.s at hostdime.com>
# 09.02.07
#

SOCKETDIR=/var/run # Leave off the trailing slash
CHKFORK=`ps -C pdns_recursor --no-headers | wc -l`

# Quick sanity check
if [ "$CHKFORK" -eq 0 ]
then
         echo "pdns_recursor is not running!"
         exit 1
fi

TSTAMP=$(date +%s)

VARIABLES="questions tcp-questions cache-entries concurrent-queries\
            nxdomain-answers noerror-answers\
            servfail-answers tcp-outqueries\
            outgoing-timeouts nsspeeds-entries negcache-entries all-outqueries throttled-out\
            cache-hits cache-misses answers0-1 answers1-10 answers10-100 answers100-1000 answers-slow\
            qa-latency throttle-entries sys-msec user-msec unauthorized-udp unauthorized-tcp client-parse-errors\
            server-parse-errors"

UVARIABLES=$(echo $VARIABLES | tr '[a-z]' '[A-Z]' | tr - _ )



if [ $CHKFORK = 2 ]
then
         # Running with --fork
         SOCKETS=`ls -t -1 $SOCKETDIR/pdns_recursor.controlsocket.* | head --lines 2`
         FORK="yes"
         COUNT=0

         for i in $SOCKETS
         do
                 PID=`echo $i | cut -d . -f3`
                 rec_control --socket-dir=$SOCKETDIR --socket-pid=$PID GET $VARIABLES |
                 (
                 for a in $UVARIABLES
                 do
                         read $a
                 done
                 rrdtool update pdns_recursor.$COUNT.rrd \
                         -t \
                         $(for a in $VARIABLES
                         do
                                 echo -n $a:
                         done | sed 's/:$//' ) \
                 $TSTAMP$(
                         for a in $UVARIABLES
                         do
                                 echo -n :${!a}
                         done)
                 )
                 COUNT=$(($COUNT + 1))
         done

else
         rec_control --socket-dir=$SOCKETDIR  GET $VARIABLES |
         (
         for a in $UVARIABLES
         do
                 read $a
         done
         rrdtool update pdns_recursor.rrd \
                 -t \
                 $(for a in $VARIABLES
                 do
                         echo -n $a:
                 done | sed 's/:$//' ) \
         $TSTAMP$(
                 for a in $UVARIABLES
                 do
                         echo -n :${!a}
                 done)
         )
fi

--------------------------
'makegraphs' script:
--------------------------

#!/bin/bash
#
# Added --fork compatibility to support multiple control sockets
#
# Original by ahu @ pdns-dev team
# Modified by Daniel Selans <dan.s at hostdime.com>
# 09.02.07
#

WWWPREFIX=.
WSIZE=800
HSIZE=250

# only recent rrds offer slope-mode:
GRAPHOPTS=--slope-mode

function makeGraphs()
{

   if [ $MULTI = 1 ]
   then
         IMGFILEEND="$3.png"
         RRDFILEEND="$3.rrd"
   else
         IMGFILEEND="png"
         RRDFILEEND="rrd"
   fi

   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/questions-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "Questions and answers per second" \
         -v "packets" \
         DEF:questions=pdns_recursor.$RRDFILEEND:questions:AVERAGE \
         DEF:nxdomainanswers=pdns_recursor.$RRDFILEEND:nxdomain-answers:AVERAGE \
         DEF:noerroranswers=pdns_recursor.$RRDFILEEND:noerror-answers:AVERAGE \
         DEF:servfailanswers=pdns_recursor.$RRDFILEEND:servfail-answers:AVERAGE \
         LINE1:questions#0000ff:"questions/s" \
         AREA:noerroranswers#00ff00:"noerror answers/s" \
         STACK:nxdomainanswers#ffa500:"nxdomain answers/s" \
         STACK:servfailanswers#ff0000:"servfail answers/s"

   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/tcp-questions-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "TCP questions and answers per second, unauthorized packets/s" \
         -v "packets" \
         DEF:tcpquestions=pdns_recursor.$RRDFILEEND:tcp-questions:AVERAGE \
         DEF:unauthudp=pdns_recursor.$RRDFILEEND:unauthorized-udp:AVERAGE \
         DEF:unauthtcp=pdns_recursor.$RRDFILEEND:unauthorized-tcp:AVERAGE \
         LINE1:tcpquestions#0000ff:"tcp questions/s" \
         LINE1:unauthudp#ff0000:"udp unauth/s" \
         LINE1:unauthtcp#00ff00:"tcp unauth/s"

   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/packet-errors-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "Parsing errors per second" \
         -v "packets" \
         DEF:clientparseerrors=pdns_recursor.$RRDFILEEND:client-parse-errors:AVERAGE \
         DEF:serverparseerrors=pdns_recursor.$RRDFILEEND:server-parse-errors:AVERAGE \
         LINE1:clientparseerrors#0000ff:"bad packets from clients" \
         LINE1:serverparseerrors#00ff00:"bad packets from servers"

   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/latencies-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "Questions answered within latency" \
         -v "questions" \
         DEF:questions=pdns_recursor.$RRDFILEEND:questions:AVERAGE \
         DEF:answers01=pdns_recursor.$RRDFILEEND:answers0-1:AVERAGE \
         DEF:answers110=pdns_recursor.$RRDFILEEND:answers1-10:AVERAGE \
         DEF:answers10100=pdns_recursor.$RRDFILEEND:answers10-100:AVERAGE \
         DEF:answers1001000=pdns_recursor.$RRDFILEEND:answers100-1000:AVERAGE \
         DEF:answersslow=pdns_recursor.$RRDFILEEND:answers-slow:AVERAGE \
         LINE1:questions#0000ff:"questions/s" \
         AREA:answers01#00ff00:"<1 ms" \
         STACK:answers110#0000ff:"<10 ms" \
         STACK:answers10100#00ffff:"<100 ms" \
         STACK:answers1001000#ffff00:"<1000 ms" \
         STACK:answersslow#ff0000:">1000 ms"

   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/qoutq-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "Questions/outqueries per second" \
         -v "packets" \
         DEF:questions=pdns_recursor.$RRDFILEEND:questions:AVERAGE \
         DEF:alloutqueries=pdns_recursor.$RRDFILEEND:all-outqueries:AVERAGE \
         LINE1:questions#ff0000:"questions/s" \
         LINE1:alloutqueries#00ff00:"outqueries/s"

   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/qa-latency-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "Questions/answer latency in milliseconds" \
         -v "msec" \
         DEF:qalatency=pdns_recursor.$RRDFILEEND:qa-latency:AVERAGE \
         CDEF:mqalatency=qalatency,1000,/ \
         LINE1:mqalatency#ff0000:"latency (msec)"


   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/timeouts-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "Outqueries/timeouts per second" \
         -v "events" \
         DEF:alloutqueries=pdns_recursor.$RRDFILEEND:all-outqueries:AVERAGE \
         DEF:outgoingtimeouts=pdns_recursor.$RRDFILEEND:outgoing-timeouts:AVERAGE \
         DEF:throttledout=pdns_recursor.$RRDFILEEND:throttled-out:AVERAGE \
         LINE1:alloutqueries#ff0000:"outqueries/s" \
         LINE1:outgoingtimeouts#00ff00:"outgoing timeouts/s" \
         LINE1:throttledout#0000ff:"throttled outqueries/s"


   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/caches-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "Cache sizes" \
         -v "entries" \
         DEF:cacheentries=pdns_recursor.$RRDFILEEND:cache-entries:AVERAGE \
         DEF:negcacheentries=pdns_recursor.$RRDFILEEND:negcache-entries:AVERAGE \
         DEF:nsspeedsentries=pdns_recursor.$RRDFILEEND:nsspeeds-entries:AVERAGE \
         DEF:throttleentries=pdns_recursor.$RRDFILEEND:throttle-entries:AVERAGE \
         LINE1:cacheentries#ff0000:"cache entries" \
         LINE1:negcacheentries#0000ff:"negative cache entries" \
         LINE1:nsspeedsentries#00ff00:"NS speeds entries" \
         LINE1:throttleentries#ffa000:"throttle map entries"


   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/caches2-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -t "Cache sizes" \
         -v "entries" \
         DEF:negcacheentries=pdns_recursor.$RRDFILEEND:negcache-entries:AVERAGE \
         DEF:nsspeedsentries=pdns_recursor.$RRDFILEEND:nsspeeds-entries:AVERAGE \
         DEF:throttleentries=pdns_recursor.$RRDFILEEND:throttle-entries:AVERAGE \
         LINE1:negcacheentries#0000ff:"negative cache entries" \
         LINE1:nsspeedsentries#00ff00:"NS speeds entries" \
         LINE1:throttleentries#ffa000:"throttle map entries"

   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/load-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -v "MThreads" \
         -t "Concurrent queries" \
         DEF:concurrentqueries=pdns_recursor.$RRDFILEEND:concurrent-queries:AVERAGE \
         LINE1:concurrentqueries#0000ff:"concurrent queries"

   rrdtool graph $GRAPHOPTS --start -$1 $WWWPREFIX/hitrate-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
         -v "percentage" \
         -t "cache hits, cpu load" \
         DEF:cachehits=pdns_recursor.$RRDFILEEND:cache-hits:AVERAGE \
         DEF:cachemisses=pdns_recursor.$RRDFILEEND:cache-misses:AVERAGE \
         DEF:usermsec=pdns_recursor.$RRDFILEEND:user-msec:AVERAGE \
         DEF:sysmsec=pdns_recursor.$RRDFILEEND:sys-msec:AVERAGE \
         DEF:musermsec=pdns_recursor.$RRDFILEEND:user-msec:MAX \
         DEF:msysmsec=pdns_recursor.$RRDFILEEND:sys-msec:MAX \
         CDEF:perc=cachehits,100,*,cachehits,cachemisses,+,/ \
         CDEF:userperc=usermsec,10,/ \
         CDEF:sysperc=sysmsec,10,/ \
         CDEF:totmperc=musermsec,msysmsec,+,10,/ \
         LINE1:perc#0000ff:"percentage cache hits"  \
         LINE1:totmperc#ffff00:"max cpu use" \
         AREA:userperc#ff0000:"user cpu percentage" \
         STACK:sysperc#00ff00:"system cpu percentage" \
         COMMENT:"\l" \
         COMMENT:"Cache hits " \
         GPRINT:perc:AVERAGE:"avg %-3.1lf%%\t" \
         GPRINT:perc:LAST:"last %-3.1lf%%\t" \
         GPRINT:perc:MAX:"max %-3.1lf%%" \
         COMMENT:"\l" \
         COMMENT:"System cpu " \
         GPRINT:sysperc:AVERAGE:"avg %-3.1lf%%\t" \
         GPRINT:sysperc:LAST:"last %-3.1lf%%\t" \
         GPRINT:sysperc:MAX:"max %-3.1lf%%\t" \
         COMMENT:"\l" \
         COMMENT:"User cpu   " \
         GPRINT:userperc:AVERAGE:"avg %-3.1lf%%\t" \
         GPRINT:userperc:LAST:"last %-3.1lf%%\t" \
         GPRINT:userperc:MAX:"max %-3.1lf%%"

}

CHKFORK=`ps -C pdns_recursor --no-headers | wc -l`

# Quick sanity check
if [ "$CHKFORK" -eq 0 ]
then
         echo "pdns_recursor is not running!"
         exit 1
fi

if [ $CHKFORK = 2 ]
then
         COUNT=0
         MULTI=1
         while [ $COUNT != $CHKFORK ]
         do
                 makeGraphs 6h 6h $COUNT
                 makeGraphs 24h day $COUNT
                 #makeGraphs 7d week $COUNT
                 #makeGraphs 1m month $COUNT
                 #makeGraphs 1y year $COUNT
                 COUNT=$(($COUNT + 1))
         done
else
         MULTI=0
         makeGraphs 6h 6h
         makeGraphs 24h day
         #makeGraphs 7d week
         #makeGraphs 1m month
         #makeGraphs 1y year
fi

I've created an additional 'README' with a set of basic instructions,  
and some additional info.

-------------
README
-------------
A quick (and a bit dirty) way to graph your recursor's stats.
-------------------------------------------------------------

This is the updated set of scripts for creating rrdtool based
graphs, with support for pdns_recursor running with --fork.


Instructions:

1. Make sure rrdtool is installed.
2. Run the recursor.
3. Create the rrd file templates by running './create'

Note1: If your control socket(s) are located somewhere other than
       /var/run, edit the 'update' script and change the SOCKETDIR
       variable to the correct path.

Note2: By default, the graph images are going to be created in the
        same directory where the scripts are called from. If you wish
        to change this behaviour, edit the 'makegraphs' script and
        change the WWWPREFIX variable to point to your web path.

4. Use 'crontab -e' to run 'update' and 'makegraphs' every X minutes.

Note3: Similarly, you can add another cronjob to export all the .png's
        and html files to a remote server for viewing.

5. If you are running the recursor with --fork, use the index.multi.html
    for displaying graphs, otherwise use index.html.

All is set, you should now have fully working graphs.
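
As a rough sketch of step 4, a small wrapper like the one below could be
called from cron. The function name 'run_rrd_cycle' and the paths in the
comments are hypothetical; only the 'update' and 'makegraphs' script
names come from this directory. The scripts refer to their .rrd files by
relative path, so the wrapper changes into the script directory first.

```bash
# Hypothetical wrapper: run one update/graph cycle from the directory
# given as $1. The subshell keeps the caller's working directory intact.
run_rrd_cycle() {
    (
        cd "$1" || exit 1
        ./update && ./makegraphs
    )
}

# Example crontab entry (assumed paths), once per minute to match the
# 60 second UPDATE_INTERVAL used by 'create':
#   * * * * * /path/to/rrd-wrapper
# where rrd-wrapper defines the function above and then calls:
#   run_rrd_cycle /path/to/rrd
```

Running 'makegraphs' only when 'update' succeeded avoids redrawing
graphs from stale data when the recursor is down.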


Possible problems:

The scripts are likely to work only on Linux, due to the use of a
couple of options that we pass to 'ps' (-C and --no-headers), that are
typically native to Linux. You can of course adapt this by changing
the CHKFORK variable in all the scripts to mimic the behaviour of
Linux's 'ps'.

'-C' selects processes by process name.
'--no-headers' prevents 'ps' from printing the header.
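
On systems without those 'ps' options, something along these lines
might serve as a replacement for the CHKFORK line. It is only a sketch:
the helper name 'count_named' is made up, and it assumes that
'ps -A -o comm=' (the POSIX form) prints one command name or path per
line on your platform.

```bash
# Hypothetical helper: count the lines on stdin whose command basename
# equals $1. Always prints a number, 0 when nothing matches.
count_named() {
    awk -v name="$1" '
        { n = split($1, parts, "/"); if (parts[n] == name) c++ }
        END { print c + 0 }'
}

# Possible drop-in for the CHKFORK assignment in the three scripts.
CHKFORK=$(ps -A -o comm= | count_named pdns_recursor)
```

Comparing the basename means both 'pdns_recursor' and a full path such
as '/usr/sbin/pdns_recursor' are counted.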


--
Daniel Selans
<dan.s at hostdime.com>

Again, apologies for the length of this email, but hope you guys can  
make use of this :-)


Daniel Selans
Sr. Systems Administrator/Network Engineer
Hostdime.com, Inc. | http://www.hostdime.com/

