[Pdns-users] Updated rrd scripts for pdns_recursor with support for --fork
Daniel Selans
dan.s at hostdime.com
Mon Sep 3 12:36:48 UTC 2007
Hello list,
After successfully deploying the pdns_recursor for a dns cluster, I
wanted to get some performance statistics per node. Now, all of the
nodes in the cluster run the recursor with the --fork option, as all
of them are multiprocessor machines.
After looking through the scripts in the 'rrd' directory shipped with
the pdns_recursor source, it became apparent that they could not create
graphs when the recursor was run with the --fork option, because each
process then gets its own control socket, named
'pdns_recursor.controlsocket.$PID'.
For this reason, I've modified all the existing scripts, to support
multiple control sockets, and thus graphing of stats for both of the
pdns_recursor processes.
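As a rough illustration of the naming scheme described above, the sketch
below pulls the PID out of a control socket path. The helper name
socket_pid is made up for this example and is not part of the scripts;
it only assumes that the path ends in '.$PID'.

```shell
#!/bin/sh
# Hypothetical helper (not in the original scripts): print the PID suffix
# of a control socket path such as /var/run/pdns_recursor.controlsocket.1234
socket_pid() {
    # ${1##*.} strips everything up to and including the last dot
    printf '%s\n' "${1##*.}"
}

# Illustrative loop over whatever sockets exist in /var/run (may be none)
for sock in /var/run/pdns_recursor.controlsocket.*; do
    [ -e "$sock" ] || continue   # the glob did not match anything
    echo "socket $sock belongs to PID $(socket_pid "$sock")"
done
```

Taking the text after the last dot, rather than a fixed field number,
keeps working when the socket directory itself contains a dot.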
I apologize for the length of this email. If you do not wish to
copy/paste, feel free to download the updated 'rrd' directory from
http://www.skey.org/rrd-fork.tar
Many thanks to Ahu for writing up the original scripts :-)
The contents of the updated scripts in the 'rrd' directory follow:
------------------
'create' script:
------------------
#!/bin/bash
#
# Added --fork compatibility to support multiple control sockets
#
# Original by ahu @ pdns-dev team
# Modified by Daniel Selans <dan.s at hostdime.com>
# 09.02.07
#
UPDATE_INTERVAL=60
CHKFORK=`ps -C pdns_recursor --no-headers | wc -l`
# Quick sanity check
if [ "$CHKFORK" -eq 0 ]
then
echo "pdns_recursor is not running!"
exit 1
fi
COUNT=0
until [ $CHKFORK = $COUNT ]
do
if [ $CHKFORK = 2 ]
then
RRDFILEEND="$COUNT.rrd"
else
RRDFILEEND="rrd"
fi
echo "rrdtool create pdns_recursor.$RRDFILEEND"
rrdtool create pdns_recursor.$RRDFILEEND -s $UPDATE_INTERVAL \
DS:questions:COUNTER:600:0:100000 \
DS:tcp-questions:COUNTER:600:0:100000 \
DS:cache-entries:GAUGE:600:0:U \
DS:throttle-entries:GAUGE:600:0:U \
DS:concurrent-queries:GAUGE:600:0:50000 \
DS:noerror-answers:COUNTER:600:0:100000 \
DS:nxdomain-answers:COUNTER:600:0:100000 \
DS:servfail-answers:COUNTER:600:0:100000 \
DS:tcp-outqueries:COUNTER:600:0:100000 \
DS:outgoing-timeouts:COUNTER:600:0:100000 \
DS:throttled-out:COUNTER:600:0:100000 \
DS:nsspeeds-entries:GAUGE:600:0:U \
DS:negcache-entries:GAUGE:600:0:U \
DS:all-outqueries:COUNTER:600:0:100000 \
DS:cache-hits:COUNTER:600:0:100000 \
DS:cache-misses:COUNTER:600:0:100000 \
DS:answers0-1:COUNTER:600:0:100000 \
DS:answers1-10:COUNTER:600:0:100000 \
DS:answers10-100:COUNTER:600:0:100000 \
DS:answers100-1000:COUNTER:600:0:100000 \
DS:answers-slow:COUNTER:600:0:100000 \
DS:qa-latency:GAUGE:600:0:10000000 \
DS:user-msec:COUNTER:600:0:2000 \
DS:client-parse-errors:COUNTER:600:0:1000000 \
DS:server-parse-errors:COUNTER:600:0:1000000 \
DS:unauthorized-udp:COUNTER:600:0:1000000 \
DS:unauthorized-tcp:COUNTER:600:0:1000000 \
DS:sys-msec:COUNTER:600:0:2000 \
RRA:AVERAGE:0.5:1:9600 \
RRA:AVERAGE:0.5:4:9600 \
RRA:AVERAGE:0.5:24:6000 \
RRA:MAX:0.5:1:9600 \
RRA:MAX:0.5:4:9600 \
RRA:MAX:0.5:24:6000
COUNT=$(($COUNT + 1))
done
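For readers unfamiliar with the RRA lines above: each archive covers
rows x steps-per-row x step seconds of history. Below is a small sketch
of that arithmetic, using the values from the 'create' script. The
function name rra_days is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: history covered by one RRA, in whole days.
# Arguments: step in seconds, primary steps per row, number of rows.
rra_days() {
    echo $(( $1 * $2 * $3 / 86400 ))
}

STEP=60   # UPDATE_INTERVAL in the 'create' script
echo "RRA 1:9600  covers about $(rra_days $STEP 1 9600) days"
echo "RRA 4:9600  covers about $(rra_days $STEP 4 9600) days"
echo "RRA 24:6000 covers about $(rra_days $STEP 24 6000) days"
```

With these settings the coarsest archive holds about 100 days, so a
yearly graph would only be partly filled unless a longer RRA is added.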
-------------------
'update' script:
-------------------
#!/bin/bash
#
# Added --fork compatibility to support multiple control sockets
#
# Original by ahu @ pdns-dev team
# Modified by Daniel Selans <dan.s at hostdime.com>
# 09.02.07
#
SOCKETDIR=/var/run # Leave off the trailing slash
CHKFORK=`ps -C pdns_recursor --no-headers | wc -l`
# Quick sanity check
if [ "$CHKFORK" -eq 0 ]
then
echo "pdns_recursor is not running!"
exit 1
fi
TSTAMP=$(date +%s)
VARIABLES="questions tcp-questions cache-entries concurrent-queries \
nxdomain-answers noerror-answers \
servfail-answers tcp-outqueries \
outgoing-timeouts nsspeeds-entries negcache-entries all-outqueries \
throttled-out cache-hits cache-misses answers0-1 answers1-10 \
answers10-100 answers100-1000 answers-slow \
qa-latency throttle-entries sys-msec user-msec \
unauthorized-udp unauthorized-tcp client-parse-errors \
server-parse-errors"
UVARIABLES=$(echo $VARIABLES | tr '[a-z]' '[A-Z]' | tr - _ )
if [ $CHKFORK = 2 ]
then
# Running with --fork
SOCKETS=`ls -t -1 $SOCKETDIR/pdns_recursor.controlsocket.* | head -n 2`
FORK="yes"
COUNT=0
for i in $SOCKETS
do
PID=`echo $i | cut -d . -f3`
rec_control --socket-dir=$SOCKETDIR --socket-pid=$PID GET $VARIABLES |
(
for a in $UVARIABLES
do
read $a
done
rrdtool update pdns_recursor.$COUNT.rrd \
-t \
$(for a in $VARIABLES
do
echo -n $a:
done | sed 's/:$//' ) \
$TSTAMP$(
for a in $UVARIABLES
do
echo -n :${!a}
done)
)
COUNT=$(($COUNT + 1))
done
else
rec_control --socket-dir=$SOCKETDIR GET $VARIABLES |
(
for a in $UVARIABLES
do
read $a
done
rrdtool update pdns_recursor.rrd \
-t \
$(for a in $VARIABLES
do
echo -n $a:
done | sed 's/:$//' ) \
$TSTAMP$(
for a in $UVARIABLES
do
echo -n :${!a}
done)
)
fi
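The pipeline above relies on rec_control printing one value per line, in
the order the names were given, and on rrdtool update accepting a
'timestamp:value:value...' argument. Below is a simplified sketch of
that joining step, without the indirect ${!a} expansion. The function
names are invented here, and substituting 'U' (rrdtool's marker for an
unknown value) for an empty line is my assumption rather than something
the 'update' script does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the 'update' script.

# Join all arguments with ':' (the format of rrdtool's -t template).
join_colon() (
    IFS=:
    printf '%s\n' "$*"
)

# Read one value per line on stdin and print "<timestamp>:v1:v2:...".
# An empty line becomes U, which rrdtool treats as "unknown".
build_update_arg() (
    out=$1
    while IFS= read -r v; do
        out="$out:${v:-U}"
    done
    printf '%s\n' "$out"
)

names="questions cache-hits cache-misses"
join_colon $names
printf '10\n\n30\n' | build_update_arg 1188822000
```

Both functions run in a subshell, so the changed IFS and the temporary
variables do not leak into the calling script.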
--------------------------
'makegraphs' script:
--------------------------
#!/bin/bash
#
# Added --fork compatibility to support multiple control sockets
#
# Original by ahu @ pdns-dev team
# Modified by Daniel Selans <dan.s at hostdime.com>
# 09.02.07
#
WWWPREFIX=.
WSIZE=800
HSIZE=250
# only recent rrds offer slope-mode:
GRAPHOPTS=--slope-mode
function makeGraphs()
{
if [ $MULTI = 1 ]
then
IMGFILEEND="$3.png"
RRDFILEEND="$3.rrd"
else
IMGFILEEND="png"
RRDFILEEND="rrd"
fi
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/questions-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "Questions and answers per second" \
-v "packets" \
DEF:questions=pdns_recursor.$RRDFILEEND:questions:AVERAGE \
DEF:nxdomainanswers=pdns_recursor.$RRDFILEEND:nxdomain-answers:AVERAGE \
DEF:noerroranswers=pdns_recursor.$RRDFILEEND:noerror-answers:AVERAGE \
DEF:servfailanswers=pdns_recursor.$RRDFILEEND:servfail-answers:AVERAGE \
LINE1:questions#0000ff:"questions/s" \
AREA:noerroranswers#00ff00:"noerror answers/s" \
STACK:nxdomainanswers#ffa500:"nxdomain answers/s" \
STACK:servfailanswers#ff0000:"servfail answers/s"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/tcp-questions-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "TCP questions and answers per second, unauthorized packets/s" \
-v "packets" \
DEF:tcpquestions=pdns_recursor.$RRDFILEEND:tcp-questions:AVERAGE \
DEF:unauthudp=pdns_recursor.$RRDFILEEND:unauthorized-udp:AVERAGE \
DEF:unauthtcp=pdns_recursor.$RRDFILEEND:unauthorized-tcp:AVERAGE \
LINE1:tcpquestions#0000ff:"tcp questions/s" \
LINE1:unauthudp#ff0000:"udp unauth/s" \
LINE1:unauthtcp#00ff00:"tcp unauth/s"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/packet-errors-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "Parsing errors per second" \
-v "packets" \
DEF:clientparseerrors=pdns_recursor.$RRDFILEEND:client-parse-errors:AVERAGE \
DEF:serverparseerrors=pdns_recursor.$RRDFILEEND:server-parse-errors:AVERAGE \
LINE1:clientparseerrors#0000ff:"bad packets from clients" \
LINE1:serverparseerrors#00ff00:"bad packets from servers"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/latencies-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "Questions answered within latency" \
-v "questions" \
DEF:questions=pdns_recursor.$RRDFILEEND:questions:AVERAGE \
DEF:answers01=pdns_recursor.$RRDFILEEND:answers0-1:AVERAGE \
DEF:answers110=pdns_recursor.$RRDFILEEND:answers1-10:AVERAGE \
DEF:answers10100=pdns_recursor.$RRDFILEEND:answers10-100:AVERAGE \
DEF:answers1001000=pdns_recursor.$RRDFILEEND:answers100-1000:AVERAGE \
DEF:answersslow=pdns_recursor.$RRDFILEEND:answers-slow:AVERAGE \
LINE1:questions#0000ff:"questions/s" \
AREA:answers01#00ff00:"<1 ms" \
STACK:answers110#0000ff:"<10 ms" \
STACK:answers10100#00ffff:"<100 ms" \
STACK:answers1001000#ffff00:"<1000 ms" \
STACK:answersslow#ff0000:">1000 ms"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/qoutq-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "Questions/outqueries per second" \
-v "packets" \
DEF:questions=pdns_recursor.$RRDFILEEND:questions:AVERAGE \
DEF:alloutqueries=pdns_recursor.$RRDFILEEND:all-outqueries:AVERAGE \
LINE1:questions#ff0000:"questions/s" \
LINE1:alloutqueries#00ff00:"outqueries/s"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/qa-latency-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "Questions/answer latency in milliseconds" \
-v "msec" \
DEF:qalatency=pdns_recursor.$RRDFILEEND:qa-latency:AVERAGE \
CDEF:mqalatency=qalatency,1000,/ \
LINE1:mqalatency#ff0000:"latency (msec)"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/timeouts-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "Outqueries/timeouts per second" \
-v "events" \
DEF:alloutqueries=pdns_recursor.$RRDFILEEND:all-outqueries:AVERAGE \
DEF:outgoingtimeouts=pdns_recursor.$RRDFILEEND:outgoing-timeouts:AVERAGE \
DEF:throttledout=pdns_recursor.$RRDFILEEND:throttled-out:AVERAGE \
LINE1:alloutqueries#ff0000:"outqueries/s" \
LINE1:outgoingtimeouts#00ff00:"outgoing timeouts/s" \
LINE1:throttledout#0000ff:"throttled outqueries/s"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/caches-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "Cache sizes" \
-v "entries" \
DEF:cacheentries=pdns_recursor.$RRDFILEEND:cache-entries:AVERAGE \
DEF:negcacheentries=pdns_recursor.$RRDFILEEND:negcache-entries:AVERAGE \
DEF:nsspeedsentries=pdns_recursor.$RRDFILEEND:nsspeeds-entries:AVERAGE \
DEF:throttleentries=pdns_recursor.$RRDFILEEND:throttle-entries:AVERAGE \
LINE1:cacheentries#ff0000:"cache entries" \
LINE1:negcacheentries#0000ff:"negative cache entries" \
LINE1:nsspeedsentries#00ff00:"NS speeds entries" \
LINE1:throttleentries#ffa000:"throttle map entries"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/caches2-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-t "Cache sizes" \
-v "entries" \
DEF:negcacheentries=pdns_recursor.$RRDFILEEND:negcache-entries:AVERAGE \
DEF:nsspeedsentries=pdns_recursor.$RRDFILEEND:nsspeeds-entries:AVERAGE \
DEF:throttleentries=pdns_recursor.$RRDFILEEND:throttle-entries:AVERAGE \
LINE1:negcacheentries#0000ff:"negative cache entries" \
LINE1:nsspeedsentries#00ff00:"NS speeds entries" \
LINE1:throttleentries#ffa000:"throttle map entries"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/load-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-v "MThreads" \
-t "Concurrent queries" \
DEF:concurrentqueries=pdns_recursor.$RRDFILEEND:concurrent-queries:AVERAGE \
LINE1:concurrentqueries#0000ff:"concurrent queries"
rrdtool graph $GRAPHOPTS --start -$1 \
$WWWPREFIX/hitrate-$2.$IMGFILEEND -w $WSIZE -h $HSIZE -l 0 \
-v "percentage" \
-t "cache hits, cpu load" \
DEF:cachehits=pdns_recursor.$RRDFILEEND:cache-hits:AVERAGE \
DEF:cachemisses=pdns_recursor.$RRDFILEEND:cache-misses:AVERAGE \
DEF:usermsec=pdns_recursor.$RRDFILEEND:user-msec:AVERAGE \
DEF:sysmsec=pdns_recursor.$RRDFILEEND:sys-msec:AVERAGE \
DEF:musermsec=pdns_recursor.$RRDFILEEND:user-msec:MAX \
DEF:msysmsec=pdns_recursor.$RRDFILEEND:sys-msec:MAX \
CDEF:perc=cachehits,100,*,cachehits,cachemisses,+,/ \
CDEF:userperc=usermsec,10,/ \
CDEF:sysperc=sysmsec,10,/ \
CDEF:totmperc=musermsec,msysmsec,+,10,/ \
LINE1:perc#0000ff:"percentage cache hits" \
LINE1:totmperc#ffff00:"max cpu use" \
AREA:userperc#ff0000:"user cpu percentage" \
STACK:sysperc#00ff00:"system cpu percentage" \
COMMENT:"\l" \
COMMENT:"Cache hits " \
GPRINT:perc:AVERAGE:"avg %-3.1lf%%\t" \
GPRINT:perc:LAST:"last %-3.1lf%%\t" \
GPRINT:perc:MAX:"max %-3.1lf%%" \
COMMENT:"\l" \
COMMENT:"System cpu " \
GPRINT:sysperc:AVERAGE:"avg %-3.1lf%%\t" \
GPRINT:sysperc:LAST:"last %-3.1lf%%\t" \
GPRINT:sysperc:MAX:"max %-3.1lf%%\t" \
COMMENT:"\l" \
COMMENT:"User cpu " \
GPRINT:userperc:AVERAGE:"avg %-3.1lf%%\t" \
GPRINT:userperc:LAST:"last %-3.1lf%%\t" \
GPRINT:userperc:MAX:"max %-3.1lf%%"
}
CHKFORK=`ps -C pdns_recursor --no-headers | wc -l`
# Quick sanity check
if [ "$CHKFORK" -eq 0 ]
then
echo "pdns_recursor is not running!"
exit 1
fi
if [ $CHKFORK = 2 ]
then
COUNT=0
MULTI=1
while [ $COUNT != $CHKFORK ]
do
makeGraphs 6h 6h $COUNT
makeGraphs 24h day $COUNT
#makeGraphs 7d week $COUNT
#makeGraphs 1m month $COUNT
#makeGraphs 1y year $COUNT
COUNT=$(($COUNT + 1))
done
else
MULTI=0
makeGraphs 6h 6h
makeGraphs 24h day
#makeGraphs 7d week
#makeGraphs 1m month
#makeGraphs 1y year
fi
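The CDEF lines in the hitrate graph are written in rrdtool's reverse
Polish notation. As a worked example, 'cachehits,100,*,cachehits,
cachemisses,+,/' is 100*hits/(hits+misses), and 'usermsec,10,/' turns
milliseconds of CPU time per second into a percentage. The sketch below
redoes that arithmetic with awk. The function names and sample numbers
are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers mirroring two CDEF expressions from 'makegraphs'.

# perc = cachehits,100,*,cachehits,cachemisses,+,/
hit_percentage() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        if (h + m == 0) { print "nan"; exit }   # no lookups yet
        printf "%.1f\n", h * 100 / (h + m)
    }'
}

# userperc = usermsec,10,/   (1000 msec of CPU per second = 100%)
cpu_percentage() {
    awk -v ms="$1" 'BEGIN { printf "%.1f\n", ms / 10 }'
}

hit_percentage 750 250
cpu_percentage 125
```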
I've created an additional 'README' with a set of basic instructions,
and some additional info.
-------------
README
-------------
A quick (and a bit dirty) way to graph your recursor's stats.
-------------------------------------------------------------
This is the updated set of scripts for creating rrdtool based
graphs, with support for pdns_recursor running with --fork.
Instructions:
1. Make sure rrdtool is installed.
2. Run the recursor.
3. Create the rrd file templates by running './create'
Note1: If your control sockets are located somewhere other than
/var/run, edit the 'update' script and change the SOCKETDIR
variable to the correct path.
Note2: By default, the graph images are going to be created in the
same directory where the scripts are called from. If you wish
to change this behaviour, edit the 'makegraphs' script and
change the WWWPREFIX variable to point to your web path.
4. 'crontab -e' to run 'update' and 'makegraphs' every X minutes.
Note3: Similarly, you can add another cronjob to export all the .png's
and html files to a remote server for viewing.
5. If you are running the recursor with --fork, use the index.multi.html
for displaying graphs, otherwise use index.html.
All is set, you should now have fully working graphs.
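Since the scripts read and write pdns_recursor*.rrd relative to the
current directory, a cron job has to change into the 'rrd' directory
first. One possible wrapper is sketched below. The function name and
the paths in the comments are placeholders, not anything shipped with
the scripts.

```shell
#!/bin/sh
# Hypothetical wrapper: run one update/graph cycle inside the rrd directory.
run_cycle() (
    cd "$1" || exit 1          # subshell, so the caller's cwd is untouched
    ./update && ./makegraphs   # skip graphing if the update failed
)

# Example crontab line (path is a placeholder), once per minute to match
# UPDATE_INTERVAL=60 in 'create':
#   * * * * * /path/to/wrapper.sh
# where wrapper.sh defines run_cycle as above and ends with:
#   run_cycle /path/to/rrd
```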
Possible problems:
The scripts are likely to work only on Linux, due to the use of a
couple of options that we pass to 'ps' (-C and --no-headers), that are
typically native to Linux. You can of course adapt this by changing
the CHKFORK variable in all the scripts to mimic the behaviour of
Linux's 'ps'.
'-C' selects processes by process name.
'--no-headers' prevents 'ps' from printing the header.
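On systems without those 'ps' options, one possible replacement for the
CHKFORK line is sketched below. It assumes that 'ps -A -o comm=' is
available and prints one command name (or path) per line. The function
name count_by_name is invented for this example, and I have not tested
it on every platform.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on stdin whose command name
# (with any leading directory stripped) equals the first argument.
count_by_name() {
    awk -v name="$1" '
        { n = split($1, parts, "/"); if (parts[n] == name) c++ }
        END { print c + 0 }'
}

# Intended use in the scripts, replacing the Linux-specific ps call:
#   CHKFORK=$(ps -A -o comm= | count_by_name pdns_recursor)
printf 'pdns_recursor\n/usr/sbin/pdns_recursor\nbash\n' | count_by_name pdns_recursor
```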
--
Daniel Selans
<dan.s at hostdime.com>
Again, apologies for the length of this email, but hope you guys can
make use of this :-)
Daniel Selans
Sr. Systems Administrator/Network Engineer
Hostdime.com, Inc. | http://www.hostdime.com/