[Pdns-users] TCP Queries stop - can only fix with restart?
Matt Gibson
m.gibson at voxip.ca
Tue Oct 18 04:55:20 UTC 2005
Greetings,
We have two servers with identical configurations, both running
PowerDNS as an authoritative name server, fetching its data from the
stock MySQL tables provided. Each nameserver box has its own MySQL
server running locally, which has data replicated to it from a master
MySQL server elsewhere in the mix.
Each box is also acting as a recursive server using PowerDNS's internal
recursion server. PowerDNS is listening on about 250 IPs on both boxes,
for both TCP and UDP queries.
Oct 17 22:51:36 ns1 pdns[5800]: UDP server bound to xxx.xxx.xxx.xxx:53
<250-some IPs>
Oct 17 22:51:35 ns1 pdns[5800]: UDP server bound to 127.0.0.1:53
Oct 17 22:51:36 ns1 pdns[5800]: TCP server bound to xxx.xxx.xxx.xxx:53
<250-some IPs>
Oct 17 22:51:36 ns1 pdns[5800]: TCP server bound to 127.0.0.1:53
Oct 17 22:51:36 ns1 pdns[5800]: Set effective group id to 407
Oct 17 22:51:36 ns1 pdns[5800]: Set effective user id to 1001
Oct 17 22:51:36 ns1 pdns[5800]: DNS Proxy launched, local port 33484, remote 127.0.0.1:5300
Oct 17 22:51:36 ns1 pdns[5800]: Creating backend connection for TCP
Oct 17 22:51:36 ns1 pdns[5800]: Master/slave communicator launching
It all starts fine, but every couple of days TCP queries (both
authoritative and recursive) stop working, while UDP queries keep
working fine. The log shows the following error:
Oct 17 23:02:10 ns1 pdns[5800]: TCP nameserver had error, cycling backend:EOF trying to get length of answer from remote TCP server
Oct 17 23:02:21 ns1 pdns[5800]: TCP server is without backend connections, launching
At least I think that error has something to do with it.
Simply restarting PowerDNS makes the issue go away, but that can't be
the proper solution for this.
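One way to confirm the symptom from the box itself is to send the same query over both transports. The following is a minimal sketch, assuming dig is installed; the function name probe_dns, the DIG override variable and the 2-second timeout are illustrative choices, not part of the setup described here.

```shell
#!/bin/sh
# Probe one server over UDP and TCP and report which transport answers.
# DIG can be overridden (hypothetical knob) to point at another binary or a stub.
DIG=${DIG:-dig}

probe_dns() {
    server=$1   # address to query, e.g. 127.0.0.1
    name=$2     # record name to look up
    for proto in udp tcp; do
        case $proto in
            udp) flag=+notcp ;;   # plain UDP query
            tcp) flag=+tcp ;;     # force TCP
        esac
        # dig exits non-zero when no reply is received from the server.
        if "$DIG" "$flag" +time=2 +tries=1 +short "@$server" "$name" A >/dev/null 2>&1; then
            echo "$proto ok"
        else
            echo "$proto FAILED"
        fi
    done
}

# Example: probe_dns 127.0.0.1 example.com
```

If the problem is the one described above, the expectation would be "udp ok" followed by "tcp FAILED" once the server has stopped answering TCP queries.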
At the time the server died, I ran netstat -an. In condensed form,
this was the result:
- 1039 total TCP connections at the time
- 874 of them were in CLOSE_WAIT
- 165 of them were ESTABLISHED
By contrast, this is the only output netstat -an gives me when I run
it while the server is working properly:
tcp 34 0 127.0.0.1:5300 127.0.0.1:35734 CLOSE_WAIT
tcp 0 0 xxx.xxx.xxx.xxx:53 xxx.xxx.xxx.xxx:19730 ESTABLISHED
tcp 0 0 xxx.xxx.xxx.xxx:53 xxx.xxx.xxx.xxx:1943 TIME_WAIT
tcp 0 0 xxx.xxx.xxx.xxx:53 xxx.xxx.xxx.xxx:1194 ESTABLISHED
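Condensed per-state counts like the ones above can be produced with a small awk filter. This is a minimal sketch, assuming netstat -ant style input where the local address is column 4 and the state is column 6; the function name count_tcp_states and its optional port argument are illustrative.

```shell
#!/bin/sh
# Count TCP sockets per state from `netstat -ant`-style input on stdin.
# With an argument, only sockets whose local port matches are counted.
count_tcp_states() {
    port=${1:-}
    awk -v port="$port" '
        $1 ~ /^tcp/ {
            # The local port is the last colon-separated field of column 4.
            n = split($4, parts, ":")
            if (port == "" || parts[n] == port) count[$6]++
        }
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Examples:
#   netstat -ant | count_tcp_states         # all TCP sockets
#   netstat -ant | count_tcp_states 5300    # only local port 5300
```

Filtering on local port 5300 versus 53 would show whether the CLOSE_WAIT sockets sit on the recursor side or on the authoritative server side.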
Has anyone encountered anything like this before? Anyone have any ideas
on how to fix it? My boss is going nuts and so am I trying to figure
this out! :)
Some Background Info:
=====================
OS: Gentoo 2005.1 (emerge --sync as of a few days ago)
KERNEL: 2.6.12-gentoo-r9
RAM: 1GB
CPU: Intel(R) Xeon(TM) CPU 3.06GHz
Using NPTL, but not NPTLONLY
vmstat output when functioning properly:
========================================
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0   2436 139072 213296 271544    0    0     0     3    3     7  2  1 93  4
 0  0   2436 139072 213296 271612    0    0     0    44 1106   498  2  1 98  1
 0  0   2436 139072 213296 271612    0    0     0    52 1114   438  1  1 99  0
 1  0   2436 139048 213296 271612    0    0     0  1956 1223   993  3  2 94  2
 0  0   2436 139048 213296 271612    0    0     0  1712 1238   701  1  1 85 12
 0  0   2436 139048 213296 271612    0    0     0    64 1189   693  2  1 97  0
 0  1   2436 138676 213296 271612    0    0     0  3612 1309   842  6  3 82  8
 0  0   2436 138676 213296 271680    0    0     0    68 1211   546  1  1 93  5
 0  0   2436 138676 213296 271680    0    0     0    40 1180   500  1  1 97  0
 0  0   2436 138692 213296 271748    0    0     0    44 1237   686  1  1 98  0
 0  0   2436 138692 213296 271748    0    0     0   416 1177   546  1  1 97  1
 0  0   2436 138692 213296 271748    0    0     0    40 1201   459  1  2 98  0
 0  0   2436 138692 213296 271748    0    0     0    44 1225   862  2  2 97  0
 0  0   2436 138708 213296 271748    0    0     0    48 1239   864  2  1 97  0
 0  0   2436 138708 213296 271748    0    0     0    52 1168   737  1  1 97  0
 2  0   2436 138708 213296 271748    0    0     0    48 1162   711  1  2 97  0
iptables config
===============
#!/bin/sh
iptables=/sbin/iptables
# Flush all tables
#
$iptables -t nat -F
$iptables -t mangle -F
$iptables -t filter -F
# Delete all user defined tables
#
$iptables -X
# Set default chain policies
#
$iptables -P INPUT DROP
$iptables -P FORWARD DROP
$iptables -P OUTPUT ACCEPT
# Allow all to/from loopback interface
#
$iptables -A INPUT -i lo -j ACCEPT
$iptables -A OUTPUT -o lo -j ACCEPT
# Allow all to/from internal network interface
#
$iptables -A INPUT -i eth1 -j ACCEPT
$iptables -A OUTPUT -o eth1 -j ACCEPT
# Allow all established and/or related connections
#
$iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
$iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow in
#
#$iptables -A INPUT -i eth2 -p tcp --dport 22 -m state --state NEW -j ACCEPT
$iptables -A INPUT -i eth2 -p udp --dport 53 -m state --state NEW -j ACCEPT
$iptables -A INPUT -i eth2 -p tcp --dport 53 -m state --state NEW -j ACCEPT
$iptables -A INPUT -i eth2 -p tcp --dport 873 -m state --state NEW -j ACCEPT
# Allow out
#
$iptables -A OUTPUT -o eth2 -p tcp --dport 22 -m state --state NEW -j ACCEPT
$iptables -A OUTPUT -o eth2 -p udp --dport 53 -m state --state NEW -j ACCEPT
$iptables -A OUTPUT -o eth2 -p tcp --dport 873 -m state --state NEW -j ACCEPT
/etc/sysctl.conf
================
net.ipv4.tcp_keepalive_time = 120
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_no_metrics_save = 1
net.core.netdev_max_backlog = 2500
I also ran this:
ifconfig eth2 txqueuelen 1000
/etc/recursor.conf
==================
setuid=nobody
setgid=nobody
quiet=on
local-address=127.0.0.1
local-port=5300
max-tcp-clients=1024
/etc/pdns.conf
==============
cache-ttl=60
daemon=yes
distributor-threads=10
launch=gmysql
gmysql-host=localhost
gmysql-user=xxxx
gmysql-password=xxxxxxx
gmysql-dbname=xxxxxxx
local-address=<INTERNAL IP>,<250-some external IPs, comma separated>
log-dns-details=yes
log-failed-updates=yes
logfile=pdns.log
logging-facility=0
loglevel=3
master=yes
query-cache-ttl=60
query-logging=yes
recursor=127.0.0.1:5300
setgid=407
setuid=1001
webserver=no
Thanks,
Matt Gibson