[Pdns-users] 3.4.8 -> 4.0.1: Exiting because communicator thread died with STL error: stou
Oliver Peter
lists at peter.de.com
Thu Sep 15 07:48:12 UTC 2016
Hi Pieter,
Thanks for your reply.
On Thu, Sep 15, 2016 at 09:12:43AM +0200, Pieter Lexis wrote:
> On Thu, 15 Sep 2016 09:05:31 +0200
> Oliver Peter <lists at peter.de.com> wrote:
> > During the update process from our 3.4.8 servers to 4.0.1 we encountered
> > a dying/looping pdns instance. 3.4.x has been stable for the last
> > ~6 months.
> > We already moved 4 of our 5 auth NS to 4.0.1, all of them are running
> > FreeBSD10, all of them are working fine as expected.
> >
> > Today we upgraded our last instance and this one showed us a strange
> > error (Murphy's law) so we had to downgrade to 3.4.8. The service comes
> > up OK, serves a couple of requests, dies, and comes up again, etc:
> >
> >
> > Basically the machines are running almost the same config (except IP
> > settings of course) and serving almost the same zone database (~2 million
> > domains, ~20 million records).
> >
> > On the same machine we have another pdns instance running, same
> > binaries, slightly fewer zones/records, different config profile - this one
> > was pretty stable.
> >
> > Any hints appreciated.
>
> We became a little more strict on database content in 4.0.0. I would suggest running `pdnsutil check-all-zones` to see which record causes the issue. Could you then send us that record in a github issue[1], because crashing on something like this is bad.
I tried that:
[root at a.ns14.net:~]# pdnsutil check-all-zones
Error: stou
[root at a.ns14.net:~]# pdnsutil -v check-all-zones
Error: stou
truss gives me nothing helpful at the moment:
[...]
munmap(0x803400000,4194304) = 0 (0x0)
poll({5/POLLIN|POLLPRI},1,0) = 0 (0x0)
write(5,"\^E\0\0\0\^Y\^A\0\0\0",9) = 9 (0x9)
write(5,"\^A\0\0\0\^A",5) = 5 (0x5)
shutdown(5,SHUT_RDWR) = 0 (0x0)
close(5) = 0 (0x0)
madvise(0x8097f0000,0x10000,0x5,0xaaaaaaaaaaaaaaab,0x809405e20,0x801f0ee80) = 0 (0x0)
munmap(0x814000000,4194304) = 0 (0x0)
madvise(0x8057fc000,0x1000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
munmap(0x809400000,4194304) = 0 (0x0)
madvise(0x8024f4000,0x3000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x8024fa000,0x8000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x802503000,0x2000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x802528000,0x2000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
write(4,"\^A\0\0\0\^A",5) = 5 (0x5)
shutdown(4,SHUT_RDWR) = 0 (0x0)
close(4) = 0 (0x0)
Error: write(2,"Error: ",7) = 7 (0x7)
stouwrite(2,"stou",4) = 4 (0x4)
write(2,"\n",1) = 1 (0x1)
[...]
Is it possible to add more debug flags/output to the program?
Once we have found the corrupt zone(s), I will file a bug on GitHub.
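
One way to narrow this down, as a rough sketch only: run `pdnsutil check-zone` once per zone instead of `check-all-zones`, so that the failing zone gets named. This assumes that `pdnsutil list-all-zones` itself succeeds and prints one zone name per line, and that the per-zone check fails with the same "Error: stou" line; neither has been confirmed on this machine. The function name `find_stou_zones` is made up for illustration, and the script is meant for /bin/sh, not the csh root shell.

```shell
#!/bin/sh
# Sketch: check zones one at a time so a failure can be attributed to a zone.
# Assumption: zone names arrive on stdin, one per line.
# find_stou_zones is a hypothetical helper name, not part of PowerDNS.

find_stou_zones() {
    while IFS= read -r zone; do
        # Skip empty lines in the input.
        [ -n "$zone" ] || continue
        # Merge stderr into stdout: the error text is written to fd 2
        # (see the truss output). Redirect stdin so the check cannot
        # consume the remaining zone names.
        if pdnsutil check-zone "$zone" 2>&1 </dev/null | grep -q 'Error: stou'; then
            printf '%s\n' "$zone"
        fi
    done
}

# Usage (not executed here):
#   pdnsutil list-all-zones | find_stou_zones > stou-zones.txt
```

With ~2 million zones this starts one process per zone, so it will be slow; feeding it the zone list in chunks would be an obvious refinement.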
--
Oliver PETER oliver at gfuzz.de 0x456D688F