[Pdns-users] 3.4.8 -> 4.0.1: Exiting because communicator thread died with STL error: stou

Oliver Peter lists at peter.de.com
Thu Sep 15 07:48:12 UTC 2016


Hi Pieter,

Thanks for your reply.

On Thu, Sep 15, 2016 at 09:12:43AM +0200, Pieter Lexis wrote:
> On Thu, 15 Sep 2016 09:05:31 +0200
> Oliver Peter <lists at peter.de.com> wrote:
> > During the update process from our 3.4.8 servers to 4.0.1 we encountered
> > a dying/looping pdns instance.  3.4.x has been stable for the last
> > ~6 months.
> > We already moved 4 of our 5 auth NS to 4.0.1, all of them are running
> > FreeBSD10, all of them are working fine as expected.
> > 
> > Today we upgraded our last instance and this one showed us a strange
> > error (Murphy's law) so we had to downgrade to 3.4.8.  The service comes
> > up OK, serves a couple of requests, dies, and comes up again, etc.:
> > 
> > 
> > Basically the machines are running almost the same config (except IP
> > settings of course) and serving almost the same zone database
> > (~2 million domains, ~20 million records).
> > 
> > On the same machine we have another pdns instance running, same
> > binaries, slightly fewer zones/records, different config profile - this one
> > was pretty stable.
> > 
> > Any hints appreciated.
> 
> We became a little more strict on database content in 4.0.0. I would suggest running `pdnsutil check-all-zones` to see which record causes the issue. Could you then send us that record in a github issue[1], because crashing on something like this is bad.

I tried that:
	[root at a.ns14.net:~]# pdnsutil check-all-zones
	Error: stou
	[root at a.ns14.net:~]# pdnsutil -v check-all-zones
	Error: stou
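
Since check-all-zones stops with "Error: stou" without naming a zone, one
way to narrow it down might be to check the zones one at a time.  A minimal
sketch, assuming `pdnsutil list-all-zones` prints one zone name per line and
that `pdnsutil check-zone` exits non-zero when it hits the error (both are
assumptions I have not verified on this machine); `find_bad_zones` and
`CHECK_CMD` are hypothetical names used only for illustration:

```shell
#!/bin/sh
# Sketch: check zones individually so that a failing zone can be identified.
# CHECK_CMD is a hypothetical override hook (e.g. for dry runs); by default
# each zone is checked with `pdnsutil check-zone`.
CHECK_CMD="${CHECK_CMD:-pdnsutil check-zone}"

# Reads zone names (one per line) on stdin and prints every zone for which
# the check command exits with a non-zero status.
find_bad_zones() {
    while IFS= read -r zone; do
        # Skip empty lines.
        [ -n "$zone" ] || continue
        # $CHECK_CMD is intentionally unquoted so it splits into words.
        if ! $CHECK_CMD "$zone" >/dev/null 2>&1; then
            printf '%s\n' "$zone"
        fi
    done
}

# Usage (assumes list-all-zones prints one zone per line):
#   pdnsutil list-all-zones | find_bad_zones > bad-zones.txt
```

This starts one pdnsutil process per zone, so with ~2 million domains it
will probably take a long time; splitting the zone list into chunks and
running them in parallel may help.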

truss gives me nothing helpful at the moment:
[...]
munmap(0x803400000,4194304)                      = 0 (0x0)
poll({5/POLLIN|POLLPRI},1,0)                     = 0 (0x0)
write(5,"\^E\0\0\0\^Y\^A\0\0\0",9)               = 9 (0x9)
write(5,"\^A\0\0\0\^A",5)                        = 5 (0x5)
shutdown(5,SHUT_RDWR)                            = 0 (0x0)
close(5)                                         = 0 (0x0)
madvise(0x8097f0000,0x10000,0x5,0xaaaaaaaaaaaaaaab,0x809405e20,0x801f0ee80) = 0 (0x0)
munmap(0x814000000,4194304)                      = 0 (0x0)
madvise(0x8057fc000,0x1000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
munmap(0x809400000,4194304)                      = 0 (0x0)
madvise(0x8024f4000,0x3000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x8024fa000,0x8000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x802503000,0x2000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
madvise(0x802528000,0x2000,0x5,0xaaaaaaaaaaaaaaab,0x7fffffffb9e0,0x801f0ee80) = 0 (0x0)
write(4,"\^A\0\0\0\^A",5)                        = 5 (0x5)
shutdown(4,SHUT_RDWR)                            = 0 (0x0)
close(4)                                         = 0 (0x0)
Error: write(2,"Error: ",7)                              = 7 (0x7)
stouwrite(2,"stou",4)                            = 4 (0x4)

write(2,"\n",1)                                  = 1 (0x1)
[...]

Is it possible to add more debug flags/output to the program?

Once we have found the corrupt zone(s), I will file a bug on GitHub.


-- 
Oliver PETER       oliver at gfuzz.de       0x456D688F

