[Pdns-users] Duplicate RRs in records table

ktm at rice.edu ktm at rice.edu
Thu Jul 3 12:56:30 UTC 2014


On Thu, Jul 03, 2014 at 02:01:49PM +0200, Klaus Darilion wrote:
> Another workaround (untested) would be to put an explicit lock at the
> beginning of the "delete-zone-query":
> delete-zone-query="LOCK TABLE records;delete from records where domain_id=%d"
> 
> But (if multiple statements are allowed in the delete-zone-query
> setting) it would also lock the whole table for all zone updates,
> which is probably bad for performance.
> 
> regards
> Klaus
> 

Hi Klaus,

We have observed the same behavior here. When a zone transfer takes longer
than the periodic check interval (60s), a second transfer is initiated,
with the results that you have reported. We currently time our transfers to
make certain that they finish in time (<60s), and we also watch the table
for duplicate zone information and clean it up if it occurs. This really
should be handled in PDNS, with a flag marking that a transfer is in
progress so that it does not even try a second transfer. As you have noted,
DB-side solutions are less effective and ruin the concurrency of the backend
for updates. In our setup, we use temporary tables to stage the zone
transfer and then apply only the deltas to the production table. This
eliminates the wholesale delete of all of the zone records, followed by
their complete repopulation, for even a single record change. But in our
case temporary tables are visible only in the transaction that created them,
so a second transfer cannot see that one is already staged, and a check in
the server code would really help. The comment in the code requires the
backend to handle it:

- only one backend owns the SOA of a zone
- only one AXFR per zone at a time - double startTransaction should fail
- backends need to implement transaction semantics

with the results already seen if a second transfer is initiated. Yuck, it
really needs to be tracked by the server instead. +1 for filing a bug, but
we have been working around it for years. It is more of a problem with
DNSSEC, because the additional processing slows the transfers and makes
them more susceptible to this.
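
To illustrate the kind of server-side tracking meant here, a minimal Python
sketch follows. It is not PowerDNS code: the TransferTracker class and its
method names are hypothetical, and it only shows the idea of refusing a
second transfer for a zone while one is still running.

```python
import threading
from contextlib import contextmanager


class TransferTracker:
    """Hypothetical helper: remembers which zones have a transfer running."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = set()

    def try_begin(self, zone):
        """Mark the zone busy and return True, or return False if it already is."""
        with self._lock:
            if zone in self._in_progress:
                return False
            self._in_progress.add(zone)
            return True

    def finish(self, zone):
        """Clear the in-progress mark for the zone."""
        with self._lock:
            self._in_progress.discard(zone)

    @contextmanager
    def transfer(self, zone):
        """Yield True if the caller owns the transfer, False if it should skip."""
        started = self.try_begin(zone)
        try:
            yield started
        finally:
            # Only the caller that actually started the transfer clears the mark.
            if started:
                self.finish(zone)


tracker = TransferTracker()
with tracker.transfer("example.com") as first:
    # A second request arriving while the first is still running is refused.
    with tracker.transfer("example.com") as second:
        outcome = (first, second)
```

With a check like this, the periodic stale-zone check and a manual retrieve
for the same zone could not both reach the backend's DELETE/INSERT sequence
at the same time, and the backend would not need any extra locking.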

Regards,
Ken
> 
> On 03.07.2014 12:09, Klaus Darilion wrote:
> > Hi.
> > 
> > I think we found the cause for the problem (but no solution yet). It
> > seems the problem happens only during the first zone transfer, when
> > there are no RRs in the records table yet. See the following log messages:
> > 
> > 
> > 1. The zone is inserted into the domains table as type=SLAVE
> > 
> > 2. We execute "pdns_control retrieve example.com" to immediately
> > initiate a zone transfer
> > 
> > 05:25:09 pdns[23463]: No serial for 'example.com' found - zone is missing?
> > 05:25:09 pdns[23463]: Initiating transfer of 'example.com' from remote
> > '1.2.3.4'
> > 
> > It seems this caused PowerDNS to put the zone transfer into its work queue.
> > 
> > 
> > Some seconds later, the periodic zone check finds out that the zone is
> > stale and also queues a zone transfer:
> > 
> > 05:25:13 pdns[23463]: Domain 'example.com' is stale, master serial
> > 2014063000, our serial 0
> > 05:25:13 pdns[23463]: Initiating transfer of 'example.com' from remote
> > '1.2.3.4'
> > 05:25:13 pdns[23463]: No serial for 'example.com' found - zone is missing?
> > 05:25:13 pdns[23463]: AXFR started for 'example.com'
> > 05:25:13 pdns[23463]: Transaction started for 'example.com'
> > 05:25:14 pdns[23463]: No serial for 'example.com' found - zone is missing?
> > 05:25:14 pdns[23463]: AXFR started for 'example.com'
> > 05:25:14 pdns[23463]: Transaction started for 'example.com'
> > 05:25:14 pdns[23463]: AXFR done for 'example.com', zone committed with
> > serial number 2014063000
> > 05:25:14 pdns[23463]: AXFR done for 'example.com', zone committed with
> > serial number 2014063000
> > 
> > As you see, the zone is fetched 2 times concurrently. The second
> > transaction starts before the first transaction is finished.
> > 
> > Thus, there are 2 concurrent transactions:
> > 
> >                 T1                            T2
> >              BEGIN
> >              DELETE FROM records ....
> >              INSERT into records ....
> >                                        BEGIN
> >                                        DELETE FROM records ....
> >                                        INSERT into records ....
> >              COMMIT
> >                                        COMMIT
> > 
> > Now, the zone is inserted twice into the records table.
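
The duplicated state described above can be spotted with a grouping query.
A minimal sketch, using an in-memory SQLite database as a stand-in for the
PostgreSQL backend. The table is a simplified subset of the records columns,
the sample data is made up, and the function names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY, domain_id INTEGER,"
    " name TEXT, type TEXT, content TEXT)"
)

# Made-up sample zone for illustration only.
zone = [
    (1, "example.com", "SOA",
     "ns1.example.com hostmaster.example.com 2014063000 3600 600 604800 3600"),
    (1, "example.com", "NS", "ns1.example.com"),
    (1, "www.example.com", "A", "192.0.2.1"),
]

# Reproduce the end state of the two overlapping transfers:
# every record of the zone ends up in the table twice.
for _ in range(2):
    conn.executemany(
        "INSERT INTO records (domain_id, name, type, content)"
        " VALUES (?, ?, ?, ?)",
        zone,
    )


def find_duplicates(conn):
    """Return (domain_id, name, type, content, copies) for repeated RRs."""
    return conn.execute(
        "SELECT domain_id, name, type, content, COUNT(*) AS copies"
        " FROM records"
        " GROUP BY domain_id, name, type, content"
        " HAVING COUNT(*) > 1"
        " ORDER BY name, type"
    ).fetchall()


def remove_duplicates(conn):
    """Keep the row with the lowest id for each identical RR, delete the rest."""
    conn.execute(
        "DELETE FROM records WHERE id NOT IN ("
        " SELECT MIN(id) FROM records"
        " GROUP BY domain_id, name, type, content)"
    )


before = find_duplicates(conn)
remove_duplicates(conn)
after = find_duplicates(conn)
remaining = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

This only detects and cleans up after the fact; it does nothing to stop the
two transfers from overlapping in the first place.
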
> > 
> > The problem happens only on the first transfer. For further transfers,
> > e.g. caused by NOTIFYs, there are already RRs in the records table and
> > the DELETE will delete rows. The DELETE therefore locks the respective
> > rows, and any concurrent transfer that tries to delete these rows is
> > blocked until the first transaction has finished.
> > 
> > During the first zone transfer, the DELETE will not delete any rows.
> > Thus, there aren't any locks on the table and both transactions will
> > succeed.
> > 
> > I also tried setting the transaction isolation level to 'serializable'
> > but the problem persists.
> > 
> > I think there is no nice solution to this problem in the database. A
> > workaround would be to create a unique key on
> > records(domain_id,type,content) to avoid identical RRs via a table
> > constraint (are identical RRs allowed?).
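
A minimal sketch of such a constraint, again with SQLite standing in for
PostgreSQL. Two things here are assumptions and not part of the proposal
above: the index name is made up, and the key also includes the name column,
because different names in one zone can legitimately share type and content.
As far as I know, RFC 2181 (section 5) treats records with the same label,
class, type and data as duplicates that servers should suppress, so a key
over all four columns should not reject valid data. The sketch shows only
that an identical row is rejected; it does not show how two concurrent
PostgreSQL transactions inserting the same key would interact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY, domain_id INTEGER,"
    " name TEXT, type TEXT, content TEXT)"
)
# Hypothetical index name; including "name" in the key is an assumption.
conn.execute(
    "CREATE UNIQUE INDEX records_unique_rr"
    " ON records (domain_id, name, type, content)"
)

insert = (
    "INSERT INTO records (domain_id, name, type, content)"
    " VALUES (?, ?, ?, ?)"
)
rr = (1, "www.example.com", "A", "192.0.2.1")
conn.execute(insert, rr)

# Inserting the identical RR a second time violates the unique index.
try:
    conn.execute(insert, rr)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Same type and content under a different name is still accepted.
conn.execute(insert, (1, "mail.example.com", "A", "192.0.2.1"))
row_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```
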
> > 
> > Otherwise, I think some other locking mechanism would be required,
> > which would have to be implemented in PowerDNS.
> > 
> > So, what do you think? Shall I file a bug report?
> > Thanks
> > Klaus
> > 
> > 
> > 
> > 
> > On 03.07.2014 11:04, Klaus Darilion wrote:
> >> Hi! We use PowerDNS 3.3.1 as slave with a PostgreSQL DB as backend. Today
> >> I found out that for some zones the whole zone is duplicated in the
> >> records table (2 SOA records, ... every record is there twice). For one
> >> zone we had all the records 6 times - thus a zone with 6 SOA records, ....
> >>
> >> There is no manual intervention in the DB; only PowerDNS writes to the
> >> records table when it transfers the zone from the master.
> >>
> >> Does someone have an idea how this may happen? E.g. can there be some
> >> DB problems (slow DB, timeout, connection drops ...) where PowerDNS
> >> inserts the records without prior deletion of the records?
> >>
> >> For some zones the last transfer was in 2011, for some 2013, thus maybe
> >> the problem was with some older PowerDNS version.
> >>
> >> Thanks
> >> Klaus
> >>
> >> _______________________________________________
> >> Pdns-users mailing list
> >> Pdns-users at mailman.powerdns.com
> >> http://mailman.powerdns.com/mailman/listinfo/pdns-users
> >>
> > 
> 



