[Pdns-users] Notification for domains to ip1:53 failed after retries

Steve Zeng steve.zeng at booking.com
Wed Jan 17 14:13:33 UTC 2018


Pieter,

I checked the BIND slaves' logs around that time frame and found:

10-Jan-2018 18:11:17.211 notify: client 10.198.180.41#12149: received notify for zone 'example.com'
10-Jan-2018 18:11:17.211 general: zone lhr4.dqs.booking.com/IN: notify from 10.198.180.41#12149: no serial
10-Jan-2018 18:11:24.387 notify: client 10.198.180.41#12149: received notify for zone 'example.com'
10-Jan-2018 18:11:24.387 general: zone lhr4.dqs.booking.com/IN: notify from 10.198.180.41#12149: no serial
10-Jan-2018 18:11:29.453 notify: client 10.198.180.41#12149: received notify for zone 'example.com'
10-Jan-2018 18:11:29.453 general: zone lhr4.dqs.booking.com/IN: notify from 10.198.180.41#12149: no serial
10-Jan-2018 18:11:38.350 notify: client 10.198.180.41#12149: received notify for zone 'example.com'
10-Jan-2018 18:11:38.350 general: zone lhr4.dqs.booking.com/IN: notify from 10.198.180.41#12149: no serial
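To make the timing easier to see, here is a minimal sketch (assuming Python 3, an English/C locale for the month abbreviation, and that the log format is exactly as pasted above) that pulls the "received notify" entries out of the excerpt and computes the gaps between them. The names `parse_notifies` and `gaps_seconds` are just illustrative helpers, not part of BIND or PowerDNS.

```python
import re
from datetime import datetime

# The "received notify" lines copied from the BIND excerpt above.
LOG = """\
10-Jan-2018 18:11:17.211 notify: client 10.198.180.41#12149: received notify for zone 'example.com'
10-Jan-2018 18:11:24.387 notify: client 10.198.180.41#12149: received notify for zone 'example.com'
10-Jan-2018 18:11:29.453 notify: client 10.198.180.41#12149: received notify for zone 'example.com'
10-Jan-2018 18:11:38.350 notify: client 10.198.180.41#12149: received notify for zone 'example.com'
"""

# Assumed line layout: timestamp, category, client ip#port, message.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) notify: "
    r"client (?P<ip>[\d.]+)#(?P<port>\d+): "
    r"received notify for zone '(?P<zone>[^']+)'"
)

def parse_notifies(text):
    """Return (timestamp, ip, port, zone) for each 'received notify' line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%d-%b-%Y %H:%M:%S.%f")
            events.append((ts, m["ip"], int(m["port"]), m["zone"]))
    return events

def gaps_seconds(events):
    """Seconds between consecutive events, rounded to milliseconds."""
    times = [e[0] for e in events]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]

events = parse_notifies(LOG)
gaps = gaps_seconds(events)
print(len(events), "notifies received; gaps in seconds:", gaps)
print("source ports seen:", sorted({e[2] for e in events}))
```

On this excerpt it shows four NOTIFY messages arriving within about 21 seconds, all from the same client address and source port.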

I am wondering why there is 'no serial' in the logs, since the column does have a value:

> select * from domains where name='example.com'\G;
*************************** 1. row ***************************
             id: 484
           name: example.com
         master: 10.187.125.2:53,10.187.125.2:53
     last_check: 1516197871
           type: SLAVE
notified_serial: 2016918645
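For lining this row up with the log timestamps, a small sketch that converts last_check to a readable time. It assumes the column holds a Unix epoch in seconds; `epoch_to_utc` is only a helper name used here.

```python
from datetime import datetime, timezone

# Value taken from the domains row above; assumed to be Unix epoch seconds.
last_check = 1516197871

def epoch_to_utc(ts):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

print(epoch_to_utc(last_check).isoformat())  # 2018-01-17T14:04:31+00:00
```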

Is "no serial" the cause of the notification failure?

Thanks,
Steve

> On Jan 17, 2018, at 11:31 AM, Steve Zeng <steve.zeng at booking.com> wrote:
> 
> Pieter,
> 
> Thanks a lot for the great explanation and for listing all the possibilities.
> 
>> Do the BIND logs indicate a NOTIFY was received (you might need to bump verbosity)?
> I will double check again if the BIND slaves acknowledged or received the NOTIFY messages. 
> 
> I came across this post and am concerned that the high number of NOTIFY/AXFR requests may have overloaded PowerDNS, given that we have ~6,000 zones and ~100 BIND slaves. Do you know if there is a built-in limit on the AXFR volume?
> 
> https://mailman.powerdns.com/pipermail/pdns-users/2007-May/016527.html
> 
>> If replication-lag is an issue for you and you want to use PowerDNS as
>> the non-hidden nameservers, it would make sense to use NATIVE zones[1].
> That makes sense and I agree. It is definitely the path we are taking going forward. Using AXFR instead of native replication is an intermediate state.
> 
> Thanks,
> Steve
> 
>> On Jan 17, 2018, at 10:23 AM, Pieter Lexis <pieter.lexis at powerdns.com> wrote:
>> 
>> Hi Steve,
>> 
>> On Mon, 15 Jan 2018 14:41:51 +0100
>> Steve Zeng <steve.zeng at booking.com> wrote:
>> 
>>> We are migrating our DNS master from BIND to PowerDNS. The approach we are taking is to put PowerDNS in the middle of an existing replication chain, as below:
>>> 
>>> BIND DNS master -> PowerDNS -> BIND DNS slaves
>>> 
>>> It works most of the time. However, from time to time we experience a long delay when making a DNS change. Further investigation shows that the delay seems to be on the PowerDNS side. We see lots of errors like these:
>>> 
>>> 2018-01-10T18:13:24.728722+01:00 pdns_server1 pdns_server[2250]: Jan 10 18:13:24 Notification for example.com to ip1:53 failed after retries
>>> 2018-01-10T18:13:24.728848+01:00 pdns_server1 pdns_server[2250]: Jan 10 18:13:24 Notification for example.com to ip2:53 failed after retries
>>> 2018-01-10T18:13:24.728975+01:00 pdns_server1 pdns_server[2250]: Jan 10 18:13:24 Notification for example.com to ip3:53 failed after retries
>>> 
>>> ip1,ip2,ip3 are BIND slaves.
>>> 
>>> We found no other errors that point to the root cause. It happens occasionally. Our questions are:
>> 
>> It looks like, for whatever reason, the BIND slaves repeatedly fail to
>> acknowledge the NOTIFY message. Or perhaps the messages are not
>> received at all. Do the BIND logs indicate a NOTIFY was received (you
>> might need to bump verbosity)?
>> 
>> If they are not received, _something_ on the network path between the
>> servers loses these messages. If they are received (and acted upon by
>> BIND), check if the acknowledgements reach the PowerDNS server.
>> 
>>> 1. Is there any rate limit as far as PowerDNS is concerned? Before PowerDNS was put in the middle, there was no such delay.
>> 
>> There is no rate-limiting in PowerDNS.
>> 
>>> 2. Is the number of retries configurable?
>> 
>> This is not configurable.
>> 
>>> Should PowerDNS ensure that notifications go through rather than dropping them after a certain number of retries?
>> 
>> A lost NOTIFY can mean anything, e.g. the server is no longer a
>> nameserver, the network is broken, or the server is overloaded.
>> Retrying (and keeping this data indefinitely) would take up too many
>> resources. Slaves will also check the SOA serial on the master at some
>> point, notice they are out of date, and initiate an AXFR.
>> 
>> If replication-lag is an issue for you and you want to use PowerDNS as
>> the non-hidden nameservers, it would make sense to use NATIVE zones[1].
>> These rely on database-replication instead of DNS-based replication of
>> the data.
>> 
>> Best regards,
>> 
>> Pieter
>> 
>> 1 - https://doc.powerdns.com/authoritative/modes-of-operation.html#native-replication
>> 
>> -- 
>> Pieter Lexis
>> PowerDNS.COM BV -- https://www.powerdns.com
>> _______________________________________________
>> Pdns-users mailing list
>> Pdns-users at mailman.powerdns.com
>> https://mailman.powerdns.com/mailman/listinfo/pdns-users
> 



