[Pdns-users] Intermittent recursion failure due to timeout

Brian T b225ccc at gmail.com
Tue Jan 23 21:00:58 UTC 2018


I've been seeing intermittent lookup failures for '
nova.clouds.archive.ubuntu.com'.  Looking at the logs, I see these messages
during the time when the lookup fails:

Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Wants DNSSEC processing, auth data in query
for A
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Looking for CNAME cache hit of '
nova.clouds.archive.ubuntu.com|CNAME'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: No CNAME cache hit of '
nova.clouds.archive.ubuntu.com|CNAME' found
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: No cache hit for '
nova.clouds.archive.ubuntu.com|A', trying to find an appropriate NS record
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333] : got TA for '.'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333] : setting cut state for
. to Secure
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Checking if we have NS in cache for '
nova.clouds.archive.ubuntu.com'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: no valid/useful NS in cache for '
nova.clouds.archive.ubuntu.com'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Checking if we have NS in cache for '
clouds.archive.ubuntu.com'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: NS (with ip, or non-glue) in cache for '
clouds.archive.ubuntu.com' -> 'piru.canonical.com'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: within bailiwick: 0, not in cache / did not
look at cache
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: We have NS in cache for '
clouds.archive.ubuntu.com' (flawedNSSet=0)
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: initial validation status for
nova.clouds.archive.ubuntu.com is Indeterminate
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Cache consultations done, have 1 NS to
contact
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com.: Nameservers: piru.canonical.com.(38.62ms)
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Trying to resolve NS 'piru.canonical.com'
(1/1)
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]    piru.canonical.com:
Wants DNSSEC processing, NO auth data in query for A
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]    piru.canonical.com:
Looking for CNAME cache hit of 'piru.canonical.com|CNAME'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]    piru.canonical.com:
No CNAME cache hit of 'piru.canonical.com|CNAME' found
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]    piru.canonical.com:
Found cache hit for A: 91.189.95.68[ttl=1340]
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]    piru.canonical.com:
updating validation state with cache content for piru.canonical.com to
Indeterminate
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Resolved 'clouds.archive.ubuntu.com' NS
piru.canonical.com to: 91.189.95.68
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Trying IP 91.189.95.68:53, asking '
nova.clouds.archive.ubuntu.com|A'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: timeout resolving after 5535.35msec
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: Failed to resolve via any of the 1 offered
NS at level 'clouds.archive.ubuntu.com'
Jan 23 05:47:55 n422 pdns_recursor[8739]: [414333]
nova.clouds.archive.ubuntu.com: failed (res=-1)

My recursor config has 'network-timeout' set to 5500ms (instead of the
default 1500ms), so I can understand the "timeout resolving after
5535.35msec" message, but I was surprised by the timestamps of the messages
all being the same.  Are these messages flushed to syslog all at once or is
the timeout really happening immediately?

Aside from the timing question, can anyone offer any insight about this
sequence of events that lead up to the timeout?  Would this "just" indicate
a network or transport layer issue or is there anything in the events
leading up to the timeout that look suspicious?  This has been happening
about 10-15 times per day and started happening on Jan 18th.  I see similar
timeouts to 3 other sites, but this one most frequently.

Lastly, is there any way to retry here?  I'd rather lower the network
timeout and attempt a retry if possible.

This is version 4.1.0

Thanks

-- 
Brian Talley
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.powerdns.com/pipermail/pdns-users/attachments/20180123/2219dac1/attachment.html>


More information about the Pdns-users mailing list