[Pdns-users] Immediate update visibility

Brian Candler b.candler at pobox.com
Wed Mar 9 07:37:45 UTC 2022


On 09/03/2022 07:08, Daniel Miller via Pdns-users wrote:
> Anyway, after all that - when I make a change to a domain record using 
> pdnsutil or an external tool using the API - the changes are 
> immediately applied to the zone but are not immediately visible 
> through the recursor. To make that happen I need to either flush the 
> cache or just restart the recursor.
>
> This is an issue when creating/updating ACME challenge records - I 
> haven't been able to totally automate the process. I need to introduce 
> lengthy delays, try manually applying the changes, restart the 
> servers, whatever.

That doesn't really explain whatever problem you are seeing.

1. LetsEncrypt will be talking to your authoritative server, not your 
recursor.

2. Even if it were talking to the recursor, it would be querying 
_acme-challenge.somedomain TXT. Unless that query had been made 
recently, it won't be in the recursor's cache.

If you're hitting a caching problem here, it has nothing to do with the 
recursor; it would be either the packet cache or the query cache in 
pdns-authoritative. See: 
https://doc.powerdns.com/authoritative/performance.html#packet-cache

If LetsEncrypt had queried _acme-challenge.somedomain TXT a few seconds 
before you had changed the zone, and then again afterwards, it could see 
the old data. However, that shouldn't be happening: you should be 
inserting the TXT record *before* LetsEncrypt does the query. Therefore, 
although you can disable those caches, you shouldn't really need to do so.

The most likely problem I can think of is that your authoritative zones 
are replicated, and there's some delay before updates to the primary 
reach the secondaries. Remember that LetsEncrypt could query *any* of 
your auth nameservers with equal probability.

One solution is to ensure that notifies are working properly, and then 
insert a short (say 5 second) delay in your ACME process to give 
replication to the secondaries time to complete.
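
As a rough sketch of that delay: instead of sleeping for a fixed time, 
you could poll each authoritative server until all of them return the 
new TXT value. The `lookup` callable below is hypothetical. It stands in 
for whatever you use to send a non-recursive TXT query to one specific 
server, and the function name and defaults are only for illustration.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_txt(
    nameservers: Iterable[str],
    name: str,
    expected: str,
    lookup: Callable[[str, str], Set[str]],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once every nameserver serves `expected` for `name`.

    `lookup(server, name)` is a hypothetical helper supplied by the caller:
    it must query that one server directly and return its TXT strings.
    """
    pending = set(nameservers)
    deadline = clock() + timeout
    while True:
        # Keep only the servers that still lack the new value.
        pending = {ns for ns in pending if expected not in lookup(ns, name)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Note that polling is itself a query, so if the packet cache on a 
secondary is the issue, a poll just before the update arrives could in 
principle cache the old answer for a short while.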

Another solution is to get LetsEncrypt to talk to a single instance, by 
delegating each challenge name with a single NS record wherever you 
need it:

_acme-challenge.www.example.com.  NS  ns-primary.example.com.

If you wish, this approach also lets you have a completely separate 
authoritative server, dedicated to handling ACME challenges. That in 
turn can be something that accepts dynamic updates, without having to 
allow dynamic updates on your main infrastructure.

If you need to debug this further, I suggest you capture the data 
between LetsEncrypt and your authoritative servers, with query logging 
or at worst using tcpdump, to work out what's going on.


>
> is there a way to make changes in the auth server immediately visible 
> in the recursor?

You mean, clients using your local recursor are querying local zones and 
seeing stale data? That's a completely different matter: that's just 
standard recursor caching, and it's how the DNS is designed.

You can avoid that by setting a low TTL on the records in your zone, 
and, for negative caching, a low "minimum" value in the SOA record. In 
the extreme, you'd set those to zero, and then the recursor would have 
to ask the authoritative server for every query - but something like 60 
seconds is more system friendly. You might as well get *some* benefit 
from the recursor cache.
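
To make the arithmetic concrete, here is a small sketch of how long a 
recursor may keep serving an old answer. It assumes the usual RFC 2308 
rule that a negative answer is cached for the lower of the SOA record's 
own TTL and its "minimum" field, and it ignores any extra limits the 
recursor itself applies. The function names are mine.

```python
def negative_cache_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """Seconds a 'no such record' answer may be cached (RFC 2308 rule)."""
    return min(soa_ttl, soa_minimum)


def worst_case_staleness(record_ttl: int, soa_ttl: int, soa_minimum: int) -> int:
    """Upper bound, in seconds, on how long a recursor may serve old data.

    A changed record can linger for its TTL; a newly created record can be
    hidden by a cached negative answer for the negative-cache TTL.
    """
    return max(record_ttl, negative_cache_ttl(soa_ttl, soa_minimum))
```

So with 60 second record TTLs, an SOA TTL of 3600 and an SOA minimum of 
60, clients of the recursor should see a change within about a minute.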

Or else, whenever you update the auth zone, you can flush that zone from 
the recursor's cache - but that's a step you'd have to do yourself.
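
A minimal sketch of that flush step, assuming `rec_control` is on the 
PATH of the machine running the recursor and is allowed to reach its 
control socket. The helper names are made up; the `wipe-cache` command 
and the trailing `$` (wipe the name and everything below it) are 
standard rec_control usage.

```python
import subprocess
from typing import List


def wipe_cache_command(zone: str, subtree: bool = True) -> List[str]:
    """Build the rec_control command line that flushes `zone` from the cache."""
    name = zone.rstrip(".")
    # A trailing "$" asks rec_control to wipe the name and everything below it.
    target = name + "$" if subtree else name
    return ["rec_control", "wipe-cache", target]


def flush_recursor_zone(zone: str) -> None:
    """Run the flush; raises CalledProcessError if rec_control fails."""
    subprocess.run(wipe_cache_command(zone), check=True)
```

You would call something like `flush_recursor_zone("example.com")` from 
whatever script updates the zone, right after the update succeeds.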
