[Pdns-users] pdns-recursor 4.4: host unknown after some time with no clear reason

Otto Moerbeek otto at drijf.net
Wed Jun 1 10:30:45 UTC 2022


Hello,

The 4.4 branch went EOL this week. In general it is not wise to create
an environment that is "impossible" to update.

Without actual config data, setup details, logs or dumps from the
internal tables this is impossible to diagnose.
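
As a rough sketch of what collecting that data could look like: the
helper below asks a running recursor for its metrics and table dumps via
rec_control. The function name collect_recursor_state, the REC_CONTROL
variable and the output file names are made up for illustration. Which
dump-* subcommands exist in a given release is an assumption here, so
check `rec_control help` first.

```shell
#!/bin/sh
# Hypothetical helper, not part of PowerDNS: gather recursor state for a report.
# REC_CONTROL may be set to another path if rec_control is not in PATH.
REC_CONTROL="${REC_CONTROL:-rec_control}"

collect_recursor_state() {
    outdir="$1"
    mkdir -p "$outdir" || return 1

    # All metrics as name/value lines, captured from rec_control's stdout.
    "$REC_CONTROL" get-all > "$outdir/metrics.txt" || return 1

    # Table dumps. Assumptions: these subcommands exist in your version, the
    # target files do not exist yet, and the recursor process (its user and,
    # if used, its chroot) is able to write to the given path.
    for table in cache nsspeeds throttlemap; do
        "$REC_CONTROL" "dump-$table" "$outdir/$table.txt" || return 1
    done
}
```

Run it while the recursor is in the failing state, with a fresh directory
per run, for example `collect_recursor_state /tmp/rec-state-$(date +%s)`,
and include the resulting files together with the configuration.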

Please check https://blog.powerdns.com/2016/01/18/open-source-support-out-in-the-open/

	-Otto


On Wed, Jun 01, 2022 at 11:10:57AM +0200, Jan Huijsmans via Pdns-users wrote:

> Hello,
> 
> We have a strange problem in one of our airgapped environments while we
> use the same setup in others where we don't have the issue. After some
> time (varies from seconds to hours), the recursor refuses to give any
> answer other than host unknown (SERVFAIL, if I remember correctly).
> 
> Situation:
> 
> Airgapped environment with 2 DNS servers, each with:
> * recursor listening to internal interface
> * authoritative server listening to external interface
> * DNS lookups through recursor via external simulated root server to
>   designated authoritative servers
> 
> The problem exists within 1 environment where the links to external
> authoritative servers for root and other domains are slow (1 Mbit or less)
> and some zones (including root) have very interesting NS records (NS
> with hostnames with missing A records). For the root zone this is fixed,
> but some others still are messy. After a while, the recursor refuses to
> give answers to any query, no matter if the DNS server that should
> answer is configured correctly or not. The only thing that helps in that
> situation is a restart of the recursor.
> 
> With log level at max (9) all we see at the moment of the issue is that
> the recursor answers from packet cache, with no attempts to query
> externally. The last query in the log is also unremarkable: it
> either works (valid query) or doesn't (invalid query to domains unknown
> in the environment), with no indication of throttling, timeouts, missed
> packets or long response times.
> 
> When the problem shows up, dig @<recursor ip> fails. However, the moment
> we use the +trace option, the dig command works around the recursor
> after the 1st lookup (NS of .) and gets the answer correctly.
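
One way to make that comparison repeatable is to record the response
code from each server separately. A minimal sketch, assuming dig is
installed; dig_status and check_server are illustrative names, and the
addresses and host name in the usage comment are placeholders.

```shell
#!/bin/sh
# Print the response code (NOERROR, SERVFAIL, NXDOMAIN, ...) found in the
# ";; ->>HEADER<<-" line of dig output read from stdin.
dig_status() {
    sed -n 's/^;; ->>HEADER<<-.*status: \([A-Z0-9]*\),.*/\1/p'
}

# Ask one server for an A record and print only the response code.
# Prints nothing if no reply arrived (for example on a timeout).
# $1 = server IP, $2 = name, remaining arguments = extra dig options.
check_server() {
    server="$1"; name="$2"; shift 2
    dig "@$server" "$name" A +tries=1 +time=5 "$@" | dig_status
}

# Usage (placeholder addresses and name):
#   check_server 192.0.2.1 www.example.test              # via the recursor
#   check_server 192.0.2.53 www.example.test +norecurse  # authoritative directly
```

Running this against the recursor and against each authoritative server
at the moment of failure shows which of them returns the error and which
does not answer at all.
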
> 
> We can't seem to reproduce the error in the other environments, can't
> get logging that points to the issue (log level 9 is max?) or even
> think of a logical reason why this would happen (apart from
> throttling). We've set option dont-throttle-netmasks to 0.0.0.0/0, which
> seems to help a lot, but does not solve the problem completely.
> 
> I'd try setting non-resolving-ns-max-fails to 0 if we were on 4.5, but
> alas we're stuck at 4.4 at the moment (no way to upgrade the airgapped
> environment).
> 
> We need either a way to keep the recursors querying the NS servers
> to get an answer, or be able to prove which server/environment is the
> cause of the issue.
> 
> -- 
> 
> Jan Huijsmans              bofh at koffie.nu
> 
> ... cannot activate /dev/brain, no response from main coffee server

