[dnsdist] Cache, chrome and dns tunneling

Mon May 7 14:29:28 UTC 2018

(Resending because the image was too large for the list.)

I'll try to answer the questions and add more info.
The IP-over-DNS queries look like this (this is an answer packet):
    172.25.241.34.domain > dnsdist1.33499: [udp sum ok] 2837 q: NULL? cF.
    UwDnhZkQc8EQ3Ncrj7UOpH15TWv4qFm9hBhejIZ0rMVUSHszvOW2ukG5JXW.
    UFbuPkfS5Rko5zhpItBXQBXKXTcLG18Z8Nyqc1RsFLo6W6nhp5TBgpPvS8KHtWx.
    Rz9NNyX9BJlGGMBWwKfebMmDtMIR6PriLLOEkH3fCxfG12G3h5jkcTRJ2Jl.s23.2yf.de.
    1/1/1 cF.UwDnhZkQc8EQ3Ncrj7UOpH15TWv4qFm9hBhejIZ0rMVUSHszvOW2ukG5JXW.
    UFbuPkfS5Rko5zhpItBXQBXKXTcLG18Z8Nyqc1RsFLo6W6nhp5TBgpPvS8KHtWx.
    Rz9NNyX9BJlGGMBWwKfebMmDtMIR6PriLLOEkH3fCxfG12G3h5jkcTRJ2Jl.s23.2yf.de.
    [1m] NULL ns: s23.2yf.de. [22h51m7s] NS ems23.2yf.de. ar: . OPT UDPsize=4096 (287)
As mentioned, the answer TTL is short (1m).

The Chrome queries are for random, single-label names and are a "feature",
as the Chromium code explains:
> Because this function can be called during startup, when kicking off a URL
> fetch can eat up 20 ms of time, we delay seven seconds, which is hopefully
> long enough to be after startup, but still get results back quickly.
>
> This component sends requests to three randomly generated, and thus likely
> nonexistent, hostnames. If at least two redirect to the same hostname, this
> suggests the ISP is hijacking NXDOMAIN, and the omnibox should treat
> similar redirected navigations as 'failed' when deciding whether to prompt
> the user with a 'did you mean to navigate' infobar for certain search
> inputs.
>
> trigger: "On startup and when IP address of the computer changes."
>
> We generate a random hostname with between 7 and 15 characters.
Obviously these queries all result in NXDOMAIN, so they are supposed to be
negatively cached.
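
To illustrate why these lookups never hit the cache, here is a minimal Python
sketch of what such a probe looks like from the resolver's side. It is not
Chromium's code: the helper name random_probe_names and the lowercase a-z
alphabet are assumptions, based only on the description above and on the
sample queries quoted further down in this mail.

```python
import random
import string


def random_probe_names(count=3, min_len=7, max_len=15, rng=None):
    """Return `count` random single-label hostnames (hypothetical helper)."""
    rng = rng or random.Random()
    names = []
    for _ in range(count):
        # randint is inclusive on both ends: 7..15 characters.
        length = rng.randint(min_len, max_len)
        label = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        names.append(label)
    return names


if __name__ == "__main__":
    for name in random_probe_names():
        # Every name is new, so every lookup is a cache miss that then
        # occupies its own (negative) cache entry.
        print(name, "labels:", len(name.split(".")))
```

Since the names are effectively never repeated, caching them cannot raise the
hit rate; it only consumes entries.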

Today, to collect some data, we disabled the two rules mentioned earlier,
first both together and then each one by itself.
Disabling the rule for the DNS tunnel did not have any visible impact.
But when we disabled the cache skip for single-label queries, we saw the
same behaviour again.

Remember that our cache is configured as:
    -- newPacketCache(maxEntries, maxTTL, minTTL, temporaryFailureTTL, staleTTL)
    cache = newPacketCache(1000000, 86400, 0, 60, 60)
    getPool("dnsdist1"):setCache(cache)
    setCacheCleaningDelay(30)
    setCacheCleaningPercentage(20)

It usually stays at 80% full with a 98% hit rate.
After disabling the cache skip, the cache periodically (every 50 minutes,
more or less) filled up to 100%, the hit rate dropped to 92%, and our
backend query rate jumped from 1.6 kqps to almost 6 kqps.
This lasts for almost half an hour and then recovers.
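
These numbers are roughly consistent with each other. As a back-of-the-envelope
check, the Python sketch below assumes that every cache miss produces exactly
one backend query and that the total client query rate stays constant; neither
assumption was measured here, and the function names are only illustrative.

```python
def total_qps_from_backend(backend_qps, hit_rate):
    """Total client query rate implied by a backend rate and a hit rate."""
    return backend_qps / (1.0 - hit_rate)


def backend_qps(total_qps, hit_rate):
    """Backend query rate, assuming one backend query per cache miss."""
    return total_qps * (1.0 - hit_rate)


# 1.6 kqps to the backends at a 98% hit rate implies about 80 kqps in total.
total = total_qps_from_backend(1600, 0.98)
# The same total at a 92% hit rate gives about 6.4 kqps to the backends.
after = backend_qps(total, 0.92)

if __name__ == "__main__":
    print(round(total), round(after))
```

Under those assumptions, a six-point drop in hit rate quadruples the miss rate
(2% to 8%), which is in the same range as the observed jump.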

Regarding:
> Your cache is limited at 1 million, you could try a bit more. You could
> also explore the settings of newPacketCache in terms of TTL limits.

The cache size usually seems OK, because our hit rate stays at 98-99%,
and we don't want to change too much in a production environment.
Also, I don't know how the TTL limits affect negative caching, which seems
to be the driver of this situation.
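
To make the question concrete, here is a simplified Python model of how TTL
limits could interact with negative answers. This is a sketch under stated
assumptions and not dnsdist's actual logic: effective_ttl, max_negative_ttl
and steady_state_entries are hypothetical names, the 86400 s negative TTL for
the root zone and the 3600 s cap are the values Daniel mentions in the quoted
mail below, and the rate of 10 unique names per second is made up.

```python
def effective_ttl(record_ttl, max_ttl, min_ttl=0, max_negative_ttl=None, negative=False):
    """Clamp a TTL the way a generic cache might (hypothetical model)."""
    ttl = max(min_ttl, min(record_ttl, max_ttl))
    if negative and max_negative_ttl is not None:
        ttl = min(ttl, max_negative_ttl)
    return ttl


def steady_state_entries(unique_names_per_second, ttl_seconds):
    """Entries alive at once if every unique name is kept for ttl_seconds."""
    return unique_names_per_second * ttl_seconds


ROOT_NEGATIVE_TTL = 86400  # one day, as stated in the quoted mail

# Only the general maximum TTL applies: the NXDOMAIN may be kept for a day.
no_cap = effective_ttl(ROOT_NEGATIVE_TTL, max_ttl=86400, negative=True)
# With a separate (hypothetical) cap on negative answers of one hour.
capped = effective_ttl(ROOT_NEGATIVE_TTL, max_ttl=86400,
                       max_negative_ttl=3600, negative=True)

if __name__ == "__main__":
    # 10 unique random names per second is an invented rate.
    print(no_cap, steady_state_entries(10, no_cap))
    print(capped, steady_state_entries(10, capped))
```

In this model, even a small stream of never-repeated names can occupy a large
share of a 1,000,000-entry cache when each entry is allowed to live for a day.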
Some graph links:
https://pasteboard.co/Hk5kkm6.png -> general and net view
https://pasteboard.co/Hk5hgU5.png -> cache behaviour
https://pasteboard.co/Hk5hRcd.png -> cache hit rate

The graphs show the strange cache behaviour without the Chrome rule, until
14:00, when the rule was put back in place and the cache returned to normal.

As Daniel Stirnimann mentioned, I also think the issue is about negative
caching TTL.

We will move to 1.3 in a couple of weeks and will post an update
when more information is available.

Any questions are welcome!

On Sun, May 6, 2018 at 3:36 PM, Daniel Stirnimann <daniel.stirnimann at switch.ch> wrote:

> On 05.05.18 12:40, Ask Bjørn Hansen wrote:
> >
> >> On May 3, 2018, at 17:25, Nico <nicomail at gmail.com> wrote:
> >>
> >> After some tcpdumping and testing we found that chrome and dns
> >> tunneling were filling the cache,
> >> even if the percent of these queries was very low in the total.
> >
> > What do those queries look like?
>
> For the chrome part, I guess he is talking about queries like these from
> Android mobile devices using Google Chrome:
>
> xmbltwvfgzoj AAAA
> oputhfmeqha AAAA
> fpxfkjurisphngo AAAA
> oputhfmeqha A
> fpxfkjurisphngo A
> xmbltwvfgzoj A
>
> I noticed this too a few weeks ago when playing with an Android
> Emulator. I did not look into this more and cannot tell at what interval
> they appear exactly. They seem to appear at least every time I started
> Google Chrome. The queries are random. Next time they are completely
> different but of the same length and same query character set.
>
> The response is of course NXDOMAIN. Negative caching TTL for the root
> zone is 1 day.
>
> I guess most DNS resolver software limit the negative caching TTL to
> something a fair bit lower. I just looked it up for PowerDNS recursor
> and it is set to max 3600 sec:
> https://doc.powerdns.com/md/recursor/settings/#max-negative-ttl
>
> Maybe the problem is that dnsdist has no max negative ttl limit?
> https://dnsdist.org/guides/cache.html
>
> Daniel