<div dir="ltr">Hi Remi,<div>Yes, the new version was almost 30% better in the full config test. Great!</div><div><br></div><div>> So quite a noticeable gain but it looks like lock contention is still an</div><div>> issue. I would like to understand why, if you don't mind answering a few</div><div>> questions.</div><div>> </div><div>> - You mentioned having 32 cores, are they real cores or is it with</div><div>> hyper-threading? Intel reports [1] only 8 real cores for the E5-2660, so</div><div>> you should probably stick with at most 8 total threads per CPU</div><div>> (listeners mostly in your case).</div><div>You are right, this is with HT.</div><div>CPU(s):                32</div><div>Thread(s) per core:    2</div><div>Core(s) per socket:    8</div><div>Socket(s):             2</div><div>NUMA node(s):          2</div><div>Model name:            Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz</div><div>CPU MHz:               2194.783</div><div>L1d cache:             32K</div><div>L1i cache:             32K</div><div>L2 cache:              256K</div><div>L3 cache:              20480K</div><div><br></div><div>Regarding the number of listeners, we ran the tests with different numbers of listeners:</div><div>1, 2, 4, 8 and 12.</div><div>1 listener was the worst, at 120 Kqps;</div><div>the other configs were more or less the same, oscillating between 165 and 175 Kqps, with the </div><div>2 and 8 listener configs being the most stable.</div><div><br></div><div><br></div><div>> - I'd be interested in the results of the dumpStats() and</div><div>> cache:printStats() commands during your test, as well as a perf top,</div><div>> ideally with a vanilla dnsdist and a dnsdist-concur.</div><div>See attached file </div><div><br></div><div>> - The cache cleaning algo might be a bit aggressive by default, you can</div><div>> tweak it with:</div><div>> setCacheCleaningDelay(30)</div><div>> setCacheCleaningPercentage(20)</div><div>Done, but no impact. 
(because of our test set)</div><div> </div><div>> - Exporting carbon data to our public metronome instance would be great</div><div>> too, as it would immediately make a lot of metrics available to us. You</div><div>> can do that with: carbonServer('37.252.122.50  ', '<yourname>', 30)</div><div>Unfortunately we can't do that. This is on a closed network.</div><div>We have our own carbon-graphite and check the stats there.</div><div>We can send you any additional info you would like.</div><div><br></div><div>> - Tuning the network buffer might also help:</div><div>> net.core.rmem_max=33554432</div><div>> net.core.wmem_max=33554432</div><div>> net.core.rmem_default=16777216</div><div>> net.core.wmem_default=16777216</div><div>Already done with very similar values.</div><div>Also tried kernel.sched_migration_cost_ns, but with no visible impact.</div><div><br></div><div><br></div><div>> - Would you consider upgrading your kernel? There has been a lot of</div><div>> improvements since 3.10.0, and we noticed huge performance increases in</div><div>> the past just by upgrading to a 4.x one.</div><div>I would like to do that, but we are required to use Red Hat...</div><div>We've done some tests on a small core2 machine with 4 cores and kernel 4.9, and </div><div>we obtained almost the same results as on the "big one".</div><div>This was a surprise.</div><div>We are trying to find a way (if security approves) to update the Red Hat kernel.</div><div> </div><div> </div><div>> Oh and if you didn't already, would you mind setting</div><div>> setMaxUDPOutstanding() to 65535? Even at a 99% cache hit ratio, that</div><div>> leaves quite a few requests going to the backend so we better be sure we</div><div>> don't mess up these. 
The cache in dnsdist tries very hard not to degrade</div><div>> performance, so we prefer skipping the cache and passing the query to a</div><div>> backend rather than waiting for a cache lock, for example.</div><div>Already done, also no difference.</div><div>We are sending a set of ~50 queries, repeated continuously.</div><div><br></div><div>Will keep testing, but I think this is all we can get for now.</div><div>The optimum config now seems to be 3 processes with 6 or 8 listeners each.</div><div>We will have to do some workarounds for the stats (aggregation rules in graphite?) and </div><div>the service control scripts.</div><div><br></div><div>Thanks again!</div><div><br></div></div>