[Pdns-dev] Some notes for Solaris 10 on x86 users of the PowerDNS Recursor
bert hubert
bert.hubert at netherlabs.nl
Mon Aug 30 09:46:22 CEST 2010
Hi,
This message is for everyone using the PowerDNS Recursor with Solaris 10 on
x86 (non-UltraSPARC) hardware.
It turns out that Solaris 10 on x86 has some issues standing in the way of
high performance for the PowerDNS Recursor. With some care, however, good
results can be achieved.
If you need help, please do not hesitate to contact us.
All details can be found on:
http://bert-hubert.blogspot.com/2010/08/some-notes-on-solaris-10-x86-64-bit.html
I've also pasted parts of this post below:
"Some notes on Solaris 10 x86, 64 bit compilation, bugs and memory
allocators"
(...)
The first thing we noticed was that the 'Ports' event multiplexer failed
to work on x86 applications, as described in long standing Solaris bug 'CR
6268715 "library/libc port_getn(3C) and port_sendn(3C) not working on
Solaris x86"'. Apache, libevent and PowerDNS all contain workarounds for
this bug, but that workaround does come with performance implications. At
the very least it is worrying.
Secondly, it turns out that Solaris 10 on x86 can't link 64-bit binaries as
generated by the system gcc compiler, at least not those binaries using Thread
Local Storage for objects at global scope. This is Solaris bug 'CR 6354160',
aka 'Solaris linker includes more than one copy of code in binary when
linking gnu object code', which we worked around by changing PowerDNS so it
could be compiled as one big C++ file.
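To illustrate what 'Thread Local Storage for objects at global scope' means
here, below is a minimal sketch using the gcc '__thread' extension. The names
t_queriesHandled and countInThread are made up for this example and do not
come from the PowerDNS source.

```cpp
#include <cassert>
#include <thread>

// A thread-local variable at global scope (gcc extension). Every thread
// gets its own copy, initialised to 0. Hypothetical name, not PowerDNS code.
__thread int t_queriesHandled = 0;

// Start a fresh thread, bump that thread's private counter n times and
// report the value the thread ended up with.
int countInThread(int n) {
    assert(n >= 0);
    int seen = -1;
    std::thread worker([n, &seen] {
        for (int i = 0; i < n; ++i) {
            ++t_queriesHandled;  // touches only this thread's copy
        }
        seen = t_queriesHandled;
    });
    worker.join();
    return seen;
}
```

The calling thread's copy of the counter is not affected by what the worker
thread does. It is objects like this, spread over several object files, that
the affected linker could not handle in 64-bit mode.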
Using the native Sun Studio compiler failed, because it is not compliant
enough with the C++ standard to compile PowerDNS, and the changes required
were non-trivial.
Although both issues (port_getn() and 64-bit linking) were known, and
fixes were available in OpenSolaris, these had not made it into Solaris 10
production releases.
Eventually, PowerDNS was able to work around both bugs, but in the case of
6268715 at a runtime performance cost (note: Sun has now shipped
'IDR145429-01' which fixes this).
Which brings us to performance. For some reason, even though the PowerDNS
Recursor uses 'share nothing' threads, there was no scalability when using
multiple threads on Solaris. In fact performance was rather dismal anyhow,
even with only one thread.
Firstly, we discovered that having multiple threads try to wait on a single
socket does not scale beyond a single thread. This was fixed by having only
a single thread wait on the socket, and manually distributing queries over
threads in a round-robin fashion.
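The idea of one listener handing out work in turn can be sketched roughly as
follows. This is not the actual PowerDNS code: WorkerQueue, workerLoop and
distributeRoundRobin are hypothetical names, queries are plain strings, and
the socket read is replaced by a loop over a vector so the example stays
self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-worker mailbox; each worker only ever looks at its own.
struct WorkerQueue {
    std::mutex lock;
    std::condition_variable ready;
    std::deque<std::string> pending;
    bool closed = false;
    std::size_t handled = 0;
};

// Worker loop: drain the private queue until the listener closes it.
void workerLoop(WorkerQueue& q) {
    for (;;) {
        std::unique_lock<std::mutex> guard(q.lock);
        q.ready.wait(guard, [&q] { return q.closed || !q.pending.empty(); });
        if (q.pending.empty()) {
            return;  // closed and fully drained
        }
        q.pending.pop_front();
        ++q.handled;  // a real worker would resolve the query here
    }
}

// The single "listener": in a real server this loop would be the only place
// that waits on the socket. Each query goes to the next worker in turn.
// Returns how many queries each worker handled.
std::vector<std::size_t> distributeRoundRobin(
        const std::vector<std::string>& queries, std::size_t numWorkers) {
    assert(numWorkers > 0);
    std::vector<std::unique_ptr<WorkerQueue>> queues;
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < numWorkers; ++i) {
        queues.emplace_back(new WorkerQueue());
    }
    for (std::size_t i = 0; i < numWorkers; ++i) {
        threads.emplace_back(workerLoop, std::ref(*queues[i]));
    }

    std::size_t next = 0;
    for (const std::string& query : queries) {
        WorkerQueue& target = *queues[next];
        next = (next + 1) % numWorkers;  // round-robin selection
        {
            std::lock_guard<std::mutex> guard(target.lock);
            target.pending.push_back(query);
        }
        target.ready.notify_one();
    }

    // Tell every worker that no more queries will arrive, then wait for them.
    for (auto& q : queues) {
        {
            std::lock_guard<std::mutex> guard(q->lock);
            q->closed = true;
        }
        q->ready.notify_one();
    }
    for (auto& t : threads) {
        t.join();
    }

    std::vector<std::size_t> counts;
    for (auto& q : queues) {
        counts.push_back(q->handled);
    }
    return counts;
}
```

With five queries and two workers, the first worker receives the first, third
and fifth query and the second worker the other two.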
This turned out to help slightly, but not decisively. We then discovered
that the default Solaris x86 memory allocator ('malloc()') is effectively
single-threaded (unlike the UltraSPARC variant, which is completely
different!). Solaris ships with no fewer than two alternative mallocs, called
-lmtmalloc and -lumem respectively. Using libumem helped for benchmarking.
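A quick way to see whether an allocator serialises threads is a small stress
test along these lines. This is only a sketch and not the benchmark we used:
allocStress and StressResult are invented names, and the allocation sizes are
arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical result record: allocations completed and wall-clock time.
struct StressResult {
    std::size_t allocations;
    double seconds;
};

// Run numThreads threads that each do perThread malloc()/free() pairs of
// varying small sizes, and time the whole run.
StressResult allocStress(unsigned numThreads, std::size_t perThread) {
    assert(numThreads > 0);
    std::atomic<std::size_t> done(0);
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) {
        threads.emplace_back([&done, perThread] {
            std::size_t local = 0;
            for (std::size_t i = 0; i < perThread; ++i) {
                const std::size_t size = 16 + (i % 64) * 8;
                void* p = std::malloc(size);
                if (p == nullptr) {
                    break;  // out of memory: stop this thread early
                }
                // Touch the block so the allocation is not optimised away.
                static_cast<volatile char*>(p)[0] = 1;
                std::free(p);
                ++local;
            }
            done += local;
        });
    }
    for (auto& th : threads) {
        th.join();
    }

    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return StressResult{done.load(), elapsed.count()};
}
```

Building the same program once with the default allocator and once with
-lumem or -lmtmalloc, then comparing the elapsed time for 1 thread against
several threads, should show whether the allocator scales. The numbers will
of course depend on the machine.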
Finally, for Solaris, we had to bring back an old favorite, the 'fork-trick'
which makes the whole PowerDNS Recursor fork itself into multiple processes,
which helped bring Solaris performance up to par with our other major
platform, Linux. We don't yet know why our 'share nothing' threads end up
interfering with each other.
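In outline, the 'fork-trick' amounts to something like the sketch below. It is
a simplified illustration under our own assumptions, not the Recursor's code:
forkWorkers and serveQueries are hypothetical names, the children here exit
immediately instead of serving queries, and how sockets are shared between
the processes is left out entirely.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Fork numProcesses children. Each child runs serve(index) and exits with
// the value it returns. The parent waits for all of them and returns the
// number of children that exited with status 0.
int forkWorkers(int numProcesses, int (*serve)(int)) {
    assert(numProcesses > 0);
    std::vector<pid_t> children;
    for (int i = 0; i < numProcesses; ++i) {
        const pid_t pid = fork();
        if (pid < 0) {
            break;  // fork failed: stop creating more children
        }
        if (pid == 0) {
            // Child: do the work, then leave without running the parent's
            // cleanup handlers or flushing its stdio buffers a second time.
            _exit(serve(i));
        }
        children.push_back(pid);
    }

    int clean = 0;
    for (const pid_t pid : children) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            ++clean;
        }
    }
    return clean;
}

// Placeholder for the per-process server loop; a real one would not return
// until shutdown.
int serveQueries(int /*workerIndex*/) {
    return 0;
}
```

Because every process has its own address space, the processes cannot
interfere with each other through a shared allocator or shared runtime
state, which is presumably part of why this approach helped.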
The resulting work was taken into production... and crashed within 5 minutes
of heavy load, reporting an out-of-memory error. With a 64-bit binary on an
8 gigabyte machine, a genuine shortage of memory seemed doubtful.
After some further investigation, we found that while libumem certainly
was faster for multithreaded code, it also wastes memory on a prodigious
scale. To be honest, this may be due to the fact that the g++ C++
runtime libraries are not making optimal use of the allocator, or our use of
get/set/swap/makecontext(), but the amount of memory used was staggering.
Think 450MB for storing 10MB of content.
We studied some of the articles available online, among which was 'A
Comparison of Memory Allocators' on the 'Oracle Sun Development Network'.
This one indeed showed graphs of libumem using large amounts of memory, and
a thing called ptmalloc using very little. Oddly enough, ptmalloc is (more
or less) the default allocator for Linux too.
We then built a PowerDNS with all the workarounds, plus ptmalloc linked in,
and now finally have something that survives production use!
Rounding this off:
* Solaris x86 is remarkably different from Solaris UltraSPARC (different
  bugs, different allocators)
* Do not have n>1 threads wait on a single datagram socket file descriptor;
  it does not scale
* There now IS an IDR to get port_getn() working, IDR145429-01, which should
  also speed up Apache and several other high-performance applications for
  Solaris
* To build 64-bit binaries with thread local storage (__thread) at global
  scope, concatenate all your C++ into one big file, and compile that one
* Be aware that the default allocator on Solaris 10 x86 is single-threaded
* Be aware that both mtmalloc and libumem may use prohibitive amounts of
  memory for some programs
* Consider ptmalloc3
* We still have to investigate why fork() scales better than
  pthread_create()
* Make sure that you have some friends within Sun engineering ;-)
All in all, we still consider Solaris 10 x86 a 'supported platform' for the
PowerDNS Recursor, but along the way we had some doubts. Meanwhile, Solaris
10 on UltraSPARC continues to work very well!
Bert