Please review rt45637. This branch has the following bits from #44029:
* Set fillcount and freemax on msg->namepool and msg->rdspool
commit 88fb5e9fc53e16a100bc1181ad5e7a64d69dab9e
A bug was fixed in setting freemax on m->rdspool. fillcount() is also
set to avoid lock contention in isc___mempool_get(), and the default
sizes have been bumped a bit to help the COM delegation case that has
29 RRs.
* avoid locking mctx when tracklines isn't in use
commit 5bb7e77365117e54ecd55a29d6856aa0de04f3a3
Avoids unnecessary MCTXLOCK
* Fixup rwlock tracing when using atomics
commit b315f40288649016017c2c176d084823cf0c6583
isc_rwlock tracing code has been fixed for the common case where
atomic instructions are used in the rwlock implementation.
* Reduce operations in inline buffer functions
commit 8591709fa02411f5682d7f4401fab020d0b79152
This removes various redundant boolean AND operations that are not
necessary as the value is truncated when written into the shorter
variable. It also fixes a compiler warning in dns_ssu_external_match()
when the inline buffer macros are used.
* Use inline buffer implementation
commit df48125be8b901b3532abf6261b09fa509f1a057
This turns on use of inline buffer functions. These inline buffer
functions do less checking, and because they're inlined they're a lot
faster. I have an argument for adding some checks because we've had
some CVEs found due to assertion failure in buffer.c recently. But
these functions (especially the putuint16) are called a *LOT* and need
to be fast to get named to perform well. Every additional check here
lowers performance.
* Reduce work in hash function
commit 291305b70dece58735e11d07e68c31d1dfa6959c
The hash function is called a very very large number of times. The
pthread_once calls have a performance penalty, so they're skipped
unless necessary. The commit also removes various redundant boolean
AND operations that are not necessary as the value is truncated when
written into the shorter variable.
* Improve the performance of rdataset_getownercase() in RBTDB
commit 5c8623f1257d6597e9fd45f47d90fbcd34650814
Owner case recording was added in 9.11. As delegations have large
numbers of RRs in the reply, getownercase() is called a huge number of
times. The implementation in 9.11 was very slow (and caused a
performance dip from 9.10 to 9.11). There's no way to avoid this, so
its performance was improved and a shortcut fastpath was also added
for the common case where the owner case is entirely lowercase.
* Improve the performance of dns_name_getlabelsequence()
commit 17bf42955e8575c07e399ea2bba4dd0fbd4ca83f
dns_name_getlabelsequence() is called a very large number of times,
but mainly via the name compression codepath. The name compression
codepath calls dns_name_getlabelsequence() in a specific way (it
always attempts a label sequence of the rest of a name starting from
some offset), so performance of dns_name_getlabelsequence() was
tweaked to be fast for this case.
* Improve performance of compression code
commit 3bc6a92c08abead11d7304e90631b9d7f88d48f5
dns_compress_findglobal() is *very* high on the list of functions that
are called often. It needs to be fast. This commit makes several
changes to the compression code to make it run faster. Mainly:
+ it doesn't perform a wasteful hash the whole name being checked any
more, but looks up the hash of only the first octet of the first
label of the name from a lookup table.
+ it inlines calls to dns_name_equal() and dns_name_caseequal() to
avoid re-running some code it needs to run for calculating the
hash. Inlining this code is also necessary for improving the speed
of dns_compress_findglobal().
* Clean up tracklines code
commit b7b18a1124e7b27b59d572850614e4b616c13899
Just a cleanup
* Iterate one more time in hand-unrolled loops
commit 4ff39ae7d8e059d982cf48be6b5427a046dcc7bf
Tiny fix to make it iterate one more time.
(These hand-unrolled loops need to go, but we need to start using GCC
attributes (per-function optimization flags). Adding -funroll-loops
globally reduces performance.)
Mukund