This issue turned out to be way more subtle than I anticipated. It is
also not limited to FreeBSD.

When an LMDB environment is prepared using mdb_env_open(), the lockfile
for the relevant database is mmap()ped into the calling process' address
space. The lockfile contains a table ("reader locktable") with multiple
slots which are filled up as more concurrent readers access data within
the database. The environment prepared by mdb_env_open() may be used by
multiple threads. When a thread initiates a read-only transaction using
mdb_txn_begin(), the LMDB library acquires a reader locktable slot and
stores a pointer to that slot in thread-local storage. The next time
this thread attempts to initiate another transaction, this reader
locktable slot is retrieved from thread-local storage and sanity-checked
to ensure this thread has no transaction currently pending and that no
other process stamped on the slot. When a thread exits, its reader
locktable slot is released by setting the slot's owner PID to 0. When
an environment is closed using mdb_env_close(), all reader locktable
slots created by all threads within a given process are released.
BIND trips over that last behavior upon "rndc reload": "new" views are
configured and the environments for their respective NZDs are created
while the "old" views and the environments created for them still exist.
Upon successful reconfiguration, the "old" views are destroyed along
with their respective NZD environments. This involves mdb_env_close()
getting called for each "old" view. Remember, though, that the LMDB
reader locktable lives in an mmap()ped file, so when accessing it, both
the "old" and the "new" LMDB environments are reading from/writing to
the same place (though using different virtual address ranges). When
mdb_env_close() is called for an "old" environment, it sets the owner
PID to 0 for all reader locktable slots created by all threads within a
given process, i.e. it mangles the reader locktable slots seen by its
respective "new" environment.

The next time any LMDB transaction is initiated using one of the "new"
environments, one of two things can happen:

  - If the worker thread initiating the transaction has not initiated
    any transactions for the "new" environment so far, there will be no
    reader locktable slot in its thread-local storage. One will
    subsequently be created and the transaction will commence.

  - If the worker thread initiating the transaction has initiated
    transactions for the "new" environment in the past, the reader
    locktable slot will be fetched from thread-local storage, but it
    will fail the sanity check as the owner PID for the slot will be 0,
    i.e. different from the PID of the process which created the LMDB
    environment, resulting in "MDB_BAD_RSLOT: Invalid reuse of reader
    locktable slot".
As named creates a fixed number of worker threads and any of them may be
the one initiating the transaction, the chances of triggering an error
in the above scenario are inversely proportional to the number of worker
threads; when BIND is built with --disable-threads, the scenario above
will always result in MDB_BAD_RSLOT, as proven by the addzone system
test.

Changing the order in which NZD environments are created and destroyed
upon reload would require invasive changes in BIND. Fortunately, there
is
a workaround: creating NZD environments with the MDB_NOTLS flag, which
causes a different model for assigning reader locktable slots to be
used: instead of assigning them to threads using thread-local storage,
they are assigned to transaction objects. This causes a new reader
locktable slot to be created for each transaction, thus preventing
multiple environments created by a process for the same database from
stamping on each other's data.
Please review rt46556, which sets MDB_NOTLS for all relevant
mdb_env_open() calls.