This issue turned out to be way more subtle than I anticipated. It is
also not limited to FreeBSD.

When an LMDB environment is prepared using mdb_env_open(), the lockfile
for the relevant database is mmap()ped into the calling process' address
space. The lockfile contains a table ("reader locktable") with multiple
slots which are filled up as more concurrent readers access data within
the database. The environment prepared by mdb_env_open() may be used by
multiple threads. When a thread initiates a read-only transaction using
mdb_txn_begin(), the LMDB library acquires a reader locktable slot and
stores a pointer to that slot in thread-local storage. The next time
this thread attempts to initiate another transaction, this reader
locktable slot is retrieved from thread-local storage and sanity-checked
to ensure this thread has no transaction currently pending and that no
other process stamped on the slot. When a thread exits, its reader
locktable slot is released by setting the slot's owner PID to 0. When
an environment is closed using mdb_env_close(), all reader locktable
slots created by all threads within a given process are released.
BIND trips over that last behavior upon "rndc reload": "new" views are
configured and the environments for their respective NZDs are created
while the "old" views and the environments created for them still exist.
Upon successful reconfiguration, the "old" views are destroyed along
with their respective NZD environments. This involves mdb_env_close()
getting called for each "old" view. Remember, though, that the LMDB
reader locktable lives in an mmap()ped file, so when accessing it, both
the "old" and the "new" LMDB environments are reading from/writing to
the same place (though using different virtual address ranges). When
mdb_env_close() is called for an "old" environment, it sets the owner
PID to 0 for all reader locktable slots created by all threads within a
given process, i.e. it mangles the reader locktable slots seen by its
respective "new" environment.

The next time any LMDB transaction is initiated using one of the "new"
environments, one of two things can happen:

  - If the worker thread initiating the transaction has not initiated
    any transactions for the "new" environment so far, there will be no
    reader locktable slot in its thread-local storage. One will
    subsequently be created and the transaction will commence.

  - If the worker thread initiating the transaction has initiated
    transactions for the "new" environment in the past, the reader
    locktable slot will be fetched from thread-local storage, but it
    will fail the sanity check as the owner PID for the slot will be 0,
    i.e. different from the PID of the process which created the LMDB
    environment, resulting in "MDB_BAD_RSLOT: Invalid reuse of reader
    locktable slot".
As named creates a fixed number of worker threads and any of them may be
the one initiating the transaction, the chances of triggering an error
in the above scenario are inversely proportional to the number of worker
threads; when BIND is built with --disable-threads, the scenario above
will always result in MDB_BAD_RSLOT, as proven by the addzone system
test.

Changing the order in which NZD environments are created and destroyed
upon reload would require invasive changes in BIND. Fortunately, there
is
a workaround: creating NZD environments with the MDB_NOTLS flag, which
causes a different model for assigning reader locktable slots to be
used: instead of assigning them to threads using thread-local storage,
they are assigned to transaction objects. This causes a new reader
locktable slot to be created for each transaction, thus preventing
multiple environments created by a process for the same database from
stamping on each other's data.
Please review rt46556, which sets MDB_NOTLS for all relevant
mdb_env_open() calls.