From: | michal@isc.org |
Subject: | Prevent further races in RFC 5011-related code |
Date: | Thu, 02 Nov 2017 15:42:53 +0100 |
To: | bind9-public@isc.org |
Prevent further races in RFC 5011-related code
The updated mkeys system test triggered an assertion failure [1]:
02-Nov-2017 00:27:13.069 zone_timer: managed-keys-zone : enter
02-Nov-2017 00:27:13.069 zone_maintenance: managed-keys-zone : enter
02-Nov-2017 00:27:13.069 zone_dump: managed-keys-zone : enter
02-Nov-2017 00:27:13.069 socket 0x7fef84928540 10.53.0.1#53238: packet received correctly
02-Nov-2017 00:27:13.069 zone_refreshkeys: managed-keys-zone : enter
02-Nov-2017 00:27:13.069 managed-keys-zone: Creating key fetch in zone_refreshkeys() for '.'
02-Nov-2017 00:27:13.069 fetch: ./DNSKEY
02-Nov-2017 00:27:13.069 fctx 0x7fef7801df90(./DNSKEY): create
02-Nov-2017 00:27:13.069 received control channel command 'managed-keys refresh'
02-Nov-2017 00:27:13.069 log_ns_ttl: fctx 0x7fef7801df90: fctx_create: . (in '.'?): 1 16
02-Nov-2017 00:27:13.069 fctx 0x7fef7801df90(./DNSKEY): join
02-Nov-2017 00:27:13.069 fetch 0x7fef84922548 (fctx 0x7fef7801df90(./DNSKEY)): created
02-Nov-2017 00:27:13.069 managed-keys-zone: synchronizing trusted keys
02-Nov-2017 00:27:13.069 zone_settimer: managed-keys-zone : enter
02-Nov-2017 00:27:13.069 zone_settimer: managed-keys-zone : settimer inactive
02-Nov-2017 00:27:13.069 zone_gotwritehandle: managed-keys-zone : enter
02-Nov-2017 00:27:13.101 rbtdb.c:1483: REQUIRE(rbtdb->future_version == ((void *)0)) failed, back trace
02-Nov-2017 00:27:13.101 #0 0x42781d in assertion_failed()+0x4d
02-Nov-2017 00:27:13.101 #1 0x6164ea in isc_assertion_failed()+0xa
02-Nov-2017 00:27:13.101 #2 0x4fd14a in newversion()+0x4ba
02-Nov-2017 00:27:13.101 #3 0x5c5ca0 in sync_keyzone()+0x140
02-Nov-2017 00:27:13.101 #4 0x5c645b in dns_zone_synckeyzone()+0x8b
02-Nov-2017 00:27:13.101 #5 0x43a275 in named_server_mkeys()+0x425
02-Nov-2017 00:27:13.101 #6 0x422bcd in named_control_docommand()+0x92d
02-Nov-2017 00:27:13.101 #7 0x425b3d in control_recvmessage()+0x80d
02-Nov-2017 00:27:13.101 #8 0x6387ed in run()+0x2cd
02-Nov-2017 00:27:13.101 #9 0x7fef90398dc5 in __do_global_dtors_aux_fini_array_entry()+0x7fef8fa8acd5
02-Nov-2017 00:27:13.101 #10 0x7fef900c773d in __do_global_dtors_aux_fini_array_entry()+0x7fef8f7b964d
02-Nov-2017 00:27:13.101 exiting (due to assertion failure)
It is caused by zone_refreshkeys() releasing the zone lock too early:
9801 UNLOCK_ZONE(zone);
9802
9803 dns_diff_clear(&diff);
9804 if (ver != NULL) {
9805 dns_rriterator_destroy(&rrit);
9806 dns_db_closeversion(db, &ver, commit);
9807 }
When "rndc managed-keys refresh" is called around the time a key refresh
fetch is created, threads may be scheduled in a way which causes
sync_keyzone() to call dns_db_newversion() before zone_refreshkeys()
cleans up the database version it creates earlier.
While investigating this issue, I found another one: if receiving the
response to the initializing RFC 5011 key refresh query is quickly
followed by receiving a control message from "rndc managed-keys refresh"
and these two events happen at the same timestamp (with one second
resolution), no further RFC 5011 refreshes will occur until named is
restarted. This is caused by the same condition that caused RT #46055
because it checks whether the add hold-down timer for a key is lower
than the current timestamp instead of checking whether it is equal or
lower. This results in the DNS_ZONEFLG_REFRESHING flag being
permanently set.
As a side note, the code is a bit inconsistent about when a key is
considered to be trusted, e.g. keyfetch_done() marks a key as trusted
when its add hold-down timer is equal or lower than current time while
mkey_dumpzone() claims a key with an add hold-down timer equal or
greater than current time is pending trust. RFC 5011 does not specify
which interpretation is correct as it uses the term "timer expiry" to
define the moment a key should be trusted. I am inclined to
consistently use the interpretation used by keyfetch_done(), i.e. a key
with an add hold-down timer equal to current time should be considered
trusted.
[1] https://jenkins.isc.org/view/BIND/job/bind9-master-centos7-amd64-1/334/