Report information
The Basics
Id:
46468
Status:
resolved
Priority:
Low/Low
Queue:

BugTracker
Version Fixed:
9.9.12, 9.10.7, 9.11.3, 9.12.0
Version Found:
(no value)
Versions Affected:
(no value)
Versions Planned:
(no value)
Priority:
P2 Normal
Severity:
S2 Normal
CVSS Score:
(no value)
CVE ID:
(no value)
Component:
BIND Server
Area:
bug

Dates
Created:Thu, 02 Nov 2017 10:42:53 -0400
Updated:Thu, 09 Nov 2017 09:32:49 -0500
Closed:Thu, 09 Nov 2017 09:32:47 -0500



This bug tracker is no longer active.

Please go to our Gitlab to submit issues (both feature requests and bug reports) for active projects maintained by Internet Systems Consortium (ISC).

Due to security and confidentiality requirements, full access is limited to the primary maintainers.

From: michal@isc.org
Subject: Prevent further races in RFC 5011-related code
Date: Thu, 02 Nov 2017 15:42:53 +0100
To: bind9-public@isc.org
Prevent further races in RFC 5011-related code The updated mkeys system test triggered an assertion failure [1]: 02-Nov-2017 00:27:13.069 zone_timer: managed-keys-zone : enter 02-Nov-2017 00:27:13.069 zone_maintenance: managed-keys-zone : enter 02-Nov-2017 00:27:13.069 zone_dump: managed-keys-zone : enter 02-Nov-2017 00:27:13.069 socket 0x7fef84928540 10.53.0.1#53238: packet received correctly 02-Nov-2017 00:27:13.069 zone_refreshkeys: managed-keys-zone : enter 02-Nov-2017 00:27:13.069 managed-keys-zone: Creating key fetch in zone_refreshkeys() for '.' 02-Nov-2017 00:27:13.069 fetch: ./DNSKEY 02-Nov-2017 00:27:13.069 fctx 0x7fef7801df90(./DNSKEY): create 02-Nov-2017 00:27:13.069 received control channel command 'managed-keys refresh' 02-Nov-2017 00:27:13.069 log_ns_ttl: fctx 0x7fef7801df90: fctx_create: . (in '.'?): 1 16 02-Nov-2017 00:27:13.069 fctx 0x7fef7801df90(./DNSKEY): join 02-Nov-2017 00:27:13.069 fetch 0x7fef84922548 (fctx 0x7fef7801df90(./DNSKEY)): created 02-Nov-2017 00:27:13.069 managed-keys-zone: synchronizing trusted keys 02-Nov-2017 00:27:13.069 zone_settimer: managed-keys-zone : enter 02-Nov-2017 00:27:13.069 zone_settimer: managed-keys-zone : settimer inactive 02-Nov-2017 00:27:13.069 zone_gotwritehandle: managed-keys-zone : enter 02-Nov-2017 00:27:13.101 rbtdb.c:1483: REQUIRE(rbtdb->future_version == ((void *)0)) failed, back trace 02-Nov-2017 00:27:13.101 #0 0x42781d in assertion_failed()+0x4d 02-Nov-2017 00:27:13.101 #1 0x6164ea in isc_assertion_failed()+0xa 02-Nov-2017 00:27:13.101 #2 0x4fd14a in newversion()+0x4ba 02-Nov-2017 00:27:13.101 #3 0x5c5ca0 in sync_keyzone()+0x140 02-Nov-2017 00:27:13.101 #4 0x5c645b in dns_zone_synckeyzone()+0x8b 02-Nov-2017 00:27:13.101 #5 0x43a275 in named_server_mkeys()+0x425 02-Nov-2017 00:27:13.101 #6 0x422bcd in named_control_docommand()+0x92d 02-Nov-2017 00:27:13.101 #7 0x425b3d in control_recvmessage()+0x80d 02-Nov-2017 00:27:13.101 #8 0x6387ed in run()+0x2cd 02-Nov-2017 00:27:13.101 #9 0x7fef90398dc5 in __do_global_dtors_aux_fini_array_entry()+0x7fef8fa8acd5 02-Nov-2017 00:27:13.101 #10 0x7fef900c773d in __do_global_dtors_aux_fini_array_entry()+0x7fef8f7b964d 02-Nov-2017 00:27:13.101 exiting (due to assertion failure) It is caused by zone_refreshkeys() releasing the zone lock too early: 9801 UNLOCK_ZONE(zone); 9802 9803 dns_diff_clear(&diff); 9804 if (ver != NULL) { 9805 dns_rriterator_destroy(&rrit); 9806 dns_db_closeversion(db, &ver, commit); 9807 } When "rndc managed-keys refresh" is called around the time a key refresh fetch is created, threads may be scheduled in a way which causes sync_keyzone() to call dns_db_newversion() before zone_refreshkeys() cleans up the database version it creates earlier. While investigating this issue, I found another one: if receiving the response to the initializing RFC 5011 key refresh query is quickly followed by receiving a control message from "rndc managed-keys refresh" and these two events happen at the same timestamp (with one second resolution), no further RFC 5011 refreshes will occur until named is restarted. This is caused by the same condition that caused RT #46055 because it checks whether the add hold-down timer for a key is lower than the current timestamp instead of checking whether it is equal or lower. This results in the DNS_ZONEFLG_REFRESHING flag being permanently set. As a side note, the code is a bit inconsistent about when a key is considered to be trusted, e.g. keyfetch_done() marks a key as trusted when its add hold-down timer is equal or lower than current time while mkey_dumpzone() claims a key with an add hold-down timer equal or greater than current time is pending trust. RFC 5011 does not specify which interpretation is correct as it uses the term "timer expiry" to define the moment a key should be trusted. I am inclined to consistently use the interpretation used by keyfetch_done(), i.e. a key with an add hold-down timer equal to current time should be considered trusted. [1] https://jenkins.isc.org/view/BIND/job/bind9-master-centos7-amd64-1/334/
Branch rt46468 fixes both issues mentioned above and disassociates clearing the DNS_ZONEFLG_REFRESHING flag from fetch creation failures. We already missed two scenarios which cause a "deadlock" in RFC 5011 code, so I hope that with an additional countermeasure in place we will no more have to trust that the key processing loop in zone_refreshkeys() is definitely free of any further bugs. This change may cause some busy looping [1] in case there are any other similar errors out there, but that is arguably better (and easier to detect) than stopping RFC 5011 cold. Please review. I could not come up with a reliable way to trigger any of the issues fixed by rt46468 and thus no system tests were added to it. [1] Caused by zone_refreshkeys() not resetting the key refresh timer: 02-Nov-2017 15:17:03.995 zone_maintenance: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_refreshkeys: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_settimer: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_timer: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_maintenance: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_refreshkeys: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_settimer: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_timer: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_maintenance: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_refreshkeys: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_settimer: managed-keys-zone : enter 02-Nov-2017 15:17:03.995 zone_timer: managed-keys-zone : enter
Subject: Re: [ISC-Bugs #46468] Prevent further races in RFC 5011-related code
CC:
From: "Evan Hunt" <each@isc.org>
To: "Michał Kępień via RT" <bind9-public@isc.org>
Date: Thu, 2 Nov 2017 19:51:34 +0000
> Please review. The changes all look good to me.
Pushed one more commit to address the inconsistency I mentioned above. Please review.
From: "Evan Hunt" <each@isc.org>
CC:
To: "Michał Kępień via RT" <bind9-public@isc.org>
Date: Wed, 8 Nov 2017 17:43:38 +0000
Subject: Re: [ISC-Bugs #46468] Prevent further races in RFC 5011-related code
> Pushed one more commit to address the inconsistency I mentioned above. > Please review. Looks good. (Nice catch updating the perl script in contrib, btw; I would definitely have missed that.) Okay to commit.
4812. [bug] Minor improvements to stability and consistency of code handling managed keys. [RT #46468] 9.9.12, 9.10.7, 9.11.3, 9.12.0