Report information
The Basics
Id:
45293
Status:
resolved
Priority:
Medium/Medium
Queue:

People
Requestors:
Cc:
AdminCc:

BugTracker
Version Fixed:
(no value)
Version Found:
(no value)
Versions Affected:
(no value)
Versions Planned:
(no value)
Priority:
P1 High
Severity:
S1 High
CVSS Score:
(no value)
CVE ID:
(no value)
Component:
(no value)
Area:
(no value)

Dates
Created:Fri, 26 May 2017 06:38:27 -0400
Updated:Mon, 30 Oct 2017 05:02:46 -0400
Closed:Mon, 30 Oct 2017 04:52:43 -0400



This bug tracker is no longer active.

Please go to our Gitlab to submit issues (both feature requests and bug reports) for active projects maintained by Internet Systems Consortium (ISC).

Due to security and confidentiality requirements, full access is limited to the primary maintainers.

Subject: mkeys system test fails
Date: Fri, 26 May 2017 16:08:15 +0530
To: bind9-bugs@isc.org
From: "Mukund Sivaraman" <muks@isc.org>
mkeys system test on master consistently fails for me: [muks@jurassic system]$ sh ./run.sh mkeys S:mkeys:Fri May 26 16:05:27 IST 2017 T:mkeys:1:A A:System test mkeys I: check for signed record (1) I: check positive validation with valid trust anchor (2) I: check positive validation using delv (3) I: check for failed validation due to wrong key in managed-keys (4) I: check new trust anchor can be added (5) I: ns2 refreshing managed keys for '_default' I: check new trust anchor can't be added with bad initial key (6) I: ns3 refreshing managed keys for '_default' I: remove untrusted standby key, check timer restarts (7) I: ns2 refreshing managed keys for '_default' I: restore untrusted standby key, revoke original key (8) I: ns2 refreshing managed keys for '_default' I:failed I: refresh managed-keys, ensure same result (9) I: ns2 refreshing managed keys for '_default' I: restore revoked key, ensure same result (10) I: ns2 refreshing managed keys for '_default' I: reinitialize trust anchors I: check that standby key is now trusted (11) I: revoke original key, add new standby (12) I: ns2 refreshing managed keys for '_default' I: revoke standby before it is trusted (13) I: ns2 refreshing managed keys for '_default' I: ns2 refreshing managed keys for '_default' I: wait 30 seconds for key add/remove holddowns to expire (14) I: ns2 refreshing managed keys for '_default' I: revoke all keys, confirm roll to insecure (15) I: ns2 refreshing managed keys for '_default' I: check for insecure response (16) I: reset the root server I: reinitialize trust anchors I: check positive validation (17) I: revoke key with bad signature, check revocation is ignored (18) I: ns1 zone reload queued I: ns2 refreshing managed keys for '_default' I: check validation fails with bad DNSKEY rrset (19) I: restore DNSKEY rrset, check validation succeeds again (20) I: ns1 zone reload queued I: ns2 refreshing managed keys for '_default' I: reset the root server with no keys, check for minimal update (21) I: ns2 refreshing managed keys for '_default' I: reset the root server with no signatures, check for minimal update (22) I: ns2 refreshing managed keys for '_default' I: restore root server, check validation succeeds again (23) I: ns1 zone reload successful I: ns2 refreshing managed keys for '_default' I: check that trust-anchor-telemetry queries are logged (24) I: check that trust-anchor-telemetry queries are received (25) I:exit status: 1 R:FAIL E:mkeys:Fri May 26 16:07:12 IST 2017 [muks@jurassic system]$ Mukund
this also fails in jenkins on freeBSD (for the past 42 builds, starting with the build at d73c32c17fa6a59701bfdbfd78055f6d8c7c3d36. the problem did not occur at 4c97cb13bdcc1f11ee75ae84068e2a8e59200430) https://jenkins.isc.org/view/BIND/job/bind9-master-freebsd11-64/lastCompletedBuild/testReport/
I believe the mkeys system test is subject to subtle timing issues, namely we do some grepping on the managed-keys zone file without waiting until it is actually written to disk after "rndc managed-keys sync" is invoked. I was able to intermittently reproduce this issue only on a Fedora 25 VM. I then modified bin/tests/system/mkeys/tests.sh to print current time with nanosecond precision just after the problematic grep call (`grep "trust pending" ns2/managed-keys.bind` in the "restore untrusted standby key, revoke original key" test). Here is an example of a failed test being run with that modification: I: check for signed record (1) I: check positive validation with valid trust anchor (2) I: check positive validation using delv (3) I: check for failed validation due to wrong key in managed-keys (4) I: check new trust anchor can be added (5) I: ns2 refreshing managed keys for '_default' I: check new trust anchor can't be added with bad initial key (6) I: ns3 refreshing managed keys for '_default' I: remove untrusted standby key, check timer restarts (7) I: ns2 refreshing managed keys for '_default' I: restore untrusted standby key, revoke original key (8) I: ns2 refreshing managed keys for '_default' 13:05:56.664647628 I:failed Meanwhile, named.run for that specific test contains: 13:05:56.656 received control channel command 'managed-keys sync' 13:05:56.656 zone_dump: managed-keys-zone : enter 13:05:56.656 zone_gotwritehandle: managed-keys-zone : enter ... 13:05:56.673 dump_done: managed-keys-zone : enter In other words, we run grep on what we believe is the new managed-keys zone file, compare that to the output of a similar grep command run on the old managed-keys zone file and expect the two values to be different. What actually happens, though, is that both grep commands are run on the same file, i.e. the old managed-keys zone file (I confirmed this by echoing $t1 and $t2 in the failing test). A quick way to reproduce this issue reliably (at least on the machines I am using) is to remove the "sleep 1" line after the "managed-keys sync" rndc call in the "remove untrusted standby key, check timer restarts" test. To fix this issue and also achieve some consistency in the mkeys system test, I modified it so that we always sleep for 1 second after each "managed-keys refresh" and "managed-keys sync" rndc call. To avoid confusion, I also removed all "managed-keys sync" calls whose results (i.e. the managed-keys zone file) are not used for determining the result of the test. Fixes described above are in the rt45293 branch. Please review.
Proposed fix for processing cached DNSKEY RRsets while performing an RFC 5011 key refresh has been implemented in RT #46251. Pushed an extensive set of rebased fixes and cleanups for the mkeys system test to branch rt45293b. Updated test seems to pass consistently on Arch Linux, Fedora 25, FreeBSD 11.0 and OpenBSD 6.1. Note that rt45293b is based on rt46251 because even with the fixes and cleanups from rt45293b applied, the mkeys system test will still not pass consistently without rt46251 being merged beforehand. As a side note, these cleanups should result in a significant drop in time required to run the mkeys system test. On my machine that time went down from about 1m50s to about 50s. Please review.
These changes all look good, but I'm concerned about conflict with rt46077b, which also touched the mkeys test substantially. I think it might be best to delay merging this until that one's done. rt46251 can go in right away though.
Pushed rt45293b rebased on top of current master as rt45293c. There are two functional changes compared to rt45293b: - Sleep for 21 seconds instead of 20 seconds when waiting for timer expiration. See the relevant comment added by commit 3d2abd0f0d for reasoning. - Place sleeps before dnssec-signzone is called instead of before "rndc reload ." is called (they are pointless otherwise). Ready for review.
This looks good. I reworded a couple of comments that weren't quite clear. Also, as a suggestion, I moved the "nextpart" function from sfcache to conf.sh.in so it could be used in both sfcache and mkeys, and then we don't have to have two different functions for skipping lines in named.run. If you like the change, keep it; if not, feel free to revert - it's okay to commit either way.
Rebased on top of master with RT #46267 merged, pushed it as rt45293d. Fixed one more potential false negative by disabling trust anchor telemetry on ns3 (see commit 856e775f1b for details).
4802. [test] Refactor mkeys system test to make it quicker and more reliable. [RT #45293] 9.12.0