MIME-Version: 1.0 In-Reply-To: <20170526103815.GA22789@jurassic> X-Mailer: MIME-tools 5.505 (Entity 5.505) Content-Disposition: inline X-RT-Interface: Web References: <20170526103815.GA22789@jurassic> Content-Type: text/plain; charset="utf-8" Message-ID: Content-Transfer-Encoding: binary X-RT-Original-Encoding: utf-8 RT-Send-CC: X-RT-Encrypt: 0 X-RT-Sign: 0 Content-Length: 2669 I believe the mkeys system test is subject to subtle timing issues, namely we do some grepping on the managed-keys zone file without waiting until it is actually written to disk after "rndc managed-keys sync" is invoked. I was able to intermittently reproduce this issue only on a Fedora 25 VM. I then modified bin/tests/system/mkeys/tests.sh to print current time with nanosecond precision just after the problematic grep call (`grep "trust pending" ns2/managed-keys.bind` in the "restore untrusted standby key, revoke original key" test). Here is an example of a failed test being run with that modification: I: check for signed record (1) I: check positive validation with valid trust anchor (2) I: check positive validation using delv (3) I: check for failed validation due to wrong key in managed-keys (4) I: check new trust anchor can be added (5) I: ns2 refreshing managed keys for '_default' I: check new trust anchor can't be added with bad initial key (6) I: ns3 refreshing managed keys for '_default' I: remove untrusted standby key, check timer restarts (7) I: ns2 refreshing managed keys for '_default' I: restore untrusted standby key, revoke original key (8) I: ns2 refreshing managed keys for '_default' 13:05:56.664647628 I:failed Meanwhile, named.run for that specific test contains: 13:05:56.656 received control channel command 'managed-keys sync' 13:05:56.656 zone_dump: managed-keys-zone : enter 13:05:56.656 zone_gotwritehandle: managed-keys-zone : enter ... 13:05:56.673 dump_done: managed-keys-zone : enter In other words, we run grep on what we believe is the new managed-keys zone file, compare that to the output of a similar grep command run on the old managed-keys zone file and expect the two values to be different. What actually happens, though, is that both grep commands are run on the same file, i.e. the old managed-keys zone file (I confirmed this by echoing $t1 and $t2 in the failing test). A quick way to reproduce this issue reliably (at least on the machines I am using) is to remove the "sleep 1" line after the "managed-keys sync" rndc call in the "remove untrusted standby key, check timer restarts" test. To fix this issue and also achieve some consistency in the mkeys system test, I modified it so that we always sleep for 1 second after each "managed-keys refresh" and "managed-keys sync" rndc call. To avoid confusion, I also removed all "managed-keys sync" calls whose results (i.e. the managed-keys zone file) are not used for determining the result of the test. Fixes described above are in the rt45293 branch. Please review.