Report information
The Basics
Id: 44790
Status: resolved
Priority: 50/50
Queue: bind9-public

People
Owner: Nobody in particular
Requestors: Vicky Risk <vicky@isc.org>
Cc:
AdminCc:

Bug Information
Version Fixed: 9.12.0
Version Found: (no value)
Versions Affected: (no value)
Versions Planned: 9.10.6-S2, 9.12
Priority: P1 High
Severity: S1 High
CVSS Score: (no value)
CVE ID: (no value)
Component: (no value)
Area: feature

Dates
Created: Wed, 01 Mar 2017 15:37:47 -0500
Updated: Fri, 13 Oct 2017 12:41:25 -0400
Closed: Fri, 13 Oct 2017 12:41:25 -0400



This bug tracker is no longer active.

Please go to our Gitlab to submit issues (both feature requests and bug reports) for active projects maintained by Internet Systems Consortium (ISC).

Due to security and confidentiality requirements, full access is limited to the primary maintainers.

Subject: TTL Stretching [PATCH]
Continue answering queries with expired cached answers if the authority is not answering, after first querying the authority. The goal is to allow resolvers to continue to respond even when the authority for the data might be temporarily unresponsive due to an attack.

We want two configurable timers: one for how long to wait for the authority to respond before serving stale data, and one for how long after its expiry a record may still be served.

The expectation is that this would be enabled system-wide. Not sure whether this should default to on, or what the default settings for the timers should be.

The pending IETF draft "Serving Stale Data to Improve DNS Resiliency" (draft-tale-dnsop-serve-stale-00) is a start at describing how this should work.

The attached contributed patch from Akamai may or may not be useful. The email that came with it is excerpted below; the current draft of the IETF paper is also attached.

"Here it is, the long awaited patch! There are two attachments, the first being the patch and the second the current version of the Internet-Draft that I'm waiting to submit after Warren is done with the editing pen.

Some things to note about the patch:

* Per the comment in the draft about not evicting CNAMEs in the cache when other data arrives, this can result in unexpected behaviour once everything goes stale and the CNAME comes back into play after new authoritative data has changed the zone. We had an incident related to this. This has not been addressed in the patch; had I addressed it, I was leaning in the direction of checking for an existing CNAME conflict when adding new data and evicting the old data.

* This does not really handle using stale glue, which is a shame. I believe it should, but I just didn't get into messing around with the adb. Personally I think if you have a stale delegation for example.com you should still be able to use it to resolve names.

* There's some work in there related to reloading the dump file, which I realize was meant only for testing and not a production feature even before this came along. We had a thought that preserving data across restarts would also improve generalized resiliency, but since the dump load doesn't have a provision for loading negative answers I didn't finish that. The timestamps in the dump file are still written to reflect stale age though, which could be really surprising for someone looking at it and seeing much longer TTLs than they expect.

Sorry again that this took so long, but I hope that it is useful for you."
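For readers following the request, the behaviour described at the top of this ticket can be sketched as a toy cache. This is only an illustration under stated assumptions, not the contributed patch or named's code: the names StaleServingCache, max_stale, stale_answer_ttl and query_authority are hypothetical, and the "how long to wait for the authority" timer is modelled simply as query_authority returning None when the authority did not answer in time.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    value: str
    expires_at: float  # absolute time at which the original TTL runs out


class StaleServingCache:
    """Toy model of the requested behaviour (hypothetical, not BIND code)."""

    def __init__(self, max_stale, stale_answer_ttl=1):
        self.max_stale = max_stale                # how long past expiry a record may be served
        self.stale_answer_ttl = stale_answer_ttl  # TTL placed on a stale answer
        self.entries = {}                         # name -> CacheEntry

    def store(self, name, value, ttl, now):
        self.entries[name] = CacheEntry(value, now + ttl)

    def resolve(self, name, now, query_authority):
        """Return (value, ttl, kind), or None if nothing can be served."""
        entry = self.entries.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.value, int(entry.expires_at - now), "fresh"

        # Expired or missing: the authority is always queried first.
        # query_authority returns (value, ttl), or None on timeout.
        answer = query_authority(name)
        if answer is not None:
            value, ttl = answer
            self.store(name, value, ttl, now)
            return value, ttl, "refreshed"

        # Authority unresponsive: fall back to the expired record, but
        # only while it is still inside the stale window.
        if entry is not None and now - entry.expires_at <= self.max_stale:
            return entry.value, self.stale_answer_ttl, "stale"
        return None
```

With max_stale=3600, a record stored at time 0 with a TTL of 10 is served as fresh at time 5, as stale (with the stale TTL) at time 20 if the authority is silent, and not at all once more than an hour has passed since it expired.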
Subject: Serve Stale patch.pdf
application/pdf 66.1KiB


Subject: serve-stale.diff
application/octet-stream 48.9KiB


On Tue Jul 04 03:40:09 2017, stephen wrote:
> Reviewed the documentation and tests for commit
> 8ce30b9b86d0fd11d26ba2fb0037ea5c29832eef. (Note: the code has NOT been
> reviewed.)
>
> Documentation
> ===
> There is nothing in the ARM about "rndc serve-stale" command. In
> particular, what does "serve-stale reset" command do?

The ARM includes the rndc man page. That said, I've added a reference to it.

> In the ARM, the documentation for stale-answer-ttl should note that
> max-stale-ttl has to also be set to enable the serving of stale
> answers.

Added.

> max-stale-ttl does not seem the right name for this option: to my
> mind, it suggests that the value is the maximum value of a stale
> record's TTL. max-stale-retention or max-stale-retain might be a
> better name.

It's consistent with max-cache-ttl and max-ncache-ttl.

On Wed Jun 21 07:47:30 2017, muks wrote:
> * Why is stale-answer-ttl clamped to a minimum of 1? Should 0 not be
> allowed? TTL=0 seems more correct when serving stale answers as they
> will not be cached, but maybe a non-zero TTL would be required for
> some cases. (This was pointed out to DCL too as a review comment on
> the draft.)

When servers are daisy-chained we want some caching to occur, and when the answer goes directly to a client, 0 or 1 makes no difference.

On Wed Jun 21 07:47:30 2017, muks wrote:
> * If IN class string is only passed to rndc ttl-stretching, it doesn't
> look like it skips non-matching classes

Addressed.
Reviewed the tests and documentation for commit addc4c6e67f0a4a05bada67d4a6d917657fecc9c

Documentation
---
The documentation for stale-answer-ttl states that for stale answers to be returned, max-stale-ttl must be set to a non-zero value and they must not have been disabled by rndc. However, from the named.conf files in the test it appears that stale-answer-enable also needs to be set. In addition, the documentation does not mention stale-answer-enable: the option appears to be called serve-stale-enable there.

Tests (general comments)
---
I note that the records returned by ans.pl have a TTL of 1 second, and the script waits for 1 second for them to become stale. To avoid any possible timing problems when running the tests, I suggest that the sleep intervals be lengthened to 2 seconds.

It would be helpful if the messages in the test script were more informative. For example, the "check rndc serve-stale status" messages could be expanded to say whether the status is expected to be enabled or disabled.

Where invalid rndc commands are tested, preceding the messages with one stating that invalid commands are being tested would have been helpful. I was perplexed at first by the messages "check rndc serve-stale" and "check rndc serve-stale unknown", thinking they were valid commands.

When checking the test code, it took a little time to realise that resolver ns3 was being tested and not ns1. (I was puzzled by the fact that the last loaded configuration for ns1 contained a max-stale-ttl of 7200, yet a value of 604800 was being returned by rndc.) Was there a reason for this? Why not just use a new configuration and reload (as done earlier in the script)?

Tests (specific)
---
A summary of the test coverage is given in the cross-reference of the tests against the requirements in the internal wiki.

The main issues are (numbers are the requirement number):

1.1 No check that if the feature is explicitly disabled in the configuration file (as opposed to not being configured), it is disabled in practice.

2.1 There is no explicit test that an upstream query is being made before a stale record is served. (This could be tested by extending "ans.pl" to return a count of queries made and testing that before and after requesting stale records.)

2.4 All records in the test have a TTL of 1. There is no check that the TTL of the stale record returned is set (capped?) by the stale-answer-ttl value.

2.6 There is no check that the max-stale-ttl value caps the amount of time that a stale record can be served.

2.8 No test that a stale positive value is updated by a NODATA or NXDOMAIN.

There are no tests for the requirements in section 3 (interaction with other features), but these are tests that can be implemented between the alpha and beta.
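The check suggested under 2.1 (count upstream queries before and after requesting a stale record) might look like the sketch below. It is an assumption-laden illustration, not the BIND system test: CountingAuthority is a hypothetical stand-in for ans.pl, and resolve_with_stale is a minimal hypothetical resolver stub rather than named.

```python
class CountingAuthority:
    """Hypothetical stand-in for ans.pl that counts the queries it receives."""

    def __init__(self):
        self.queries = 0
        self.responding = True

    def query(self, name):
        self.queries += 1  # count every query, whether or not it is answered
        return "192.0.2.1" if self.responding else None


def resolve_with_stale(name, stale_cache, authority):
    """Minimal resolver stub: ask upstream first, then fall back to stale data."""
    answer = authority.query(name)
    if answer is not None:
        stale_cache[name] = answer
        return answer, False
    if name in stale_cache:
        return stale_cache[name], True
    return None, False


authority = CountingAuthority()
cache = {}
resolve_with_stale("data.example", cache, authority)  # prime the cache
authority.responding = False                          # simulate the outage

before = authority.queries
answer, is_stale = resolve_with_stale("data.example", cache, authority)
after = authority.queries
```

The test passes only if the stale answer was returned and the counter moved by exactly one, i.e. is_stale is true and after == before + 1, which shows that the upstream query was attempted before the stale record was served.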
4700. [func] Serving of stale answers is now supported. This allows named to provide stale cached answers when the authoritative server is under attack. See max-stale-ttl, stale-answer-enable, stale-answer-ttl. [RT #44790]