Subject: | TTL Stretching [PATCH] |
Continue answering queries with expired cached answers when the authority is not responding, but only after first querying the authority.
The goal is to allow resolvers to continue to respond even when the authority for the data might be temporarily unresponsive due to an attack.
We want a (configurable) timer for how long to wait for the authority to respond before serving stale data, and a (configurable) limit on how long after its expiry a record may still be served.
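As a rough illustration of how the two timers described above might interact, here is a minimal Python sketch. It is not taken from the attached patch. The names `CacheEntry`, `resolve`, `query_authority`, `answer_timeout` and `max_stale` are hypothetical, and no default values are assumed because the defaults are still an open question.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CacheEntry:
    answer: str
    expires_at: float  # absolute time (seconds) at which the TTL runs out


def resolve(
    entry: Optional[CacheEntry],
    query_authority: Callable[[float], Optional[str]],
    now: float,
    answer_timeout: float,  # hypothetical: how long to wait for the authority
    max_stale: float,       # hypothetical: how long past expiry data may be served
) -> Optional[str]:
    # An unexpired cached answer is served directly.
    if entry is not None and now < entry.expires_at:
        return entry.answer

    # The authority is always tried first, bounded by the first timer.
    # query_authority returns None if no answer arrived within the timeout.
    fresh = query_authority(answer_timeout)
    if fresh is not None:
        return fresh

    # No response: fall back to the expired answer, but only while it is
    # still inside the stale window set by the second timer.
    if entry is not None and now - entry.expires_at <= max_stale:
        return entry.answer

    # Nothing usable; a real resolver would return SERVFAIL here.
    return None
```

In this sketch stale data is never preferred over a live answer: the expired entry is consulted only after the authority has been given `answer_timeout` seconds to respond.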
The expectation is that this would be enabled system-wide. Not sure whether this should default to on, or what the default settings for the timers should be.
The pending IETF draft "Serving Stale Data to Improve DNS Resiliency" (draft-tale-dnsop-serve-stale-00) is a start at describing how this should work.
The attached contributed patch from Akamai may or may not be useful.
The email excerpted below is attached, along with the current draft of the IETF document.
"Here it is, the long awaited patch! There are two attachments, the
first being the patch and the second the current version of the
Internet-Draft that I'm waiting to submit after Warren is done with
the editing pen.
Some things to note about the patch:
* Per the comment in the draft about not evicting CNAMEs in the cache
when other data arrives, this can result in unexpected behaviour
once everything goes stale and the CNAME comes back into play after
new authoritative data had changed the zone. We had an incident
related to this. This has not been addressed in the patch; if I
had, I was leaning in the direction of checking for an existing
CNAME conflict when adding new data and evicting the old data.
* This does not handle using stale glue really, which is a shame. I
believe it should, but I just didn't get into messing around with
the adb. Personally I think if you have a stale delegation for
example.com you should still be able to use it to resolve names.
* There's some work in there related to reloading the dump file, which
I realize was meant only for testing and not a production feature
even before this came along. We had a thought that this would also
improve generalized resiliency to preserve data across restarts, but
since the dump load doesn't have a provision for loading negative
answers I didn't finish that. The timestamps in the dump file are
still written to reflect stale age though, which could be really
surprising for someone looking at it and seeing much longer TTLs
than they expect.
Sorry again that this took so long, but I hope that it is useful for you."
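The CNAME conflict check that the email says was left out of the patch could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the patch author's code: the cache is modelled as a plain dictionary, `add_to_cache` is a hypothetical name, eviction is applied in both directions, and DNSSEC record types that may legitimately coexist with a CNAME are ignored.

```python
from typing import Dict, Tuple

# Hypothetical cache model: (owner name, record type) -> record data.
Cache = Dict[Tuple[str, str], str]


def add_to_cache(cache: Cache, name: str, rtype: str, rdata: str) -> None:
    if rtype == "CNAME":
        # A CNAME cannot coexist with other data at the same name,
        # so drop everything else cached under that name.
        for key in [k for k in cache if k[0] == name and k[1] != "CNAME"]:
            del cache[key]
    else:
        # New non-CNAME data supersedes an older CNAME at the same name,
        # so the old CNAME cannot resurface once the new data goes stale.
        cache.pop((name, "CNAME"), None)
    cache[(name, rtype)] = rdata
```

With a check like this, a zone that replaces a CNAME with an address record would no longer have the old CNAME lingering in the cache to be served as stale data later.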
Subject: | Serve Stale patch.pdf |
Message body not shown because it is not plain text.
Subject: | serve-stale.diff |
Message body not shown because it is not plain text.