> > I also noticed a glitch in BADCOOKIE handling which was already present
> > in the code before refactoring:
>
> Also a good catch, but I'm not sure about this being present before
> refactoring? I don't think there was a queryopts variable then, so
> query->options was the only place where we checked for DNS_FETCHOPT_TCP.
I am not sure we are on the same page here, so please allow me to
reiterate just in case: the bug is not about where DNS_FETCHOPT_TCP is
_checked_, but rather where it is _set_. Before refactoring, a local
variable called "options" was used for the same purpose for which
rctx->queryopts is used after refactoring (line numbers taken from
current master):
7582 static void
7583 resquery_response(isc_task_t *task, isc_event_t *event) {
...
7587 isc_boolean_t keep_trying, get_nameservers, resend, nextitem;
7597 unsigned int options;
...
7617 options = query->options;
...
7634 resend = ISC_FALSE;
...
8178 } else if (message->rcode == dns_rcode_badcookie &&
8179 message->cc_ok) {
8180 /*
8181 * We have recorded the new cookie.
8182 */
8183 if (BADCOOKIE(query->addrinfo))
8184 query->options |= DNS_FETCHOPT_TCP;
8185 query->addrinfo->flags |= FCTX_ADDRINFO_BADCOOKIE;
8186 resend = ISC_TRUE;
8187 } else {
...
8194 }
...
8200 goto done;
...
8451 done:
...
8553 } else if (resend) {
...
8561 result = fctx_query(fctx, addrinfo, options);
Thus, I still believe the bug was present in the code before, i.e. that
it was not introduced by refactoring. (Perhaps this is worth fixing in
a separate ticket?)
> For now I'm taking the approach of referencing and updating rctx->queryopts
> in all the rctx_* functions, but continuing to use query->options elsewhere.
>
> Possibly the right thing is simply to use rctx->query->options in all
> cases instead of rctx->queryopts, though... Hm.
The case I was worried about is that we would set a bit in
rctx->queryopts and then later check for the same bit in
rctx->queryopts, with the actual intention of checking whether that bit
was set for the original query (i.e. in rctx->query->options). I could
not find any code paths which do something like that (whenever
rctx->queryopts is modified, rctx_done() is called almost immediately
afterwards), but "enforcing" the "policy" I suggested ("always use
rctx->query->options for checking bits, always use rctx->queryopts for
setting bits") might prevent such code paths from being erroneously
introduced in the future.
While we are at it, how about renaming "queryopts" to "requeryopts" or
something similar? It would make its purpose more explicit.
> > Finally, could you please elaborate a bit on the system test changes in
> > f9cd680492? I understand it fixes some problem with the dnssec system
> > test, but it is not immediately obvious to me from looking at the fix
> > what that problem was in the first place, plus a whole check is removed
> > by that commit.
>
> The acache was removed by Mukund's recent performance work; there's no
> point testing different behavior with and without it. As far as I can
> recall that's the only thing I was fixing in that particular commit. Sorry
> the message wasn't more detailed.
Ah, okay, thanks for the explanation.
> > o The choice of rctx fields which are explicitly initialized in
> > rctx_initanswer() seems a bit arbitrary. The memory chunk used by
> > rctx is zeroed by a memset() call inside rctx_respinit(), but
> > rctx_initanswer() still sets some pointer fields to NULL while e.g.
> > not setting negative_response to ISC_FALSE explicitly. One possible
> > strategy here is to only explicitly initialize enums (for clarity as
> > to what the default value is) and skip the rest.
>
> I think my reasoning was that rctx_respinit() would only ever be called
> once but rctx_initanswer() might be called more than once, and things
> that had been set would need to be cleared. (Though, I don't currently
> see a scenario where that happens.)
Oh, okay, in that case, I would suggest to do one of the following:
- rename rctx_initanswer() to rctx_resetanswer() so that what you
wrote above is immediately clear and put a comment somewhere that
such a design is intended,
- since there is currently no scenario where rctx_initanswer() is
called more than once for the same struct respctx, keeping things
simple by trimming the explicit assignments down might be the
simplest solution.
> In any case, the fields that are set in rctx_initanswer() are the ones that
> positively *must* be initialized at the beginning of rctx_answer() and
> rctx_noaanswer(), and I think I'd rather leave them there even if they're
> not strictly necessary.
Sure, these are harmless right now. I am just worried that if the
intended design of rctx_initanswer() allows it to be reused for the same
struct respctx, selective assignments inside it may cause obscure bugs
and/or unnecessary debugging efforts in the future [1].
> At last year's all-hands meeting we discussed the risks of this coding
> style -- it was the direct cause of a well-publicised Apple SSL bug -- and
> several of us felt that we should start moving away from it. So, in my
> refactoring work, both here and in query.c, I've been trying to remember
> to put brackets around single-line if statements in code that I touch.
>
> My feeling is that the code style is evolving on this point; we *can*
> leave single-line if statements unbracketed, but we don't have to
> and in this work I'm choosing not to.
Fair enough, thanks for explaining the background. I am okay with both
styles and I was just pointing out the inconsistency.
> > o Is the "&& !rctx->chaining" part of the logical condition on line
> > 7225 necessary?
>
> Another really good catch. That mistake has been there since 2007.
> Fixed. (It's harmless, but we can also fix it in the non-refactored code
> if you think it's worth bothering. I suspect it gets optimized out by any
> decent compiler.)
I would not bother. As you said, it is harmless.
> > o Roughly a half of the functions introduced by refactoring
> > (rctx_initanswer(), rctx_scananswer(), rctx_any(),
> > rctx_finalanswer(), rctx_cname(), rctx_dname(), rctx_answer(),
> > rctx_respinit(), rctx_nextserver(), rctx_chaseds(), rctx_done(),
> > rctx_edns(), rctx_parse(), rctx_delegation()) have no associated
> > comments which would describe their purpose. Even a couple of words
> > for each of them would already be something.
>
> Yes, that's in my plans but hasn't been done yet. I plan to rearrange the
> functions into a more sensible order and write a long comment explaining
> how they relate to each other, too, like I did in query.c.
Ah, great, in that case, please carry on!
> Please review all the changes I've made in response to your other review
> comments. If they all look good, then I'll do the rearranging and
> documentation steps afterward, without changing any code.
I put down a few more suggestions below.
> > - Indefinite articles in the comment block starting with "A
> > negative response..." (line 7723) could use some tweaking.
> > (This issue was also present in the code before refactoring.)
>
> I often correct Mark's declensional quirk in printed documentation but
> I generally let him have his way in code comments. It's not just you,
> though, I find it weird too. :)
Got it :)
Some follow-up comments (line numbers taken from current head of
rt45362, i.e. commit bb40559303):
- I like defaulting to returning ISC_R_COMPLETE in rctx_parse(), but
given that, could we please reorganize the ISC_R_UNEXPECTEDEND
branch so that it looks like this:
case ISC_R_UNEXPECTEDEND:
if (fctx->rmessage->question_ok &&
(fctx->rmessage->flags & DNS_MESSAGEFLAG_TC) != 0 &&
(rctx->queryopts & DNS_FETCHOPT_TCP) == 0)
{
/*
* We defer retrying via TCP for a bit so we can
* check out this message further.
*/
rctx->truncated = ISC_TRUE;
return (ISC_R_SUCCESS);
}
/*
* Either the message ended prematurely,
...
rctx_done(rctx, result);
break;
Right now, the break statement is inside the conditional branch and
the case block ends with a return. IMHO it is legible, but
confusing.
- Consider this function:
7187 static isc_result_t
7188 rctx_scanauth(respctx_t *rctx) {
7189 fetchctx_t *fctx = rctx->fctx;
7190 isc_boolean_t done = ISC_FALSE;
7191 isc_result_t result;
7192
7193 result = dns_message_firstname(fctx->rmessage, DNS_SECTION_AUTHORITY);
7194 while (!done && result == ISC_R_SUCCESS) {
...
7247 result = dns_message_nextname(fctx->rmessage,
7248 DNS_SECTION_AUTHORITY);
7249 }
7250
7251 if (result == ISC_R_NOMORE) {
7252 result = ISC_R_SUCCESS;
7253 }
7254
7255 return (result);
7256 }
Given that dns_message_firstname() and dns_message_nextname() can
only return ISC_R_NOMORE or ISC_R_SUCCESS and the result variable is
not assigned to anything else than the return value of these two
functions, lines 7251-7255 can simply be replaced with:
return (ISC_R_SUCCESS);
- I believe you did not address this review comment, either in code or
in your reply:
> - Commit 4c36271b11 removed an "XXXRTH log." comment from the code
> branch handling responses from broken servers, but there is
> still no logging code in that branch.
- rctx->glue_in_answer is now a boolean, so:
- if ((rctx->glue_in_answer) != 0 &&
+ if (rctx->glue_in_answer &&
- Misaligned continuation:
8244 inc_stats(fctx->res,
8245 dns_resstatscounter_edns0fail);
- Closing parenthesis missing in comment:
6773 dns_rdatatype_t type; /* type being sought (set to
6774 * ANY if qtype was SIG or RRSIG */
[1] Cf. e.g. the last paragraph of this bug report:
https://bugs.isc.org/Ticket/Display.html?id=45547#txn-483715