Report information
The Basics
Id:
44509
Status:
resolved
Priority:
Medium/Medium
Queue:

People
Owner:
Nobody in particular
Cc:
AdminCc:

BugTracker
Version Fixed:
9.9.10, 9.9.10-S1, 9.10.5, 9.10.5-S1, 9.11.1, 9.12.0
Version Found:
(no value)
Versions Affected:
(no value)
Versions Planned:
(no value)
Priority:
(no value)
Severity:
(no value)
CVSS Score:
(no value)
CVE ID:
(no value)
Component:
(no value)
Area:
(no value)

Dates
Created:Mon, 23 Jan 2017 23:13:12 -0500
Updated:Wed, 09 Aug 2017 22:23:31 -0400
Closed:Tue, 24 Jan 2017 01:51:57 -0500



This bug tracker is no longer active.

Please go to our Gitlab to submit issues (both feature requests and bug reports) for active projects maintained by Internet Systems Consortium (ISC).

Due to security and confidentiality requirements, full access is limited to the primary maintainers.

Subject: Fwd: mdig and memory checking
Date: Tue, 24 Jan 2017 15:12:56 +1100
To: bind9-bugs@isc.org
From: "Mark Andrews" <marka@isc.org>

CC: dot@dotat.at
Subject: support for large batch jobs with mdig
Date: Mon, 23 Jan 2017 16:32:25 +0000
To: bind-workers@isc.org
From: "Tony Finch" <dot@dotat.at>
Marco Davids suggested on Twitter that I should try using mdig for bulk lookups. I have previously used adns for this (it can handily manage 20kqps single threaded), and I have briefly tried out zdns (which isn't as efficient). I tried feeding it the Alexa top 1 million list: $ sed '1d;s|^[0-9]*,||;s|/.*||' top-1m.csv | time mdig +noall +answer -f - A @::1 and it failed: mdig: dns_request_createvia4 failed with quota reached Looking at the code it tries to send all the queries at once, which is obviously not going to work for big jobs! (In fact mdig sets a dispatch quota of 100 which restricts the maximum job size.) I hacked around a bit to try adding a concurrency limit, along the lines I have previously done for adns - see the patch below. This seems to work reasonably OK, e.g. $ sed '1d;s|^[0-9]*,||;s|/.*||' ~/work/dns/adns/top-1m.csv | head -100000 | time ./mdig +continue +noall -C 1000 -f - A @::1 9.74user 0.95system 0:14.41elapsed 74%CPU A couple of problems, though: mdig sometimes goes into a spin if I send it a SIGINT and/or fails with the seemingly spurious error "dns_name_fromtext failed with label too long". Dunno why - signal handling in BIND is still mysterious to me :-) It slurps in the entire query list before starting to talk to the DNS server, which can use a lot of RAM. (about 1kb per query) I suspect there may be a better way to do this concurrency limit stuff, making more effective use of the dispatch quota - at the moment the code is enforcing the concurrency limit in two places, in mdig and in the dns library. If I give mdig more than a few hundred thousand queries it seems to hang at the end of the job. On further investigation, there is something accidentally quadratic in the cleanup code! 100k 0:11.45 200k 0:24.01 300k 0:40.44 400k 1:01.51 500k 1:27.97 If I add an _exit() after isc_app_run() returns, before the query list is free()d, I get: 100k 0:10.23 200k 0:15.87 300k 0:20.08 400k 0:26.08 500k 0:29.89 1M 0:59.96 (adns can resolve this list in about 46s) If I _exit() just after the query list is free()d, the times are similar to the first list. Looks like something slooow in the memory manager. --- bin/tools/mdig.c | 44 +++++++++++++++++++++++++++++++++++++------- 1 file changed, 37 insertions(+), 7 deletions(-) diff --git a/bin/tools/mdig.c b/bin/tools/mdig.c index eaf477c..034bf91 100644 --- a/bin/tools/mdig.c +++ b/bin/tools/mdig.c @@ -51,6 +51,12 @@ #include <bind9/getaddresses.h> +#if 0 +#define TRACE(...) printf(__VA_ARGS__) +#else +#define TRACE(...) +#endif + #define CHECK(str, x) { \ if ((x) != ISC_R_SUCCESS) { \ fprintf(stderr, "mdig: %s failed with %s\n", \ @@ -110,6 +116,7 @@ static in_port_t port = 53; static isc_dscp_t dscp = -1; static unsigned char cookie_secret[33]; static int onfly = 0; +static int concurrency = 100; static char hexcookie[81]; struct query { @@ -140,6 +147,9 @@ struct query { }; static struct query default_query; static ISC_LIST(struct query) queries; +static struct query *pending; + +static void sendqueries(isc_task_t *task, isc_event_t *event); #define EDNSOPTS 100U /*% opcode text */ @@ -477,8 +487,18 @@ cleanup: dns_request_destroy(&reqev->request); isc_event_free(&event); - if (--onfly == 0) + --onfly; + if (pending != NULL && (onfly < concurrency * 9/10 || onfly < 10)) { + /* XXX is this the right event type? copied from app.c ... */ + event = isc_event_allocate(mctx, task, ISC_APPEVENT_SHUTDOWN, + sendqueries, pending, + sizeof(*event)); + if (event == NULL) + CHECK("isc_event_allocate", ISC_R_NOMEMORY); + isc_task_send(task, &event); + } else if (onfly == 0) isc_app_shutdown(); + return; } @@ -695,19 +715,23 @@ sendquery(struct query *query, isc_task_t *task) static void sendqueries(isc_task_t *task, isc_event_t *event) { - struct query *query = (struct query *)event->ev_arg; + struct query *query = pending; isc_event_free(&event); - while (query != NULL) { + while (query != NULL && onfly < concurrency) { struct query *next = ISC_LIST_NEXT(query, link); sendquery(query, task); query = next; } + TRACE("onfly = %d pending = %d\n", onfly, query != NULL); - if (onfly == 0) + if (query == NULL && onfly == 0) isc_app_shutdown(); + else + pending = query; + return; } @@ -740,6 +764,7 @@ help(void) { " global opt is one of:\n" " -4 (use IPv4 query transport only)\n" " -6 (use IPv6 query transport only)\n" +" -C count (concurrency limit, default 100)\n" " -b address[#port] (bind to source address/port)\n" " -p port (specify port number)\n" " +[no]dscp[=###] (Set the DSCP value to ### [0..63])\n" @@ -1624,6 +1649,11 @@ dash_option(const char *option, char *next, struct query *query, *hash = '#'; have_src = ISC_TRUE; return (value_from_next); + case 'C': + concurrency = atoi(value); + if (concurrency < 1) + fatal("concurrency must be greater than zero"); + return (value_from_next); case 'c': tr.base = value; tr.length = strlen(value); @@ -1931,7 +1961,7 @@ isc_mem_debugging = ISC_MEM_DEBUGRECORD; dispatchvx = NULL; RUNCHECK(dns_dispatch_getudp(dispatchmgr, socketmgr, taskmgr, have_src ? &srcaddr : &bind_any, - 4096, 100, 100, 17, 19, + 4096, concurrency, concurrency, 17, 19, attrs, attrmask, &dispatchvx)); requestmgr = NULL; RUNCHECK(dns_requestmgr_create(mctx, timermgr, socketmgr, @@ -1943,8 +1973,8 @@ isc_mem_debugging = ISC_MEM_DEBUGRECORD; view = NULL; RUNCHECK(dns_view_create(mctx, 0, "_test", &view)); - query = ISC_LIST_HEAD(queries); - RUNCHECK(isc_app_onrun(mctx, task, sendqueries, query)); + pending = ISC_LIST_HEAD(queries); + RUNCHECK(isc_app_onrun(mctx, task, sendqueries, pending)); (void)isc_app_run(); -- 2.10.1.445.g3cdd5d1 _______________________________________________ bind-workers mailing list bind-workers@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-workers

Subject: ISC_MEM_TRACKLINES is accidentally quadratic, was Re: support for large batch jobs with mdig
Date: Mon, 23 Jan 2017 19:34:13 +0000
To: bind-workers@isc.org
From: "Tony Finch" <dot@dotat.at>
Tony Finch <dot@dotat.at> wrote: > > If I give mdig more than a few hundred thousand queries it seems to hang > at the end of the job. On further investigation, there is something > accidentally quadratic in the cleanup code! This accidentally quadratic performance is to do with ISC_MEM_TRACKLINES - turn that off and freeing is instant! But it needs a fix... --- lib/dns/openssl_link.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/lib/dns/openssl_link.c b/lib/dns/openssl_link.c index 29facf3..4e0976e 100644 --- a/lib/dns/openssl_link.c +++ b/lib/dns/openssl_link.c @@ -122,10 +122,17 @@ id_callback(void) { #endif #if OPENSSL_VERSION_NUMBER < 0x10100000L || defined(LIBRESSL_VERSION_NUMBER) -#define FLARG_PASS , __FILE__, __LINE__ + #define FLARG #define FILELINE +#if ISC_MEM_TRACKLINES +#define FLARG_PASS , __FILE__, __LINE__ #else +#define FLARG_PASS +#endif + +#else + #define FLARG , const char *file, int line #define FILELINE , __FILE__, __LINE__ #if ISC_MEM_TRACKLINES -- 2.10.1.445.g3cdd5d1 _______________________________________________ bind-workers mailing list bind-workers@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-workers

CC: bind-workers@isc.org
Subject: Re: ISC_MEM_TRACKLINES is accidentally quadratic, was Re: support for large batch jobs with mdig
To: "Tony Finch" <dot@dotat.at>
Date: Tue, 24 Jan 2017 06:55:16 +1100
From: "Mark Andrews" <marka@isc.org>
In message <alpine.DEB.2.11.1701231932080.23062@grey.csi.cam.ac.uk>, Tony Finch writes: > Tony Finch <dot@dotat.at> wrote: > > > > If I give mdig more than a few hundred thousand queries it seems to hang > > at the end of the job. On further investigation, there is something > > accidentally quadratic in the cleanup code! Remove "isc_mem_debugging = ISC_MEM_DEBUGRECORD;" from the start of main in mdig.c. It got left in from developement. The flag is very good at reporting what memory is leaking but it isn't the fastest code. mdig needs dig's preparse args to set isc_mem_debugging. Mark > This accidentally quadratic performance is to do with ISC_MEM_TRACKLINES - > turn that off and freeing is instant! But it needs a fix... > > --- > lib/dns/openssl_link.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/lib/dns/openssl_link.c b/lib/dns/openssl_link.c > index 29facf3..4e0976e 100644 > --- a/lib/dns/openssl_link.c > +++ b/lib/dns/openssl_link.c > @@ -122,10 +122,17 @@ id_callback(void) { > #endif > > #if OPENSSL_VERSION_NUMBER < 0x10100000L || defined(LIBRESSL_VERSION_NUMBER) > -#define FLARG_PASS , __FILE__, __LINE__ > + > #define FLARG > #define FILELINE > +#if ISC_MEM_TRACKLINES > +#define FLARG_PASS , __FILE__, __LINE__ > #else > +#define FLARG_PASS > +#endif > + > +#else > + > #define FLARG , const char *file, int line > #define FILELINE , __FILE__, __LINE__ > #if ISC_MEM_TRACKLINES > -- > 2.10.1.445.g3cdd5d1 > > _______________________________________________ > bind-workers mailing list > bind-workers@lists.isc.org > https://lists.isc.org/mailman/listinfo/bind-workers -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org

CC: bind-workers@isc.org
Subject: Re: ISC_MEM_TRACKLINES is accidentally quadratic, was Re: support for large batch jobs with mdig
Date: Tue, 24 Jan 2017 06:55:16 +1100
To: "Tony Finch" <dot@dotat.at>
From: "Mark Andrews" <marka@isc.org>
In message <alpine.DEB.2.11.1701231932080.23062@grey.csi.cam.ac.uk>, Tony Finch writes: > Tony Finch <dot@dotat.at> wrote: > > > > If I give mdig more than a few hundred thousand queries it seems to hang > > at the end of the job. On further investigation, there is something > > accidentally quadratic in the cleanup code! Remove "isc_mem_debugging = ISC_MEM_DEBUGRECORD;" from the start of main in mdig.c. It got left in from developement. The flag is very good at reporting what memory is leaking but it isn't the fastest code. mdig needs dig's preparse args to set isc_mem_debugging. Mark > This accidentally quadratic performance is to do with ISC_MEM_TRACKLINES - > turn that off and freeing is instant! But it needs a fix... > > --- > lib/dns/openssl_link.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/lib/dns/openssl_link.c b/lib/dns/openssl_link.c > index 29facf3..4e0976e 100644 > --- a/lib/dns/openssl_link.c > +++ b/lib/dns/openssl_link.c > @@ -122,10 +122,17 @@ id_callback(void) { > #endif > > #if OPENSSL_VERSION_NUMBER < 0x10100000L || defined(LIBRESSL_VERSION_NUMBER) > -#define FLARG_PASS , __FILE__, __LINE__ > + > #define FLARG > #define FILELINE > +#if ISC_MEM_TRACKLINES > +#define FLARG_PASS , __FILE__, __LINE__ > #else > +#define FLARG_PASS > +#endif > + > +#else > + > #define FLARG , const char *file, int line > #define FILELINE , __FILE__, __LINE__ > #if ISC_MEM_TRACKLINES > -- > 2.10.1.445.g3cdd5d1 > > _______________________________________________ > bind-workers mailing list > bind-workers@lists.isc.org > https://lists.isc.org/mailman/listinfo/bind-workers -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org _______________________________________________ bind-workers mailing list bind-workers@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-workers

Subject: Re: ISC_MEM_TRACKLINES is accidentally quadratic, was Re: support for large batch jobs with mdig
Date: Mon, 23 Jan 2017 19:57:44 +0000
To: bind-workers@isc.org
From: "Tony Finch" <dot@dotat.at>
Tony Finch <dot@dotat.at> wrote: > > This accidentally quadratic performance is to do with ISC_MEM_TRACKLINES - > turn that off and freeing is instant! But it needs a fix... The other fix is to remove what looks (from the weird indentation) like some stray debugging code that got committed by accident. And with mdig unhobbled, it manages to process the whole top-1m list in less than 34s, nearly 30kqps. Not bad! --- bin/tools/mdig.c | 1 - 1 file changed, 1 deletion(-) diff --git a/bin/tools/mdig.c b/bin/tools/mdig.c index 034bf91..0839845 100644 --- a/bin/tools/mdig.c +++ b/bin/tools/mdig.c @@ -1901,7 +1901,6 @@ main(int argc, char *argv[]) { fatal("could not find either IPv4 or IPv6"); mctx = NULL; -isc_mem_debugging = ISC_MEM_DEBUGRECORD; RUNCHECK(isc_mem_create(0, 0, &mctx)); lctx = NULL; -- 2.10.1.445.g3cdd5d1 _______________________________________________ bind-workers mailing list bind-workers@lists.isc.org https://lists.isc.org/mailman/listinfo/bind-workers

CC: bind-workers@isc.org
Subject: Re: ISC_MEM_TRACKLINES is accidentally quadratic, was Re: support for large batch jobs with mdig
Date: Mon, 23 Jan 2017 20:02:06 +0000
To: "Mark Andrews" <marka@isc.org>
From: "Tony Finch" <dot@dotat.at>
Mark Andrews <marka@isc.org> wrote: > > Remove "isc_mem_debugging = ISC_MEM_DEBUGRECORD;" from the start > of main in mdig.c. Thanks :-) I worked it out eventually! Tony. -- f.anthony.n.finch <dot@dotat.at> http://dotat.at/ - I xn--zr8h punycode Lundy, Fastnet, Irish Sea: South or southwest 4 or 5, increasing 6 at times. Slight or moderate, becoming rough except in Irish Sea, occasionally very rough in Fastnet. Occasional rain. Good, occasionally moderate.