Subject: Race condition when accessing rbt
Date: Mon, 20 Oct 2014 14:07:15 +0200
To: bind9-bugs@isc.org
From: Tomas Hozza <thozza@redhat.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello.
We found a race condition when accessing the RBT. The issue was found in
version 9.8.3, which we ship in RHEL-6, but I believe it is not fixed in the
latest version either, at least from what I can see in git.
The race seems to be between the dns_rbt_findnode() and
dns_rbt_deletetreeflat() functions.
Output from the core dump:
Program terminated with signal 7, Bus error.
#0 dns_rbt_findnode (rbt=0x7ff11bb97fd0, name=0x7ff11863f740,
foundname=0x7ff118660d30, node=0x7ff11f09cfc8, chain=0x7ff11f09cfe8, options=1,
callback=0x7ff1230ef090 <cache_zonecut_callback>,
callback_arg=0x7ff11f09cfd0) at rbt.c:806
806 if (hash != HASHVAL(hnode))
(gdb) bt
#0 dns_rbt_findnode (rbt=0x7ff11bb97fd0, name=0x7ff11863f740,
foundname=0x7ff118660d30, node=0x7ff11f09cfc8, chain=0x7ff11f09cfe8, options=1,
callback=0x7ff1230ef090 <cache_zonecut_callback>,
callback_arg=0x7ff11f09cfd0) at rbt.c:806
#1 0x00007ff1230f97a2 in cache_find (db=<value optimized out>,
name=0x7ff11863f740, version=<value optimized out>, type=12, options=0,
now=1405818422, nodep=0x7ff11f09e318, foundname=0x7ff118660d30,
rdataset=0x7ff11866b780, sigrdataset=0x0) at rbtdb.c:4808
#2 0x00007ff12386d867 in query_find (client=<value optimized out>, event=0x0,
qtype=12) at query.c:5476
#3 0x00007ff1238745dc in ns_query_start (client=0x7ff104497310) at query.c:7345
#4 0x00007ff123859b16 in client_request (task=<value optimized out>,
event=<value optimized out>) at client.c:1961
#5 0x00007ff1221ec2f8 in dispatch (uap=0x7ff1237e7010) at task.c:1012
#6 run (uap=0x7ff1237e7010) at task.c:1157
#7 0x00007ff121ba19d1 in ?? ()
#8 0x00007ff11f0a0700 in ?? ()
#9 0x0000000000000000 in ?? ()
(gdb) p hnode
$1 = (dns_rbtnode_t *) 0xdededededededede
(gdb) p rbt
$3 = (dns_rbt_t *) 0x7ff11bb97fd0
(gdb) t 8
[Switching to thread 8 (Thread 28297)]#0 0x00007ff121ba33d1 in ?? ()
(gdb) bt
#0 0x00007ff121ba33d1 in ?? ()
#1 0x00007ff1221dd489 in isc___mem_put (ctx0=0x7ff1140d5450,
ptr=0x7ff10822c238, size=84, file=0x7ff1231cb3e8 "rbt.c", line=2104) at mem.c:1335
#2 0x00007ff1230ea17b in dns_rbt_deletetreeflat (rbtp=0x7ff11bb93288,
quantum=<value optimized out>) at rbt.c:2104
#3 dns_rbt_destroy2 (rbtp=0x7ff11bb93288, quantum=<value optimized out>) at
rbt.c:290
#4 0x00007ff1230fc36d in free_rbtdb (rbtdb=0x7ff11bb93010,
log=isc_boolean_true, event=0x0) at rbtdb.c:900
#5 0x00007ff1230fcf52 in maybe_free_rbtdb (rbtdb=0x7ff11bb93010) at rbtdb.c:1036
#6 0x00007ff1230fd210 in detach (dbp=0x7ff0fc85b668) at rbtdb.c:1051
#7 0x00007ff1230ab1c0 in dns_db_detach (dbp=0x7ff0fc85b668) at db.c:182
#8 0x00007ff12317d2f9 in dns_view_flushcache2 (view=0x7ff0fc85b610,
fixuponly=<value optimized out>) at view.c:1513
#9 0x00007ff11b8fd57d in ?? ()
#10 0x00007ff11e69ebd0 in ?? ()
#11 0x00007ff100000000 in ?? ()
#12 0x00007ff123404a60 in ?? () from /usr/lib64/libdns.so.81.4.1
#13 0x00000001fc055678 in ?? ()
#14 0x00007ff11b907898 in ?? ()
#15 0x0000000100000017 in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb) up
#1 0x00007ff1221dd489 in isc___mem_put (ctx0=0x7ff1140d5450,
ptr=0x7ff10822c238, size=84, file=0x7ff1231cb3e8 "rbt.c", line=2104) at mem.c:1335
1335 MCTXLOCK(ctx, &ctx->lock);
(gdb)
#2 0x00007ff1230ea17b in dns_rbt_deletetreeflat (rbtp=0x7ff11bb93288,
quantum=<value optimized out>) at rbt.c:2104
2104 isc_mem_put(rbt->mctx, node, NODE_SIZE(node));
(gdb) p rbt
$4 = (dns_rbt_t *) 0x7ff11bb97fd0
(gdb) p node
$5 = (dns_rbtnode_t *) 0x7ff10822c238
The function cache_find() calls
RWLOCK(&search.rbtdb->tree_lock, isc_rwlocktype_read);
and then calls dns_rbt_findnode(search.rbtdb->tree, ...);
In dns_rbt_findnode(), nodes are iterated over
using the hashtable in the rbt. No rbt_node locks are acquired
whatsoever.
In another thread, dns_view_flushcache() is invoked, which
eventually results in a call to dns_rbt_destroy2() ->
dns_rbt_deletetreeflat(), where the whole tree is freed,
apparently without any lock.
I think a proper fix would be either to lock the tree while
freeing it in dns_rbt_deletetreeflat(), or to check in
cache_find() that the cache DB is still valid (maybe by taking
a reference so that it does not get freed).
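To illustrate the second option, here is a minimal sketch of holding a
reference for the duration of a lookup. It is not BIND code: toy_db_t,
toy_node_t, the toy_db_*() functions and the toy_db_freed counter are
hypothetical stand-ins, and a plain pthread mutex is assumed in place of
BIND's own locking and reference-counting primitives. The only point is
that the thread doing the flush drops its reference, and the nodes are
freed by whichever thread drops the last one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not BIND's dns_rbtnode_t / dns_rbtdb_t. */
typedef struct toy_node {
    struct toy_node *next;
    unsigned int hash;
} toy_node_t;

typedef struct toy_db {
    pthread_mutex_t lock;   /* protects refs */
    unsigned int refs;      /* number of holders of this database */
    toy_node_t *nodes;      /* stand-in for the tree and its hashtable */
} toy_db_t;

/* Illustration only: counts how many databases were really freed. */
int toy_db_freed = 0;

toy_db_t *toy_db_create(void) {
    toy_db_t *db = calloc(1, sizeof(*db));
    if (db == NULL)
        return NULL;
    pthread_mutex_init(&db->lock, NULL);
    db->refs = 1;           /* the creator holds the first reference */
    return db;
}

/* Setup helper; the caller must have exclusive access to db. */
int toy_db_add(toy_db_t *db, unsigned int hash) {
    toy_node_t *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->hash = hash;
    n->next = db->nodes;
    db->nodes = n;
    return 0;
}

/*
 * Take an extra reference. The caller must already hold a valid
 * reference, or must hold whatever lock protects the pointer it
 * read db from; otherwise the race just moves to this call.
 */
void toy_db_attach(toy_db_t *db) {
    pthread_mutex_lock(&db->lock);
    assert(db->refs > 0);
    db->refs++;
    pthread_mutex_unlock(&db->lock);
}

/* Drop a reference; only the last holder frees the nodes. */
void toy_db_detach(toy_db_t **dbp) {
    toy_db_t *db = *dbp;
    unsigned int remaining;

    *dbp = NULL;
    pthread_mutex_lock(&db->lock);
    assert(db->refs > 0);
    remaining = --db->refs;
    pthread_mutex_unlock(&db->lock);
    if (remaining != 0)
        return;             /* someone is still using the nodes */

    while (db->nodes != NULL) {
        toy_node_t *n = db->nodes;
        db->nodes = n->next;
        free(n);
    }
    pthread_mutex_destroy(&db->lock);
    free(db);
    toy_db_freed++;
}

/* Read-only lookup; safe only while the caller holds a reference. */
int toy_db_find(const toy_db_t *db, unsigned int hash) {
    const toy_node_t *n;
    for (n = db->nodes; n != NULL; n = n->next) {
        if (n->hash == hash)
            return 1;
    }
    return 0;
}
```

In this sketch a lookup would call toy_db_attach() before walking the
nodes and toy_db_detach() afterwards, so a concurrent flush that detaches
its own reference cannot free nodes that are still being read. The first
option (taking the tree lock for writing around the free) would be an
alternative to this, not something the sketch shows.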
Since it is impossible to reproduce this, I would also
like to ask for your opinion on it.
I can provide further information from the core dump if
you think there might be something interesting.
Thank you in advance!
Regards,
- --
Tomas Hozza
Software Engineer - EMEA ENG Developer Experience
PGP: 1D9F3C2D
Red Hat Inc. http://cz.redhat.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAEBAgAGBQJURPrqAAoJEMWIetUdnzwty6MIAJjQ03CcB1CuAihFfU1N4y0R
kFPOxH2Q8aPShvgu0buLd+/4wVCUJp5gXVTWrjdd7rCe5gAqzpdpr+uYxoCqnauF
JTleZ8Mq4Mf/laCi9TUdQ4sENbM6yTNiRFu7KytDnc2FWD1DOP7AXwow5W886dtq
vLAo4nnd7y4k44nSK0UtrSGM9RFCaIiKxUa9od2m1pim+06Ps3XnOuulI1MgQaGB
DVUigyDqFHdlR9RhFjR9D5Ga+aW8AKk9sjxs+9wnnCdqs9v9NJdqrHpfTImZVdKZ
weOSZiwZa08zRNODTdhNeDVSQ2upK0Y5WIQ2oDv5lwWAjJbEHgvk8ylHugtDW98=
=Pqy2
-----END PGP SIGNATURE-----