Report information
The Basics
Id:
41815
Status:
resolved
Estimated:
24 hours (1,440 minutes)
Worked:
19.83 hours (1,190 minutes)
Users:
tmark: 19.83 hours (1,190 minutes)
Left:
24 hours (1,440 minutes)
Priority:
Low/Low
Queue:

People
Requestors:
Cc:
AdminCc:

BugTracker
Version Fixed:
4.3.5 4.1-ESV-R14
Version Found:
4.3.3
Versions Affected:
all
Versions Planned:
4.3.5
Priority:
P2 Normal
Severity:
S2 Normal
CVSS Score:
(no value)
CVE ID:
(no value)
Component:
DHCP Server
Area:
bug

Attachments
41815a.patch dhcp-dev1.pcap dhcp-dev1_dhcpd.leases nofailover.txt rt41815.patch Show all

Dates
Created:Fri, 26 Feb 2016 19:57:18 -0500
Updated:Wed, 21 Feb 2018 17:10:12 -0500
Closed:Thu, 25 Aug 2016 14:26:04 -0400



This bug tracker is no longer active.

Please go to our Gitlab to submit issues (both feature requests and bug reports) for active projects maintained by Internet Systems Consortium (ISC).

Due to security and confidentiality requirements, full access is limited to the primary maintainers.

Subject: issues reclaiming abandoned leases
Date: Fri, 26 Feb 2016 18:57:09 -0600
To: dhcp-bugs@isc.org
From: "David Zych" <dmrz@illinois.edu>
In the course of some proof-of-concept testing, I've encountered some quite odd behavior in dhcp-4.3.3-P1 with regard to reclaiming abandoned leases. I have two DHCP servers running dhcp-4.3.3-P1 compiled with --enable-log-pid --enable-delayed-ack --enable-binary-leases --enable-use-sockets For purposes of this particular test, they are running a very simple failover configuration for a single pool of only two addresses (172.21.195.13 and 172.21.195.14), with ping-check true and ping-timeout 1 (full configs are attached). The router for the client subnet has ip helpers pointing to both DHCP servers. To test abandoned lease behavior, I have statically configured "squatter" VMs to occupy both IPs in the range (and to continually ping an off-network target so they won't fall out of the router's ARP table). First, I stop both servers, truncate their lease files, start them again, and wait for them to attain normal state: ... 2016-02-26T14:57:23.259278-06:00 dhcp-dev1 dhcpd[39761]: failover peer DHCP-W: I move from recover-done to normal 2016-02-26T14:57:23.259284-06:00 dhcp-dev1 dhcpd[39761]: balancing pool d7d170 172.21.195.0/28 total 2 free 2 backup 0 lts 1 max-own (+/-)0 2016-02-26T14:57:23.259286-06:00 dhcp-dev1 dhcpd[39761]: balanced pool d7d170 172.21.195.0/28 total 2 free 1 backup 1 lts 0 max-misbal 0 2016-02-26T14:57:23.262960-06:00 dhcp-dev1 dhcpd[39761]: Sending updates to DHCP-W. 2016-02-26T14:57:23.267516-06:00 dhcp-dev1 dhcpd[39761]: failover peer DHCP-W: peer moves from recover-done to normal 2016-02-26T14:57:23.267529-06:00 dhcp-dev1 dhcpd[39761]: failover peer DHCP-W: Both servers normal Since the pool contains two leases, they balance with 1 free and 1 backup as expected. Now, while running a packet capture on the primary, I start a client VM (in my case it happens to be Peppermint 6) which will attempt to obtain a lease using DHCP. 2016-02-26T14:59:58.529451-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T14:59:58.529992-06:00 dhcp-dev1 dhcpd[39761]: Abandoning IP address 172.21.195.14: pinged before offer 2016-02-26T14:59:58.529984-06:00 dhcp-dev1 dhcpd[39761]: ICMP Echo reply while lease 172.21.195.14 valid. 2016-02-26T14:59:59.537926-06:00 dhcp-dev1 dhcpd[39761]: balancing pool d7d170 172.21.195.0/28 total 2 free 0 backup 1 lts 0 max-own (+/-)0 2016-02-26T14:59:59.537943-06:00 dhcp-dev1 dhcpd[39761]: balanced pool d7d170 172.21.195.0/28 total 2 free 0 backup 1 lts 0 max-misbal 0 At 14:59:58, on the primary, we receive the client's first DHCPDISCOVER, but we ping our free lease 172.21.195.14 and get a reply, so we mark this lease abandoned and we do not send a DHCPOFFER (so far so good). 2016-02-26T15:00:01.568022-06:00 dhcp-dev1 dhcpd[39761]: Reclaiming abandoned lease 172.21.195.14. 2016-02-26T15:00:01.568039-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T15:00:01.568041-06:00 dhcp-dev1 dhcpd[39761]: DHCPOFFER on 172.21.195.14 to 08:00:27:19:62:e0 (p6test3) via 172.21.195.1 At 15:00:01, we attempt to reclaim abandoned lease 172.21.195.14 in order to answer the client's second DHCPDISCOVER, but we don't send another ping; instead we respond straightaway with a DHCPOFFER for the (still in fact unusable) IP. ** This is a bug AFAICT. We should ping-check the IP, find that it's still in use, and continue to not send a DHCPOFFER. However, apparently on some level we do remain aware that there's a problem with this lease, because when the client DHCPREQUESTs it (in response to our DHCPOFFER), we respond with DHCPNAK: 2016-02-26T15:00:01.574843-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-02-26T15:00:01.574858-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.14 to 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T15:00:04.167068-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-02-26T15:00:04.167089-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.14 to 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T15:00:10.628289-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-02-26T15:00:10.628313-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.14 to 08:00:27:19:62:e0 via 172.21.195.1 Meanwhile, the secondary peer (which ignored the first DHCPDISCOVER due to load balancing) tries to answer the second DHCPDISCOVER using the other [backup state] lease... 2016-02-26T14:59:58.528830-06:00 dhcp-dev2 dhcpd[30781]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1: load balance to peer DHCP-W 2016-02-26T15:00:00.540043-06:00 dhcp-dev2 dhcpd[30781]: balancing pool 2843050 172.21.195.0/28 total 2 free 0 backup 1 lts 0 max-own (+/-)0 2016-02-26T15:00:00.540068-06:00 dhcp-dev2 dhcpd[30781]: balanced pool 2843050 172.21.195.0/28 total 2 free 0 backup 1 lts 0 max-misbal 0 2016-02-26T15:00:01.567524-06:00 dhcp-dev2 dhcpd[30781]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T15:00:01.567508-06:00 dhcp-dev2 dhcpd[30781]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1: cancel load balance to peer DHCP-W - no free leases 2016-02-26T15:00:01.567919-06:00 dhcp-dev2 dhcpd[30781]: Abandoning IP address 172.21.195.13: pinged before offer 2016-02-26T15:00:01.567912-06:00 dhcp-dev2 dhcpd[30781]: ICMP Echo reply while lease 172.21.195.13 valid. but gets a ping reply and (correctly) marks this lease abandoned. At this point, our client backs off for about 13 seconds. When the client later tries another DHCPDISCOVER, the primary now attempts to reclaim 172.21.195.13 (previously marked abandoned by the secondary): 2016-02-26T15:00:23.948174-06:00 dhcp-dev1 dhcpd[39761]: Reclaiming abandoned lease 172.21.195.13. 2016-02-26T15:00:23.948192-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T15:00:23.948637-06:00 dhcp-dev1 dhcpd[39761]: Abandoning IP address 172.21.195.13: pinged before offer 2016-02-26T15:00:23.948630-06:00 dhcp-dev1 dhcpd[39761]: ICMP Echo reply while lease 172.21.195.13 valid. This time the primary behaves as it should; after reclaiming 172.21.195.13, we ping-check it first and (upon receiving a reply) immediately re-abandon the lease without ever sending a DHCPOFFER. A few seconds later, however, we get a fourth DHCPDISCOVER, and the primary now repeats its wrong behavior from the second DHCPDISCOVER: it reclaims the lease it had just abandoned, immediately DHCPOFFERs it without ping-checking first, but then DHCPNAKs the client's attempt to actually DHCPREQUEST it. 2016-02-26T15:00:27.309284-06:00 dhcp-dev1 dhcpd[39761]: Reclaiming abandoned lease 172.21.195.13. 2016-02-26T15:00:27.309301-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T15:00:27.309303-06:00 dhcp-dev1 dhcpd[39761]: DHCPOFFER on 172.21.195.13 to 08:00:27:19:62:e0 (p6test3) via 172.21.195.1 2016-02-26T15:00:27.376495-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.13 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.13 unavailable. 2016-02-26T15:00:27.376517-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.13 to 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T15:00:30.612954-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.13 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.13 unavailable. 2016-02-26T15:00:30.612976-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.13 to 08:00:27:19:62:e0 via 172.21.195.1 2016-02-26T15:00:36.302706-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.13 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.13 unavailable. 2016-02-26T15:00:36.302742-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.13 to 08:00:27:19:62:e0 via 172.21.195.1 Finally, after another brief interval, the client sends a fifth DHCPDISCOVER. This time we don't attempt to reclaim an abandoned lease at all: 2016-02-26T15:00:48.743496-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1: network 172.21.195.0/28: no free leases The final version of dhcpd.leases from the primary, also attached, shows both leases being abandoned for a time, but then ending up in free and backup states. ** I can't say with confidence that this is a bug, but it's quite confusing to me; AFAICT since the squatters are still there and responding to pings, the leases should end up as abandoned. (note: some additional trials while watching `tail -f dhcpd.leases` give the subjective impression that dhcpd is freeing the leases spontaneously after a short time has passed, i.e. *not* as part of a normal reclaim attempt in response to a received DHCPDISCOVER, which makes me even more confused.) Also attached is the traffic capture from the primary, filtered with ip.addr==128.174.204.220 && (icmp || udp.port==67 || tcp.port==647), which clearly shows pings being sent after the first and third DHCPDISCOVERs but not after the other ones. As a variation on the theme, I realized *after* writing most of this up (so I didn't want to start over) that the strange behavior is not actually unique to failover operation. Stopping dhcp-dev2 entirely, commenting out all the failover-related bits of dhcp-dev1's dhcpd.conf, and restarting dhcp-dev1 again with an empty leases file, I was able to observe the same general pattern of previously abandoned leases being inappropriately DHCPOFFERed without a fresh ping-check, and spontaneously being freed. Best/worst of all, it appears that on one occasion the spontaneous-free occurred *in between* the inappropriate DHCPOFFER and the subsequent DHCPREQUEST, with the result that instead of a NAK the server ACKed the lease to the client (thereby precipitating a real live IP conflict!). My travelogue for this test run is the attached nofailover.txt. Second variation: single DHCP server, no failover, with only one "squatter" VM (the other IP is legitimately available). First DISCOVER marks the first IP abandoned due to ping-check, second DISCOVER successfully and legitimately acquires a lease on the second IP, no reclaim attempts are necessary because no other DISCOVERs arrive -- but the first IP is nonetheless mysteriously updated to free in the dhcpd.leases file about 2 minutes later. Thanks, David P.S. Prior to the nofailover run, I had also removed the *-lease-time 600 statements within the subnet, which is why the eventual active lease was issued for a full day duration instead of 10 minutes. This is not at all important, except to disarm a potential red herring. P.P.S. With the possible exception of the apparent race condition on the nofailover run, I've found all of this very easy to reproduce in my environment; hopefully you will as well, but let me know if there's anything further I can do to help. -- David Zych Lead Network Service Engineer University of Illinois Technology Services

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.

Message body not shown because it is not plain text.

Message body is not shown because sender requested not to inline it.

Subject: Re: [ISC-Bugs #41815] issues reclaiming abandoned leases
Date: Mon, 29 Feb 2016 16:37:26 -0600
To: dhcp-bugs@isc.org
From: "David Zych" <dmrz@illinois.edu>
Two additional notes on these issues: 1. I recompiled dhcp-4.3.3-P1 without the experimental options I was using before (now specifying only --prefix, --sysconfdir, and --localstatedir) and was still able to reproduce the bad behaviors. My simplest test case is now a single server running default dhcp-4.3.3-P1 with the following dhcpd.conf: === subnet 128.174.204.0 netmask 255.255.254.0 { not authoritative; } lease-file-name "/services/ts-dhcp/var/lib/dhcpd/dhcpd.leases"; log-facility local7; ping-check true; ping-timeout 1; subnet 172.21.195.0 netmask 255.255.255.240 { option routers 172.21.195.1; option broadcast-address 172.21.195.15; pool { deny dynamic bootp clients; range 172.21.195.13 172.21.195.14; } } === plus two "squatter" VMs occupying both IP addresses, and a single Peppermint 6 client VM attempting to obtain a lease via DHCP (booting from a snapshot with no prior state on this network). Starting from an empty dhcpd.leases, and letting the client run for a few minutes while watching `tail -f dhcpd.leases /var/log/syslog/local7`, I can easily observe all of the following: * inappropriate DHCPOFFER (without ping-checking the address again) when reclaiming an abandoned IP * eventually (after enough tries) a DHCPACK on one of these IPs, leading to an IP conflict with the squatter * abandoned leases mysteriously being set back to free in dhcpd.leases after roughly two minutes ** Having slept on it, this third problem (mysterious freeing of abandoned leases) is actually the one that makes me most nervous, because it can cause trouble even when the pool is not exhausted. 2. Abandoned leases are also mysteriously cleared/discarded upon server startup. Using same dhcpd.conf as above, start from empty dhcpd.leases file, but this time stop as soon as you see the first DHCPDISCOVER appear in the logs. ts-dhcp@dhcp-dev1:~/var/lib/dhcpd$ tail -f dhcpd.leases /var/log/syslog/local7 ==> dhcpd.leases <== # The format of this file is documented in the dhcpd.leases(5) manual page. # This lease file was written by isc-dhcp-4.3.3-P1 server-duid "\000\001\000\001\036g~7\000PV\214Pg"; ==> /var/log/syslog/local7 <== 2016-02-29T16:09:41.343441-06:00 dhcp-dev1 dhcpd: DHCPDISCOVER from 00:10:18:72:58:e9 via eth0: network 128.174.204.0/23: no free leases 2016-02-29T16:09:59.945197-06:00 dhcp-dev1 dhcpd: Internet Systems Consortium DHCP Server 4.3.3-P1 2016-02-29T16:09:59.945221-06:00 dhcp-dev1 dhcpd: Copyright 2004-2016 Internet Systems Consortium. 2016-02-29T16:09:59.945239-06:00 dhcp-dev1 dhcpd: All rights reserved. 2016-02-29T16:09:59.945263-06:00 dhcp-dev1 dhcpd: For info, please visit https://www.isc.org/software/dhcp/ 2016-02-29T16:09:59.945512-06:00 dhcp-dev1 dhcpd: Wrote 0 leases to leases file. 2016-02-29T16:09:59.947758-06:00 dhcp-dev1 dhcpd: Listening on LPF/eth0/00:50:56:8c:50:67/128.174.204.0/23 2016-02-29T16:09:59.947875-06:00 dhcp-dev1 dhcpd: Sending on LPF/eth0/00:50:56:8c:50:67/128.174.204.0/23 2016-02-29T16:09:59.947922-06:00 dhcp-dev1 dhcpd: Sending on Socket/fallback/fallback-net 2016-02-29T16:09:59.949151-06:00 dhcp-dev1 dhcpd: Server starting service. 2016-02-29T16:12:45.070878-06:00 dhcp-dev1 dhcpd: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1 ==> dhcpd.leases <== lease 172.21.195.13 { starts 1 2016/02/29 22:12:45; ends 1 2016/02/29 22:12:45; cltt 1 2016/02/29 22:12:45; binding state abandoned; next binding state free; rewind binding state free; client-hostname "p6test3"; } ==> /var/log/syslog/local7 <== 2016-02-29T16:12:45.071433-06:00 dhcp-dev1 dhcpd: Abandoning IP address 172.21.195.13: pinged before offer 2016-02-29T16:12:45.071417-06:00 dhcp-dev1 dhcpd: ICMP Echo reply while lease 172.21.195.13 valid. At this point we halt the client VM so there will be no further DHCPDISCOVERs. Now stop dhcpd (with kill -TERM) and observe that (as expected) it leaves behind a dhcpd.leases containing exactly one abandoned lease: lease 172.21.195.13 { starts 1 2016/02/29 22:12:45; ends 1 2016/02/29 22:12:45; cltt 1 2016/02/29 22:12:45; binding state abandoned; next binding state free; rewind binding state free; client-hostname "p6test3"; } Finally, start dhcpd again. ts-dhcp@dhcp-dev1:~/var/lib/dhcpd$ ~/bin/start_dhcp Internet Systems Consortium DHCP Server 4.3.3-P1 Copyright 2004-2016 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Config file: /services/ts-dhcp/etc/dhcpd.conf Database file: /services/ts-dhcp/var/lib/dhcpd/dhcpd.leases PID file: /services/ts-dhcp/var/run/dhcpd.pid Wrote 0 leases to leases file. Listening on LPF/eth0/00:50:56:8c:50:67/128.174.204.0/23 Sending on LPF/eth0/00:50:56:8c:50:67/128.174.204.0/23 Sending on Socket/fallback/fallback-net And indeed "Wrote 0 leases" is what we get; the new dhcpd.leases no longer contains any leases. Furthermore, the no-longer-active file dhcpd.leases~ now contains a second entry freeing the formerly-abandoned lease: lease 172.21.195.13 { starts 1 2016/02/29 22:12:45; ends 1 2016/02/29 22:12:45; cltt 1 2016/02/29 22:12:45; binding state abandoned; next binding state free; rewind binding state free; client-hostname "p6test3"; } lease 172.21.195.13 { starts 1 2016/02/29 22:12:45; ends 1 2016/02/29 22:12:45; tstp 1 2016/02/29 22:12:45; cltt 1 2016/02/29 22:12:45; binding state free; client-hostname "p6test3"; } Thanks, David -- David Zych Lead Network Service Engineer University of Illinois Technology Services
Hi David Thank you for your comprehensive report. I've reviewed it and been able to reproduce several of the issues. Due to the short time before the code freeze for the next release we don't plan to work on this for the upcoming code release (4.3.4). We do hope to address it in the release after that (4.3.5) From your mail: Starting from an empty dhcpd.leases, and letting the client run for a few minutes while watching `tail -f dhcpd.leases /var/log/syslog/local7`, I can easily observe all of the following: * inappropriate DHCPOFFER (without ping-checking the address again) when reclaiming an abandoned IP This is due to a check to see if the lease was recently used. I will need to dig into the history to see if there is an explanation as to why - my guess would be an attempt to limit the number of pings sent out due to a faulty client. While this will happen a lot in your and my tests it isn't as big an issue in a real network. Most likely the server won't be trying to constantly reuse an abandoned address and if the lease isn't used for about a minute the ping checks should be happening. * eventually (after enough tries) a DHCPACK on one of these IPs, leading to an IP conflict with the squatter I didn't see this, but didn't try to cause it, I have no reason to doubt it would be an outcome but see above - in a more normal network it is less likely. * abandoned leases mysteriously being set back to free in dhcpd.leases after roughly two minutes From my testing the abandoned leases will be set back to free more quickly than is good. The two minutes you are seeing is an artifact of the default time the server uses for a temporary timer when handing out an offer. ** Having slept on it, this third problem (mysterious freeing of abandoned leases) is actually the one that makes me most nervous, because it can cause trouble even when the pool is not exhausted. 2. Abandoned leases are also mysteriously cleared/discarded upon server startup. This is the same issue as the leases being freed. The server startup triggers the same cleanup as occurs based on the timer.
Subject: Re: [ISC-Bugs #41815] issues reclaiming abandoned leases
Date: Tue, 1 Mar 2016 16:23:31 -0600
To: dhcp-bugs@isc.org
From: "David Zych" <dmrz@illinois.edu>
Hi Shawn, Thanks very much for the quick and informative response! I look forward to seeing fixes in 4.3.5. David P.S. Limiting the frequency of pings to the same target makes perfect sense to me; perhaps an easy solution would be to perform the same recently-used check before deciding to reclaim a particular abandoned lease in the first place (i.e. if the lease was marked abandoned so recently that we wouldn't be willing to ping it again yet, then we don't attempt to reclaim it at all).
On Tue Mar 01 22:23:36 2016, dmrz@illinois.edu wrote: > Hi Shawn, > > Thanks very much for the quick and informative response! I look forward > to seeing fixes in 4.3.5. > > David > > P.S. Limiting the frequency of pings to the same target makes perfect > sense to me; perhaps an easy solution would be to perform the same > recently-used check before deciding to reclaim a particular abandoned > lease in the first place (i.e. if the lease was marked abandoned so > recently that we wouldn't be willing to ping it again yet, then we don't > attempt to reclaim it at all). > We can consider that as an option. Part of the problem is the nature of the testing we both did. In the test there is only a very small number of leases and the are all abandoned. In this case the serve has two bad options - don't hand out a lease or try to reuse an abandoned lease - which is worse is somewhat dependent on the rest of your set up. In a more normal situation with a larger number of leases this occurrence is less likely.
Subject: Re: [ISC-Bugs #41815] issues reclaiming abandoned leases
Date: Mon, 14 Mar 2016 18:21:46 -0500
To: dhcp-bugs@isc.org
From: "David Zych" <dmrz@illinois.edu>
On 03/14/2016 12:46 PM, Shawn Routhier via RT wrote: > Part of the problem is the nature > of the testing we both did. In the test there is only a very small number > of leases and the are all abandoned. I'm not convinced that the nature of the testing is a problem; why does the small number of leases make a difference? Imagine a pool of 1,000 addresses, of which 996 are currently active leases and the remaining 4 are occupied by squatters. Now a new client shows up and tries to obtain a lease. AFAICT the bug in question will cause at least one of those abandoned leases to be inappropriately DHCPOFFERed without a ping-check after several retries (I think probably on the new client's 5th DHCPDISCOVER attempt if using a single server, or sooner if using failover). Note: obviously the pool has to be full (counting active + abandoned) in order for this bug to manifest, but that follows trivially from the fact that it's a bug in the behavior of reclaiming abandoned leases, and reclaiming abandoned leases should* only occur when the pool is full. [* the other bug in this same ticket (abandoned leases inappropriately freed after 2 minutes) notwithstanding] > In this case the serve has two bad > options - don't hand out a lease or try to reuse an abandoned lease - which > is worse is somewhat dependent on the rest of your set up. In this case, the lease was marked abandoned less than a minute (or however long it is) ago, so we know the squatter was present at that time; the question is, is the squatter still present now? If so, then no correct behavior by the DHCP server can possibly result in handing out the lease. So the difference is, would we rather risk bombarding the squatter with many pings (if we receive many DHCPDISCOVERs), or would we rather risk maintaining an abandoned lease for up to 1 extra minute (?) longer than necessary after the squatter has in reality gone away? In light of the fact that ISC has in the past deemed it necessary to write ping-limiting code in the first place, I would tend to lean toward that option, but either option would be much better than the present behavior of "try to reuse an abandoned lease without ping-checking it first". Thanks, David
On Wed Jul 06 13:07:59 2016, tmark wrote: > > > > I believe we concluded that there would be problems with the ping > > code > > if the user set the time value to be < 60 seconds but we were okay > > with that > > as an unlikely set up. We should mention it in the documentation. > > Actually, since I changed the ping check logic to test to: > > (cltt > 60) || (state == FTS_ABANDONED) > > it really doesn't matter what value they use. > Doesn't the lease get moved back to the FREE state after the timeout period? So if they use something like 30 then 30 seconds after the last touch the lease goes from ABANDONED to FREE and if we still have the previous CLTT it would be less than 60. I took a quick look and didn't see CLTT being cleared but I may have missed it.
Subject: Re: [ISC-Bugs #41815] issues reclaiming abandoned leases
Date: Thu, 7 Jul 2016 11:05:02 -0500
To: dhcp-review@isc.org
From: "David Zych" <dmrz@illinois.edu>
Hi Shawn, Just checking: was this intended for me? (It looks like the middle of an internal discussion) Thanks, David On 07/06/2016 11:06 PM, Shawn Routhier via RT wrote: > On Wed Jul 06 13:07:59 2016, tmark wrote: > >>> >>> I believe we concluded that there would be problems with the ping >>> code >>> if the user set the time value to be < 60 seconds but we were okay >>> with that >>> as an unlikely set up. We should mention it in the documentation. >> >> Actually, since I changed the ping check logic to test to: >> >> (cltt > 60) || (state == FTS_ABANDONED) >> >> it really doesn't matter what value they use. >> > > Doesn't the lease get moved back to the FREE state after the timeout period? > So if they use something like 30 then 30 seconds after the last touch the lease > goes from ABANDONED to FREE and if we still have the previous CLTT it would > be less than 60. I took a quick look and didn't see CLTT being cleared but I > may have missed it. > > > -- David Zych Lead Network Service Engineer University of Illinois Technology Services
On Thu Jul 07 16:05:12 2016, dmrz@illinois.edu wrote: > Hi Shawn, > > Just checking: was this intended for me? (It looks like the middle of > an internal discussion) No it wasn't intended for you. Yes it was an internal discussion though it's not especially private. We are in the process of addressing the issue and that was part of there review process. > > Thanks, > David
Hello David: You'll be pleased to hear that we've addressed this issue in our upcoming releases: 4.3.5 and 4.1-ESV-R14 due out this fall. We added a global parameter, "abandon-lease-time", which is used as the length of time an abandoned lease remains in the abandoned state. The default is 86400 seconds. The server will not attempt to reclaim an abandoned lease before then unless there are no free addresses. In which case, the server will conduct a ping check (provided they are enabled) prior to offering an abandoned lease. Note that the expiry period is preserved across system restarts so abandoned leases. Hopefully this addresses your issues. Attached to this email is a patch containing the changes if you wish to test them. We would greatly appreciate your feedback. Lastly, it is our custom to thank our contributors by citing them in the release notes. If you would like be recognized in this fashion please reply with how you would like to be identified. Most often, it is by name and/or organization. Thank you for bringing this matter to our attention and for the time and effort you have spent on it. We welcome any future submissions you care to make. Sincerely, Thomas Markwalder ISC Software Engineering
Subject: rt41815.patch

Message body not shown because it is not plain text.

Subject: Re: [ISC-Bugs #41815] issues reclaiming abandoned leases
Date: Mon, 11 Jul 2016 16:06:23 -0500
To: dhcp-review@isc.org
From: "David Zych" <dmrz@illinois.edu>
Hi Thomas, I like your parameter solution very much, and I confirm that the patch successfully prevents the problem scenarios I originally reported -- thanks! However, there's a variant scenario whose behavior still seems wrong: Single DHCP server with a clean-slate lease file and the following config: === subnet 128.174.204.0 netmask 255.255.254.0 { not authoritative; } lease-file-name "/services/ts-dhcp/var/lib/dhcpd/dhcpd.leases"; log-facility local7; ping-check true; ping-timeout 1; subnet 172.21.195.0 netmask 255.255.255.240 { option routers 172.21.195.1; option broadcast-address 172.21.195.15; pool { deny dynamic bootp clients; range 172.21.195.13 172.21.195.14; } } === plus two "squatter" VMs occupying both IP addresses, and a single Peppermint 6 client VM attempting to obtain a lease via DHCP (booting from a snapshot with no prior state on this network)... Both IPs are correctly abandoned (and remain correctly abandoned) due to the ICMP replies, and no DHCPOFFER is made. Additionally, they remain correctly abandoned across a dhcpd restart. However, if we now remove the 172.21.195.14 squatter from the test network (which in principle should make it possible to *successfully* reclaim that lease and give it to the DHCP client), we instead observe: 2016-07-11T15:01:11.212293-05:00 dhcp-dev1 dhcpd.20160707[27959]: Reclaiming abandoned lease 172.21.195.13. 2016-07-11T15:01:11.212308-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPDISCOVER from 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:01:11.212713-05:00 dhcp-dev1 dhcpd.20160707[27959]: ICMP Echo reply while lease 172.21.195.13 valid. 2016-07-11T15:01:11.212719-05:00 dhcp-dev1 dhcpd.20160707[27959]: Abandoning IP address 172.21.195.13: pinged before offer 2016-07-11T15:01:24.774305-05:00 dhcp-dev1 dhcpd.20160707[27959]: Reclaiming abandoned lease 172.21.195.14. 2016-07-11T15:01:24.774330-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPDISCOVER from 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:01:25.775425-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPOFFER on 172.21.195.14 to 08:00:27:64:2f:26 (p6test3) via 172.21.195.1 2016-07-11T15:01:25.779128-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:01:25.779134-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:01:28.976453-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:01:28.976479-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:01:35.563721-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:01:35.563735-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:01:42.084304-05:00 dhcp-dev1 dhcpd.20160707[27959]: Reclaiming abandoned lease 172.21.195.14. 2016-07-11T15:01:42.084323-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPDISCOVER from 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:01:43.085405-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPOFFER on 172.21.195.14 to 08:00:27:64:2f:26 (p6test3) via 172.21.195.1 2016-07-11T15:01:43.088356-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:01:43.088368-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:01:46.849568-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:01:46.849593-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:01:50.422391-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:01:50.422414-05:00 dhcp-dev1 dhcpd.20160707[27959]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 ... Somehow dhcpd is of two different minds about the status of .14 -- it has correctly determined that it's now okay to OFFER the lease, but refuses to ACK the client's request for it. The other odd thing is that, if I repeat this experiment from the beginning with "abandon-lease-time 180;" added to the config, the abandoned lease 172.21.195.14 is not cleared three minutes after the last time a ping-check resulted in an ICMP reply (which is what I expected). Instead, this lease remains abandoned (and continues to behave in the OFFER-but-NAK mode described above) for as long as the DHCP client keeps trying to get it, and is cleared only after the client gives up and 2 more minutes elapse. * last failed ping-check while squatter was still active: 2016-07-11T15:13:27.410532-05:00 dhcp-dev1 dhcpd.20160707[28315]: ICMP Echo reply while lease 172.21.195.14 valid. 2016-07-11T15:13:27.410539-05:00 dhcp-dev1 dhcpd.20160707[28315]: Abandoning IP address 172.21.195.14: pinged before offer * resulting lease entry: lease 172.21.195.14 { starts 1 2016/07/11 20:13:27; ends 1 2016/07/11 20:16:27; cltt 1 2016/07/11 20:13:27; binding state abandoned; next binding state free; rewind binding state free; client-hostname "p6test3"; } * last OFFER-but-NAK occurrence: 2016-07-11T15:18:34.334342-05:00 dhcp-dev1 dhcpd.20160707[28315]: Reclaiming abandoned lease 172.21.195.14. 2016-07-11T15:18:34.334361-05:00 dhcp-dev1 dhcpd.20160707[28315]: DHCPDISCOVER from 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:18:35.335426-05:00 dhcp-dev1 dhcpd.20160707[28315]: DHCPOFFER on 172.21.195.14 to 08:00:27:64:2f:26 (p6test3) via 172.21.195.1 2016-07-11T15:18:35.338290-05:00 dhcp-dev1 dhcpd.20160707[28315]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:18:35.338296-05:00 dhcp-dev1 dhcpd.20160707[28315]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:18:39.054633-05:00 dhcp-dev1 dhcpd.20160707[28315]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:18:39.054654-05:00 dhcp-dev1 dhcpd.20160707[28315]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 2016-07-11T15:18:42.214211-05:00 dhcp-dev1 dhcpd.20160707[28315]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:64:2f:26 via 172.21.195.1: lease 172.21.195.14 unavailable. 2016-07-11T15:18:42.214230-05:00 dhcp-dev1 dhcpd.20160707[28315]: DHCPNAK on 172.21.195.14 to 08:00:27:64:2f:26 via 172.21.195.1 * resulting lease entry (written at UTC 20:20:34, not 20:18:34): lease 172.21.195.14 { starts 1 2016/07/11 20:18:34; ends 1 2016/07/11 20:20:34; cltt 1 2016/07/11 20:18:34; binding state free; hardware ethernet 08:00:27:64:2f:26; client-hostname "p6test3"; } After this, if the client returns again, it has no problem obtaining the now-free lease. So this remaining scenario should AFAICT only impact nets which become completely full. Thanks, David P.S. My test dhcpd (dhcp-4.3.4 with rt41815.patch) was compiled with --prefix=... --sysconfdir=... --localstatedir=... \ --enable-log-pid --enable-delayed-ack --enable-binary-leases --enable-use-sockets, just in case any of that matters (though I doubt it should). > Lastly, it is our custom to thank our contributors by citing them in the release notes. If you would like be recognized in this fashion please reply with how you would like to be identified. Most often, it is by name and/or organization. David Zych at University of Illinois Thanks! David -- David Zych Lead Network Service Engineer University of Illinois Technology Services
On Mon Jul 11 21:07:33 2016, dmrz@illinois.edu wrote: > Hi Thomas, > > I like your parameter solution very much, and I confirm that the patch > successfully prevents the problem scenarios I originally reported -- > thanks! > > However, there's a variant scenario whose behavior still seems wrong: Hi David, Thank you for testing the patch. We shall look into the scenario you described soon but as we may take several days (or more) to get to it I wanted to acknowledge your message sooner. regards, shawn
Hello David: Attached is a patch which should correct the issue. We would like to recognize your contribution in our release notes. Please let us know if the patch works adn how you would like to be identified in the release notes. Sincerely, Thomas Markwalder ISC Software Engineering
Subject: 41815a.patch

Message body not shown because it is not plain text.

Subject: Re: [ISC-Bugs #41815] issues reclaiming abandoned leases
Date: Mon, 1 Aug 2016 16:30:11 -0500
To: dhcp-review@isc.org
From: "David Zych" <dmrz@illinois.edu>
On 07/28/2016 06:12 AM, Thomas Markwalder via RT wrote: > Please let us know if the patch works Looks great; my test cases behave perfectly now. > and how you would like to be identified in the release notes. David Zych at University of Illinois Thanks again for the great work on this! David P.S. Do you have a rough idea of when 4.3.5 is likely to be released? (I'm weighing the relative merits of waiting for the official release vs going ahead and using the patches now)