Subject: | issues reclaiming abandoned leases |
Date: | Fri, 26 Feb 2016 18:57:09 -0600 |
To: | dhcp-bugs@isc.org |
From: | "David Zych" <dmrz@illinois.edu> |
In the course of some proof-of-concept testing, I've encountered some quite odd behavior in dhcp-4.3.3-P1 with regard to reclaiming abandoned leases.
I have two DHCP servers running dhcp-4.3.3-P1 compiled with --enable-log-pid --enable-delayed-ack --enable-binary-leases --enable-use-sockets
For purposes of this particular test, they are running a very simple failover configuration for a single pool of only two addresses (172.21.195.13 and 172.21.195.14), with ping-check true and ping-timeout 1 (full configs are attached). The router for the client subnet has ip helpers pointing to both DHCP servers.
To test abandoned lease behavior, I have statically configured "squatter" VMs to occupy both IPs in the range (and to continually ping an off-network target so they won't fall out of the router's ARP table).
First, I stop both servers, truncate their lease files, start them again, and wait for them to attain normal state:
...
2016-02-26T14:57:23.259278-06:00 dhcp-dev1 dhcpd[39761]: failover peer DHCP-W: I move from recover-done to normal
2016-02-26T14:57:23.259284-06:00 dhcp-dev1 dhcpd[39761]: balancing pool d7d170 172.21.195.0/28 total 2 free 2 backup 0 lts 1 max-own (+/-)0
2016-02-26T14:57:23.259286-06:00 dhcp-dev1 dhcpd[39761]: balanced pool d7d170 172.21.195.0/28 total 2 free 1 backup 1 lts 0 max-misbal 0
2016-02-26T14:57:23.262960-06:00 dhcp-dev1 dhcpd[39761]: Sending updates to DHCP-W.
2016-02-26T14:57:23.267516-06:00 dhcp-dev1 dhcpd[39761]: failover peer DHCP-W: peer moves from recover-done to normal
2016-02-26T14:57:23.267529-06:00 dhcp-dev1 dhcpd[39761]: failover peer DHCP-W: Both servers normal
Since the pool contains two leases, they balance with 1 free and 1 backup as expected.
Now, while running a packet capture on the primary, I start a client VM (in my case it happens to be Peppermint 6) which will attempt to obtain a lease using DHCP.
2016-02-26T14:59:58.529451-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T14:59:58.529992-06:00 dhcp-dev1 dhcpd[39761]: Abandoning IP address 172.21.195.14: pinged before offer
2016-02-26T14:59:58.529984-06:00 dhcp-dev1 dhcpd[39761]: ICMP Echo reply while lease 172.21.195.14 valid.
2016-02-26T14:59:59.537926-06:00 dhcp-dev1 dhcpd[39761]: balancing pool d7d170 172.21.195.0/28 total 2 free 0 backup 1 lts 0 max-own (+/-)0
2016-02-26T14:59:59.537943-06:00 dhcp-dev1 dhcpd[39761]: balanced pool d7d170 172.21.195.0/28 total 2 free 0 backup 1 lts 0 max-misbal 0
At 14:59:58, on the primary, we receive the client's first DHCPDISCOVER, but we ping our free lease 172.21.195.14 and get a reply, so we mark this lease abandoned and we do not send a DHCPOFFER (so far so good).
2016-02-26T15:00:01.568022-06:00 dhcp-dev1 dhcpd[39761]: Reclaiming abandoned lease 172.21.195.14.
2016-02-26T15:00:01.568039-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T15:00:01.568041-06:00 dhcp-dev1 dhcpd[39761]: DHCPOFFER on 172.21.195.14 to 08:00:27:19:62:e0 (p6test3) via 172.21.195.1
At 15:00:01, we attempt to reclaim abandoned lease 172.21.195.14 in order to answer the client's second DHCPDISCOVER, but we don't send another ping; instead we respond straightaway with a DHCPOFFER for the (still in fact unusable) IP.
** This is a bug AFAICT. We should ping-check the IP, find that it's still in use, and continue to not send a DHCPOFFER.
However, apparently on some level we do remain aware that there's a problem with this lease, because when the client DHCPREQUESTs it (in response to our DHCPOFFER), we respond with DHCPNAK:
2016-02-26T15:00:01.574843-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.14 unavailable.
2016-02-26T15:00:01.574858-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.14 to 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T15:00:04.167068-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.14 unavailable.
2016-02-26T15:00:04.167089-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.14 to 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T15:00:10.628289-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.14 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.14 unavailable.
2016-02-26T15:00:10.628313-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.14 to 08:00:27:19:62:e0 via 172.21.195.1
Meanwhile, the secondary peer (which ignored the first DHCPDISCOVER due to load balancing) tries to answer the second DHCPDISCOVER using the other [backup state] lease...
2016-02-26T14:59:58.528830-06:00 dhcp-dev2 dhcpd[30781]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1: load balance to peer DHCP-W
2016-02-26T15:00:00.540043-06:00 dhcp-dev2 dhcpd[30781]: balancing pool 2843050 172.21.195.0/28 total 2 free 0 backup 1 lts 0 max-own (+/-)0
2016-02-26T15:00:00.540068-06:00 dhcp-dev2 dhcpd[30781]: balanced pool 2843050 172.21.195.0/28 total 2 free 0 backup 1 lts 0 max-misbal 0
2016-02-26T15:00:01.567524-06:00 dhcp-dev2 dhcpd[30781]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T15:00:01.567508-06:00 dhcp-dev2 dhcpd[30781]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1: cancel load balance to peer DHCP-W - no free leases
2016-02-26T15:00:01.567919-06:00 dhcp-dev2 dhcpd[30781]: Abandoning IP address 172.21.195.13: pinged before offer
2016-02-26T15:00:01.567912-06:00 dhcp-dev2 dhcpd[30781]: ICMP Echo reply while lease 172.21.195.13 valid.
but gets a ping reply and (correctly) marks this lease abandoned.
At this point, our client backs off for about 13 seconds. When the client later tries another DHCPDISCOVER, the primary now attempts to reclaim 172.21.195.13 (previously marked abandoned by the secondary):
2016-02-26T15:00:23.948174-06:00 dhcp-dev1 dhcpd[39761]: Reclaiming abandoned lease 172.21.195.13.
2016-02-26T15:00:23.948192-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T15:00:23.948637-06:00 dhcp-dev1 dhcpd[39761]: Abandoning IP address 172.21.195.13: pinged before offer
2016-02-26T15:00:23.948630-06:00 dhcp-dev1 dhcpd[39761]: ICMP Echo reply while lease 172.21.195.13 valid.
This time the primary behaves as it should; after reclaiming 172.21.195.13, we ping-check it first and (upon receiving a reply) immediately re-abandon the lease without ever sending a DHCPOFFER.
A few seconds later, however, we get a fourth DHCPDISCOVER, and the primary now repeats its wrong behavior from the second DHCPDISCOVER: it reclaims the lease it had just abandoned, immediately DHCPOFFERs it without ping-checking first, but then DHCPNAKs the client's attempt to actually DHCPREQUEST it.
2016-02-26T15:00:27.309284-06:00 dhcp-dev1 dhcpd[39761]: Reclaiming abandoned lease 172.21.195.13.
2016-02-26T15:00:27.309301-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T15:00:27.309303-06:00 dhcp-dev1 dhcpd[39761]: DHCPOFFER on 172.21.195.13 to 08:00:27:19:62:e0 (p6test3) via 172.21.195.1
2016-02-26T15:00:27.376495-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.13 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.13 unavailable.
2016-02-26T15:00:27.376517-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.13 to 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T15:00:30.612954-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.13 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.13 unavailable.
2016-02-26T15:00:30.612976-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.13 to 08:00:27:19:62:e0 via 172.21.195.1
2016-02-26T15:00:36.302706-06:00 dhcp-dev1 dhcpd[39761]: DHCPREQUEST for 172.21.195.13 (128.174.204.220) from 08:00:27:19:62:e0 via 172.21.195.1: lease 172.21.195.13 unavailable.
2016-02-26T15:00:36.302742-06:00 dhcp-dev1 dhcpd[39761]: DHCPNAK on 172.21.195.13 to 08:00:27:19:62:e0 via 172.21.195.1
Finally, after another brief interval, the client sends a fifth DHCPDISCOVER. This time we don't attempt to reclaim an abandoned lease at all:
2016-02-26T15:00:48.743496-06:00 dhcp-dev1 dhcpd[39761]: DHCPDISCOVER from 08:00:27:19:62:e0 via 172.21.195.1: network 172.21.195.0/28: no free leases
The final version of dhcpd.leases from the primary, also attached, shows both leases being abandoned for a time, but then ending up in free and backup states.
** I can't say with confidence that this is a bug, but it's quite confusing to me; AFAICT since the squatters are still there and responding to pings, the leases should end up as abandoned.
(note: some additional trials while watching `tail -f dhcpd.leases` give the subjective impression that dhcpd is freeing the leases spontaneously after a short time has passed, i.e. *not* as part of a normal reclaim attempt in response to a received DHCPDISCOVER, which makes me even more confused.)
Also attached is the traffic capture from the primary, filtered with ip.addr==128.174.204.220 && (icmp || udp.port==67 || tcp.port==647), which clearly shows pings being sent after the first and third DHCPDISCOVERs but not after the other ones.
As a variation on the theme, I realized *after* writing most of this up (so I didn't want to start over) that the strange behavior is not actually unique to failover operation. Stopping dhcp-dev2 entirely, commenting out all the failover-related bits of dhcp-dev1's dhcpd.conf, and restarting dhcp-dev1 again with an empty leases file, I was able to observe the same general pattern of previously abandoned leases being inappropriately DHCPOFFERed without a fresh ping-check, and spontaneously being freed. Best/worst of all, it appears that on one occasion the spontaneous-free occurred *in between* the inappropriate DHCPOFFER and the subsequent DHCPREQUEST, with the result that instead of a NAK the server ACKed the lease to the client (thereby precipitating a real live IP conflict!). My travelogue for this test run is the attached nofailover.txt.
Second variation: single DHCP server, no failover, with only one "squatter" VM (the other IP is legitimately available). First DISCOVER marks the first IP abandoned due to ping-check, second DISCOVER successfully and legitimately acquires a lease on the second IP, no reclaim attempts are necessary because no other DISCOVERs arrive -- but the first IP is nonetheless mysteriously updated to free in the dhcpd.leases file about 2 minutes later.
Thanks,
David
P.S. Prior to the nofailover run, I had also removed the *-lease-time 600 statements within the subnet, which is why the eventual active lease was issued for a full day duration instead of 10 minutes. This is not at all important, except to disarm a potential red herring.
P.P.S. With the possible exception of the apparent race condition on the nofailover run, I've found all of this very easy to reproduce in my environment; hopefully you will as well, but let me know if there's anything further I can do to help.
--
David Zych
Lead Network Service Engineer
University of Illinois Technology Services
Message body is not shown because sender requested not to inline it.
Message body is not shown because sender requested not to inline it.
Message body is not shown because sender requested not to inline it.
Message body is not shown because sender requested not to inline it.
Message body not shown because it is not plain text.
Message body is not shown because sender requested not to inline it.