CC: "Tony Finch" <dot@dotat.at>
Subject: Running out of ephemeral TCP ports
Date: Thu, 10 Mar 2016 12:54:59 +0000
To: bind9-bugs@isc.org
From: "Tony Finch" <dot@dotat.at>

I have a basic health check script on my recursive servers:

    #!/bin/sh
    digarg="+time=1 +tries=1 +short cam.ac.uk in loc"
    digout='52 12 19.000 N 0 7 5.000 E 18.00m 10000m 100m 100m'
    for host in 127.0.0.1 ::1
    do
        for proto in +ignore +tcp
        do
            case $(dig @$host $proto $digarg) in
            ($digout) : ok ;;
            (*) exit 1 ;;
            esac
        done
    done
    exit 0

I am running a load test using adns-masterfile as described at
http://fanf.livejournal.com/141030.html
This load test involves one client using one UDP socket and one TCP
socket. The client is running on a different machine connecting over a
LAN.
The server is running Linux 3.13.0-77-generic #121-Ubuntu
BIND 9.10.3-P4+0-large <id:03b54c5> built by make with '--enable-threads' '--enable-getifaddrs' '--with-ecdsa=yes' '--with-geoip=no' '--with-gost=no' '--with-gssapi=no' '--with-idn=no' '--with-iconv=no' '--with-libjson=yes' '--with-libxml2=yes' '--with-openssl=yes' '--with-pkcs11=no' '--with-python=yes' '--with-readline=yes' '--with-tuning=large' '--prefix=/home/named/BIND/9.10.3-P4+0' '--mandir=/home/named/BIND/9.10.3-P4+0/man' '--localstatedir=/home/named/var' '--sysconfdir=/home/named/etc'
The server accumulates a lot of completed TCP connections in TIME_WAIT.
Once `netstat -an | grep -c TIME_WAIT` goes over 28,000, the health
check script starts to fail because `dig` cannot open a TCP connection.
It fails with:

    dig: isc_socket_bind: address in use

I think this means that `dig` needs to use ISC_SOCKET_REUSEADDRESS and I
suspect that `named` might need some attention in this area as well.
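
One way to check whether the 28,000 figure lines up with ephemeral port
exhaustion is to compare the TIME_WAIT count with the size of the
kernel's local port range. The following is only a minimal sketch and is
not part of the setup described above: the helper names
`port_range_size` and `count_time_wait` are hypothetical, and it assumes
a Linux host where /proc/sys/net/ipv4/ip_local_port_range is readable
and `netstat` is installed.

```shell
#!/bin/sh
# Sketch: compare TIME_WAIT sockets with the size of the ephemeral port range.
# Helper names are illustrative, not taken from BIND or any existing tool.

# Print the number of ports in the inclusive range $1..$2.
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}

# Count lines on stdin that mention TIME_WAIT (e.g. output of `netstat -an`).
count_time_wait() {
    grep -c TIME_WAIT
}

# Assumption: Linux exposes the range as "low<TAB>high" in this file.
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ] && command -v netstat >/dev/null 2>&1
then
    read low high < "$range_file"
    echo "ephemeral ports:   $(port_range_size "$low" "$high")"
    echo "TIME_WAIT sockets: $(netstat -an | count_time_wait)"
fi
```

For example, `port_range_size 32768 61000` prints 28233. That range was a
common default on Linux kernels of this era, but the server's actual
setting is not given here, so treat it as an assumption. If it does
apply, the failures begin roughly when the TIME_WAIT count reaches the
number of available ephemeral ports.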
Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Viking, North Utsire: Southerly 5 to 7. Moderate or rough. Mainly fair. Good,
occasionally poor.