From fanf2@hermes.cam.ac.uk Thu Mar 10 12:55:03 2016
From: "Tony Finch"
To: bind9-bugs@isc.org
Date: Thu, 10 Mar 2016 12:54:59 +0000
Subject: Running out of ephemeral TCP ports

I have a basic health check script on my recursive servers:

    #!/bin/sh
    digarg="+time=1 +tries=1 +short cam.ac.uk in loc"
    digout='52 12 19.000 N 0 7 5.000 E 18.00m 10000m 100m 100m'
    for host in 127.0.0.1 ::1
    do
        for proto in +ignore +tcp
        do
            case $(dig @$host $proto $digarg) in
            ($digout) : ok ;;
            (*) exit 1 ;;
            esac
        done
    done
    exit 0

I am running a load test using adns-masterfile as described at
http://fanf.livejournal.com/141030.html

This load test involves one client using one UDP socket and one TCP
socket. The client is running on a different machine connecting over a
LAN.

The server is running

    Linux 3.13.0-77-generic #121-Ubuntu

    BIND 9.10.3-P4+0-large built by make with
        '--enable-threads' '--enable-getifaddrs' '--with-ecdsa=yes'
        '--with-geoip=no' '--with-gost=no' '--with-gssapi=no'
        '--with-idn=no' '--with-iconv=no' '--with-libjson=yes'
        '--with-libxml2=yes' '--with-openssl=yes' '--with-pkcs11=no'
        '--with-python=yes' '--with-readline=yes' '--with-tuning=large'
        '--prefix=/home/named/BIND/9.10.3-P4+0'
        '--mandir=/home/named/BIND/9.10.3-P4+0/man'
        '--localstatedir=/home/named/var'
        '--sysconfdir=/home/named/etc'

The server accumulates a lot of completed TCP connections in TIME_WAIT.
When `netstat -an | grep -c TIME_WAIT` gets over 28,000 then the health
check script starts to fail, because `dig` cannot open a TCP connection -
it fails with

    dig: isc_socket_bind: address in use

I think this means that `dig` needs to use ISC_SOCKET_REUSEADDRESS and I
suspect that `named` might need some attention in this area as well.

Tony.
-- 
f.anthony.n.finch http://dotat.at/
Viking, North Utsire: Southerly 5 to 7. Moderate or rough. Mainly fair.
Good, occasionally poor.
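
One way to check how close the TIME_WAIT count is to the size of the ephemeral port range is sketched below. The function names are hypothetical and are not part of the health check script. The sketch assumes the range is available as two numbers, for example from /proc/sys/net/ipv4/ip_local_port_range on Linux. If the server uses the usual default for kernels of this era, 32768 to 61000 (an assumption; it was not checked on this machine), the range holds 28233 ports, which is in the same region as the 28,000 figure reported above. This is only a comparison, not a diagnosis.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch, not part of the original script.

# port_range_size LOW HIGH
# Print the number of ports in the inclusive range LOW..HIGH.
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}

# count_time_wait
# Read `netstat -an` style output on stdin and print the number of
# lines that mention TIME_WAIT. grep -c exits non-zero when the count
# is 0, so the status is masked to keep callers under `set -e` happy.
count_time_wait() {
    grep -c TIME_WAIT || true
}

# ports_exhausted COUNT LOW HIGH
# Succeed when COUNT is at least the size of the range LOW..HIGH.
ports_exhausted() {
    [ "$1" -ge "$(port_range_size "$2" "$3")" ]
}
```

On the server these could be combined along these lines:

    read low high < /proc/sys/net/ipv4/ip_local_port_range
    n=$(netstat -an | count_time_wait)
    ports_exhausted "$n" "$low" "$high" && echo "TIME_WAIT count $n fills the range"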