__russ__  committed 1cc8e85

resolved root cause of some test performance issues with thousands of persistent connections (listen backlog issues) and added a lot of docs to explain them
- future improvement possible with additional accepting loop
test_rawtcp cleanup

  • Participants
  • Parent commits fd245c6
  • Branches default

Files changed (2)

File cheroot/

                 if self._interrupt_from_worker:
                     exc = self._interrupt_from_worker
                     if isinstance(exc, (KeyboardInterrupt, SystemExit)):
-                        #Would prefer to jsut break out of the loop here, but
+                        #Would prefer to just break out of the loop here, but
                         #older code raised the exception so that is being kept
                         #TODO: is this required?  Why not always have safe_start-like behaviour?

File cheroot/test/

-"""Tests for TCP handling *only* (ie: nothing to do with HTTP whatsoever)."""
+"""Tests for TCP handling *only* (ie: nothing to do with HTTP whatsoever).
+If you try and run many persistent sockets, you will likely run into a limit
+on the maximum number of open file descriptors for a process. This is often
+only 1024.
+In Linux:
+* Check your current process/shell limit with 'ulimit -n'
+* Check the max value you can easily bump it to with 'ulimit -Hn'
+* Set the current (soft) value to the hard limit: 'ulimit -Sn `ulimit -Hn`'
+* if your hard limit is not sufficient, only root can raise the hard limit,
+  but if/when you do this you can su back to an unprivileged user and the
+  settings will stick.
+* You may also run into an overall fd limit on your system.  This is read/set
+  through /proc/sys/fs/file-max
+* for more info, see
+  or google your heart out.
+In Windows:
+* dunno, but rumour has it that the windows limit is really (?) high.
+* Not relevant yet anyway, since only select.poll is in use for recycle_threads,
+  and Windows doesn't support that.
+HIGH = trying to make >~100 simultaneous connections in a fast loop.
+In Linux:
+* You may be running into an issue where the listen() backlog is too small.
+  The poll->accept() loop can have issues cleaning this out at a blazing speed.
+  The impact is (depending on system) that the client's connection request is
+  refused, and the underlying TCP stack enters into retry mode.  On the machine
+  I'm typing this on, this means **3 seconds** of delay before it tries again,
+  and this only appears as (at first) inexplicable delays in the connection
+  attempt, while the server merrily waits for a connection and never sees one.
+* To test massive burst rates on connections, you may need to bump the listen
+  backlog to handle this (listen(backlog = x)).
+* THEN you may run into the fact that this backlog is silently truncated to the
+  value in:
+     /proc/sys/net/core/somaxconn
+  As root, you can overwrite this.
+* With this limit at (or near) the connection count, as well as having enough
+  available files per above, I have successfully run 30,000 simultaneously
+  connecting and persistent connections, with only 10 worker threads.  Even
+  more than 30,000 is possible, but I've not tested higher.
+* A future design change (specific loop for collecting connections with
+  accept() and stashing them) can address this "limitation", if it ends up
+  being a real-world problem.
+In Windows:
+* Not relevant yet, but the rumour is that the listen backlog may be limited
+  to as low as 5.  Testing needs to be done to figure out if it is needed
+  when a Windows version is available.
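The two limits described in the notes above (open file descriptors and the `somaxconn` backlog cap) can be checked from Python before a large run. The following is a minimal sketch, not part of the commit: it assumes Linux and Python 3 (the test module itself is Python 2), and the helper names `read_somaxconn`, `effective_backlog`, `fd_limits` and `check_limits` are hypothetical. It also assumes that clients and server share one process, as they do in this test, so each connection costs two descriptors.

```python
try:
    import resource  # Unix-only module
except ImportError:  # e.g. on Windows
    resource = None

SOMAXCONN_PATH = "/proc/sys/net/core/somaxconn"  # Linux-specific path


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel's listen-backlog cap, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (IOError, OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Backlog left after the silent truncation described in the notes."""
    if somaxconn is None:
        return requested
    return min(requested, somaxconn)


def fd_limits():
    """Return the (soft, hard) open-file limits, or None if unavailable."""
    if resource is None:
        return None
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def check_limits(num_conns, requested_backlog):
    """Return a list of warnings for a planned run of num_conns connections."""
    warnings = []
    limits = fd_limits()
    # Assumption: client and server sockets live in the same process,
    # so every connection needs two file descriptors.
    needed = 2 * num_conns
    if limits is not None:
        soft, hard = limits
        if soft != resource.RLIM_INFINITY and soft < needed:
            warnings.append(
                "soft fd limit %d < %d needed (hard limit is %d)"
                % (soft, needed, hard))
    actual = effective_backlog(requested_backlog, read_somaxconn())
    if actual < requested_backlog:
        warnings.append(
            "listen backlog %d will be truncated to %d"
            % (requested_backlog, actual))
    return warnings


if __name__ == "__main__":
    for line in check_limits(num_conns=30000, requested_backlog=30000):
        print(line)
```

Running it with the connection count you plan to test prints nothing when both limits look sufficient, and one line per limit that is likely to bite.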
 import unittest
 import struct
 from cheroot import server
 from cheroot.test import helper
-ECHO_SERVER_ADDR = ('', 54583 + 1)
+NUM_PERSISTENT_CONNS = 100  #See notes at start before raising this
+ECHO_SERVER_ADDR = ('', 54583)
 PACK_FMT = "@I" #native unsigned int (go native since same machine ensured)
 INT32_LEN = 4
         #but may want to do something with the int at some point, so
         #unpack/repack it is.
         self._rx = recv_int32(self.conn.socket)
     def respond(self):
         self.conn.socket.sendall(struct.pack(PACK_FMT, self._rx))
+        pass
 class Int32Echo_Connection(server.TCPConnection):
     RequestHandlerClass = Int32Echo_RequestHandler
                            ssl_adapter = ssl_adapter,
                            recycle_threads = recycle_threads,
+    #With accept() being in the poll() loop, I ran into problems with the
+    #server not being able to clean out the pending connections queue/backlog
+    #fast enough... at least when hammering it with hundreds/thousands of
+    #simultaneous connection requests. What is causing this "slowness" is
+    #unknown at the moment. To resolve this, the listen backlog (set by
+    #request_queue_size) needs to be jacked up SIGNIFICANTLY if you want to
+    #slam the server with zillions of simultaneous connections. You pretty
+    #much need the backlog to be the same size as the connection attempts if
+    #you have no delay between connections. THEN, when you try and do this,
+    #you will (at least on linux) run into the fact that listen() silently
+    #truncates the backlog to (typically) 128, even if you set it higher. To
+    #break this limit on linux, you need to adjust the truncation value in
+    #/proc/sys/net/core/somaxconn to match what you want. Otherwise, you get
+    #huge stutters in your connections due to TCP retry timing, which is a
+    #whopping 3 seconds on my current machine, and I've seen 1s elsewhere
+    #(unsure where).
+    #...or just stick with 128 (a common linux truncation value) and test with
+    #100 persistent connections
+    #TODO: in future, another loop layer (for doing accept() ONLY and stashing
+    #the connections) could eliminate this problem, although it probably is not
+    #a real world problem to begin with.
     #Jack up the listening request queue size because we are going to hit it
-    #hard. If left at the default size of 5, large numbers of simulataneous
-    #connections stutter badly.
-    srv.request_queue_size = 500
+    #hard. If left at the default size of 5, large numbers of simultaneous
+    #connections stutter badly (see above).
+    srv.request_queue_size = max(128, NUM_PERSISTENT_CONNS)
     #Launch the server on a thread...
     slt = threading.Thread(target = srv.safe_start)
         self._sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    #from cheroot.profileit import profileit
+    #@profileit("get_echo.profile")
     def get_echo(self, x):
-        """Send x (an int32) to the echo server nd get it back"""
+        """Send x (an int32) to the echo server and get it back"""
         #no checking on x.  Just let struct.pack do it.
         self._sock.sendall(struct.pack(PACK_FMT, x))
         ret = recv_int32(self._sock)
     #TODO: integrate in any missing essentials, such as code coverage tricks
-    def _test_500PersistentConnections(self):
+    #from cheroot.profileit import profileit
+    #@profileit("hammer%d.profile" % NUM_PERSISTENT_CONNS)
+    def _echo_hammer(self, client_dict):
+        n = len(client_dict)
+        st = time.time()
+        call_durations = []
+        for i, client in client_dict.iteritems():
+            call_start = time.time()
+            echo = client.get_echo(i)
+            call_durations.append(time.time() - call_start)
+            self.assertEqual(echo, i)
+            #if i % 100 == 0:
+                #print "Echo complete for %d/%d clients" % (i, n)
+            #print "client %d worked!" % i
+        return call_durations
+    #from cheroot.profileit import profileit
+    #@profileit("int32echo_test1000.profile")
+    def _test_MultiplePersistentConnections(self, client_count):
         #tests are run serially, so server must handle them all with all open.
         #Set the number of connections to test...
-        # - default is 500 since many systems limit to 1024
+        # - default is 100 since many systems silently limit the listen backlog
+        #   to 128.  See notes at start of file.
         # - Watch your max file descriptors here or get "too many files"
         #   - 'ulimit -n' on linux, increase with 'ulimit -Sn <value>' up to
-        #     hard limit of 'ulimit -Hn', at least without hoop jumping.
-        # - also make sure request_queue_size is large, or it stutters on
-        #   the connection slam we're about to do (but still works).
-        #   - seems to be a max here for stutter-free connections, at least on
-        #     the box this was coded on.
-        # - I have tested up to 30000 connections
-        n = 1500 #client count.
+        #     hard limit of 'ulimit -Hn' (at least without hoop jumping).  See
+        #     notes at start of file.
+        # - if you crank n way up and performance is bad, see the notes at the
+        #   start of the file :)
+        n = client_count
         clients = {} #k=index; v=client
         st = time.time()
         for i in xrange(n):
             clients[i + 1] = Int32EchoClient(ECHO_SERVER_ADDR, i)
             #time.sleep(0.002) #try and give the listener a break
-            print "%.3f -- Connected client %d" % (time.time() - st, i)
+            #if (i + 1) % 100 == 0:
+                #print "%.3f -- Connected client %d" % (time.time() - st, i + 1)
         #x=raw_input("Press enter!!")
         #at this point we will have n open connections.  Now exercise them.
-        st = time.time()
-        for i, client in clients.iteritems():
-            echo = client.get_echo(i)
-            self.assertEqual(echo, i)
-            print "%.3f -- Echo 1 complete for %d clients" % (time.time() - st, i)
-            #if i % 100 == 0:
-                #print "Echo 1 complete for %d/%d clients" % (i, n)
-            #print "client %d worked!" % i
+        call_durations = self._echo_hammer(clients)
+        #from cheroot.test import aplotter
+        #aplotter.plot(call_durations, plot_slope=False, x_size=160,y_size=60)
         #Exercise them again to prove that our persistent connections still work...
-        for i, client in clients.iteritems():
-            echo = client.get_echo(i)
-            self.assertEqual(echo, i)
-            #if i % 100 == 0:
-                #print "Echo 2 complete for %d/%d clients" % (i, n)
-            #print "client %d worked!" % i
+        self._echo_hammer(clients)
         #now politely close the clients down, rather than wait for gc
         for i, client in clients.iteritems():
             #print "client %d closed" % i
-    def test_500PersistentConnections(self):
-        #Set up 500 simultaneous/persistent connections and test them
+    def test_MultiplePersistentConnections(self):
+        #Set up simultaneous/persistent connections and test them
         srv = None
             srv = CreateAndStartInt32EchoServer(ssl_adapter=None,
-            self._test_500PersistentConnections()
+            self._test_MultiplePersistentConnections(NUM_PERSISTENT_CONNS)
             if srv:
-    def test_500PersistentConnections_SSL(self):
+    def test_MultiplePersistentConnections_classic(self):
+        #Set up simultaneous/persistent connections and test them
+        raise nose.SkipTest("'classic' cheroot will fail... need a good test here")
+        srv = None
+        try:
+            srv = CreateAndStartInt32EchoServer(ssl_adapter=None,
+                                                recycle_threads=False)
+            self._test_MultiplePersistentConnections(NUM_PERSISTENT_CONNS)
+        finally:
+            if srv:
+                srv.stop()
+    def test_MultiplePersistentConnections_SSL(self):
         raise nose.SkipTest("Need to make the connection SSL")
         srv = None
         ssl_adapter = helper.get_default_ssl_adapter()
             srv = CreateAndStartInt32EchoServer(ssl_adapter = ssl_adapter,
                                                 recycle_threads = True)
-            self._test_500PersistentConnections()
+            self._test_MultiplePersistentConnections(NUM_PERSISTENT_CONNS)
             if srv:
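The "future improvement" mentioned in the commit message and in the TODO (a separate loop that only calls accept() and stashes the connections) could look roughly like the sketch below. This is an illustration under assumptions, not cheroot code: the `Acceptor` class and `demo` function are hypothetical, and it is written for Python 3 using only the standard library. The idea is that a thread doing nothing but accept() empties the listen backlog quickly, so the backlog no longer needs to match the size of the connection burst.

```python
import queue
import socket
import threading


class Acceptor(threading.Thread):
    """Hypothetical helper: accept connections and stash them in a queue."""

    def __init__(self, listen_sock, pending):
        super().__init__(daemon=True)
        self._listen_sock = listen_sock
        self._pending = pending
        self._stop_event = threading.Event()

    def run(self):
        # A short timeout lets the loop notice a stop request.
        self._listen_sock.settimeout(0.1)
        while not self._stop_event.is_set():
            try:
                conn, addr = self._listen_sock.accept()
            except socket.timeout:
                continue
            except OSError:
                break  # listening socket was closed under us
            # No per-connection work here: just hand the socket on.
            self._pending.put((conn, addr))

    def stop(self):
        self._stop_event.set()


def demo(num_clients=20):
    """Connect num_clients sockets and return how many were accepted."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)  # deliberately smaller than the burst
    pending = queue.Queue()
    acceptor = Acceptor(listener, pending)
    acceptor.start()
    addr = listener.getsockname()
    clients = [socket.create_connection(addr) for _ in range(num_clients)]
    accepted = [pending.get(timeout=5) for _ in range(num_clients)]
    acceptor.stop()
    acceptor.join()
    for sock in clients + [conn for conn, _ in accepted]:
        sock.close()
    listener.close()
    return len(accepted)


if __name__ == "__main__":
    print("accepted %d connections" % demo())
```

In a real server the worker threads (or the poll loop) would take sockets from `pending` instead of calling accept() themselves. Whether this removes the stutter seen in the test would still need to be measured.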