1. cherrypy
  2. CherryPy
Issue #691 resolved

Repeated Ctrl-C hangs wsgiserver

Anonymous created an issue

Ever since I have gotten cherrypy3, I've had problems with the cherrypy process hanging in the shell after a CTRL-C is sent to process. My only resort would be to send a SIGTERM or SIGKILL.

I was looking around in the code and came up with this:

When CTRL-C is sent to to the cherrypy process, each connection thread with an active connection waits until until a request does not return a body or close_connection on the request is set to True ({{{wsgiserver/init.py:611}}}). The forever loop here implements HTTP/1.1 and Keep-Alive. with the close_connection attribute on the HTTPRequest.

Therefore, there is a noticable time lag between when a KeyboardInterrupt is sent and all of the worker threads are able to process their SHUTDOWN request (because conn.communicate is blocking) {{{wsgiserver/init.py:693}}}.

While the loop is spinning on the Connection's communicate method, sending another CTRL-C to the process causes the thread to abort and the callstack bubble up out of the thread join. At this point, the only recourse is to manually SIGTERM.

It would appear that since the shutdown requests are put in a worker queue, there currently isn't any way to give the HTTPConnection a flag to say stop processing requests.

I suppose that the best thing to do would allow the KeyBoardInterrupts to not lock the process during the shutdown process.

A fix I came up with is adding KeyboardInterrupt to the thread join loop in {{{wsgiserver/init.py}}} near line 986

Go from

{{{

!python

while self._workerThreads: worker = self._workerThreads.pop() if worker is not current and worker.isAlive: try: worker.join() except AssertionError: pass }}}

to

{{{

!python

while self._workerThreads: worker = self._workerThreads.pop() if worker is not current and worker.isAlive: try: worker.join() except (AssertionError, KeyboardInterrupt): pass }}}

This at least allows the process to shutdown gracefully after multiple CTRL-C's.

To reproduce this situation (test system Ubuntu Feisty):

test file: {{{

!python

broke.py

import cherrypy

class Root(object): @cherrypy.expose def index(self): return 'hi'

cherrypy.quickstart(Root())

$python broke.py }}} <open localhost:8080 in web browser to activate a connection> tap CTRL-C twice.

However, I wish that I could figure out how to interrupt the connection loop when a shutdown is occuring, since Keep-Alive could keep the server in the shutdown state for an arbitrarily long time while multiple requests from the same client.

Reported by jlowery

Comments (17)

  1. Robert Brewer

    Excellent work! I've been aware of this for some time and just haven't had the time to really nail it down. Thanks for the patch!

    Re: "interrupt the connection loop"...I think you're talking about the "while" loop inside HTTPConnection.communicate? That usually waits on rfile.readline() inside parse_request, and *that* is waiting on socket._fileobject._sock.recv(). I wonder what happens if we call rfile._sock.close() in the main thread while another thread is waiting on recv()?

  2. Anonymous

    Well, I tested this out. .close() doesn't work, but shutdown() does. I'm still learning the threading internals of cherrypy, so my solution here may have thread-related side-effects. I also think this fix is kind of kludgy. Maybe you know a better way to fit it into the design.

    All of these changes are in wsgiserver/__init__.py

    Modify the WorkerThread class thusly (minus my debug print lines of course):

    class WorkerThread(threading.Thread):
        """Thread which continuously polls a Queue for Connection objects.
    
        server: the HTTP Server which spawned this thread, and which owns the
            Queue and is placing active connections into it.
        ready: a simple flag for the calling server to know when this thread
            has begun polling the Queue.
    
        Due to the timing issues of polling a Queue, a WorkerThread does not
        check its own 'ready' flag after it has started. To stop the thread,
        it is necessary to stick a _SHUTDOWNREQUEST object onto the Queue
        (one for each running WorkerThread).
        """
        conn = False
    
        def __init__(self, server):
            self.ready = False
            self.server = server
            threading.Thread.__init__(self)
    
        def run(self):
            try:
                self.ready = True
                while True:
                    conn = self.conn = self.server.requests.get()
                    #print 'handling request on thread %s with conn %s' % (self, conn)
                    if conn is _SHUTDOWNREQUEST:
                        return
    
                    try:
                        #print 'thread %s communicating' % self
                        conn.communicate()
                        #print 'thread %s end communicate' % self
                    finally:
                        conn.close()
            except (KeyboardInterrupt, SystemExit), exc:
                self.server.interrupt = exc
    

    In the CherrypyWSGIServer:stop method, change:

        # Must shut down threads here so the code that calls
        # this method can know when all threads are stopped.
        for worker in self._workerThreads:
            self.requests.put(_SHUTDOWNREQUEST)
    

    to

        # Must shut down threads here so the code that calls
        # this method can know when all threads are stopped.
        for worker in self._workerThreads:
            self.requests.put(_SHUTDOWNREQUEST)
            if worker.conn and not worker.conn.rfile.closed:
                worker.conn.rfile._sock.shutdown(_socket.SHUT_RDWR)
    

    also need to import _socket in the top imports.

    One side effect of this is that it is possible for the socket calls under the communicate() method to throw a bad file descriptor exception. I was able to generate one, but it did not hang the process and everything shut down properly. (Just got an exception occured in WorkerThread message).

    In effect, this kills the worker thread dead in its tracks.

    Hitting CTRL-C stops the process instantly, although this should probably be tested on win32 as winsock can be a PITA sometimes.

  3. Robert Brewer

    I can test on win32...I'll try to fold this into test_states so we can test on multiple platforms more easily, and get the regression benefits.

    We should give the deployer a "server.shutdown_timeout" config entry--setting it to zero makes SIGTERM or Ctrl-C stop the process instantly, setting it to a positive float gives existing connections N seconds to complete before being forcibly shut down. That should allow a reasonable way to shutdown gracefully--if you need immediate forced shutdown, there's always SIGKILL.

  4. Anonymous

    Makes sense. For the timeout mechanism. Do you think the best way would be to write a Shutdown thread that is launched in stop() that time.sleep() before killing all of the sockets?

  5. Robert Brewer

    It looks like we can just call worker.join(timeout=self.shutdown_timeout):

    "When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call isAlive() to decide whether a timeout happened."

        current = threading.currentThread()
        while self._workerThreads:
            worker = self._workerThreads.pop()
            if worker is not current and worker.isAlive:
                try:
                    worker.join(self.shutdown_timeout)
                    if worker.isAlive:
                        # We exhausted the timeout. Forcibly shut down the socket.
                        c = worker.conn
                        if c and not c.rfile.closed:
                            c.rfile._sock.shutdown(_socket.SHUT_RDWR)
                # except (AssertionError, KeyboardInterrupt):
                except AssertionError:
                    pass
    
  6. Anonymous

    Ok. I gave your timeout to join solution a try and it works well. Suppose can just disregard the shutdown thread solution.

  7. Anonymous

    Just tested and on Ubuntu if you don't have the KeyboardInterrupt except, the process still hangs on a non-zero shutdown_time with multiple CTRL-C's.

  8. Robert Brewer

    The _SHUTDOWNREQUEST's are done first so that idle threads don't pick up new requests while other threads are being shut down. Moving those queue.put calls would make shutdown take a lot longer under heavy load.

  9. Robert Brewer
    > if you don't have the !KeyboardInterrupt except,
    > the process still hangs on a non-zero shutdown_time
    > with multiple CTRL-C's.
    

    Right; we need both. Can you produce a unified diff of all the changes so far, and attach it to this ticket? I'm still working on the test case.

  10. Anonymous

    I believe that the _sock.shutdown call would be needed to force an abort on a workerThread that is currently handling a client.

    A softer technique that could be used would be to modify the HTTPRequest and change close_connection to a property that always returns true when the server is in shutdown state (probably by setting a global flag or threading the state through the worker).

    This way, the communicate loop will finish up the last HTTP request that it was served before it closes the connection.

    The only exception here would be for long lived HTTP requests. I believe streaming is an example of where this wouldn't would.

  11. Anonymous

    Here is a svn diff from the head. I don't see anywhere in the interface for me to attach a file:

    Index: cherrypy/wsgiserver/init.py

    --- cherrypy/wsgiserver/init.py (revision 1658) +++ cherrypy/wsgiserver/init.py (working copy) @@ -47,6 +47,7 @@ quoted_slash = re.compile("(?i)%2F") import rfc822 import socket +import _socket try: import cStringIO as StringIO except ImportError: @@ -673,7 +674,8 @@ it is necessary to stick a _SHUTDOWNREQUEST object onto the Queue (one for each running WorkerThread). """ - + conn = None + def init(self, server): self.ready = False self.server = server @@ -683,7 +685,7 @@ try: self.ready = True while True: - conn = self.server.requests.get() + conn = self.conn = self.server.requests.get() if conn is _SHUTDOWNREQUEST: return

    @@ -766,7 +768,7 @@ ssl_private_key = None

    def init(self, bind_addr, wsgi_app, numthreads=10, server_name=None, - max=-1, request_queue_size=5, timeout=10): + max=-1, request_queue_size=5, timeout=10, shutdown_timeout=3): self.requests = Queue.Queue(max)

    if callable(wsgi_app): @@ -790,6 +792,7 @@ self._workerThreads = []

    self.timeout = timeout + self.shutdown_timeout = shutdown_timeout

    def start(self): """Run the server forever.""" @@ -968,12 +971,20 @@

    1. Don't join currentThread (when stop is called inside a request). current = threading.currentThread() + + while self._workerThreads: worker = self._workerThreads.pop() if worker is not current and worker.isAlive: try: - worker.join() - except AssertionError: + worker.join(self.shutdown_timeout) + if worker.isAlive: + # We exhausted the timeout. + # Forcibly shut down the socket. + c = worker.conn + if c and not c.rfile.closed: + c.rfile._sock.shutdown(_socket.SHUT_RDWR) + except (AssertionError, KeyboardInterrupt): pass

    def populate_ssl_environ(self):

  12. Robert Brewer

    Fixed in [1660]. I had to add some special logic for SSL. I also added a join() even after the join(timeout), because the spec says stop MUST block until all threads are shut down. I also allowed a shutdown_timeout of None to mean "wait forever; don't shutdown sockets".

  13. Log in to comment