HTTPResponse should accept generator to make streaming available

Issue #3 new
Barosl Lee
created an issue

Streaming in WSGI is usually done by passing a generator function as a response. As HTTPResponse.write only accepts str and bytes, I tried this:

def hello():
    yield b'hello'
    time.sleep(1)
    yield b'world'

resp = wheezy.http.HTTPResponse()
resp.buffer = hello()
return resp

But this doesn't work, because HTTPResponse tries to calculate the length of the response, which is needed to emit the Content-Length header. So, I suggest:

  1. There needs to be some ways to disable emitting the header, by making an option, or by adding a subclass.
  2. HTTPResponse should also provide a method to replace its buffer. The direct assignment to HTTPResponse.buffer seems to be inconsistent with other methods it provides(write, write_bytes).

Comments (8)

  1. Barosl Lee reporter

    It seems that Werkzeug checks if the content is an iterator before setting Content-Length. The related code:

    #: Should this response object automatically set the content-length
    #: header if possible?  This is true by default.
    #:
    #: .. versionadded:: 0.8
    automatically_set_content_length = True
    
    @property
    def is_sequence(self):
        """If the iterator is buffered, this property will be `True`.  A
        response object will consider an iterator to be buffered if the
        response attribute is a list or tuple.
    
        .. versionadded:: 0.6
        """
        return isinstance(self.response, (tuple, list))
    
    def get_wsgi_headers(self, environ):
        ....
        if self.automatically_set_content_length and \
           self.is_sequence and content_length is None and status != 304:
            try:
                content_length = sum(len(to_bytes(x, 'ascii')) for x in self.response)
            except UnicodeError:
                # aha, something non-bytestringy in there, too bad, we
                # can't safely figure out the length of the response.
                pass
            else:
                headers['Content-Length'] = str(content_length)
    

    https://github.com/mitsuhiko/werkzeug/blob/master/werkzeug/wrappers.py#L1136

  2. Andriy Kornatskyy repo owner

    I believe HTTP response for streaming is somewhat unique and should not be generalized in HTTPResponse class. The user code is alway precise about doing regular response or streaming one, so I do not see a value in reuse of HTTPResponse and additional check for generator/iterator as WSGI response. Instead a separate response class is suggested that limit interface to only those operation that are applicable for streaming.

  3. Andriy Kornatskyy repo owner

    Initially I was thinking about something like this:

    class HTTPStreamingResponse(object):
        """ HTTP streaming response.
    
            Response header Cache-Control
            must not be set by user code directly. Use
            ``HTTPCachePolicy`` instead.
        """
        status_code = 200
        cache_policy = None
        cache_profile = None
    
        def __init__(self, iteratable, content_type='text/html; charset=UTF-8',
                     encoding='UTF-8'):
            """ Initializes HTTP response.
            """
            self.iteratable = iteratable
            self.content_type = content_type
            self.encoding = encoding
            self.headers = [('Content-Type', content_type)]
            self.cookies = []
    
        def __call__(self, start_response):
            """ WSGI call processing.
            """
            headers = self.headers
            append = headers.append
            cache_policy = self.cache_policy
            if cache_policy:
                cache_policy.extend(headers)
            else:
                append(HTTP_HEADER_CACHE_CONTROL_DEFAULT)
            if self.cookies:
                encoding = self.encoding
                for cookie in self.cookies:
                    append(cookie.http_set_cookie(encoding))
            start_response('200 OK', headers)
            return self.iteratable
    

    However later realized that it is unlikely to use status code other than 200, set cookies, use cache profile or policy.. so that reduced it down to this:

    class HTTPStreamingResponse(object):
        status_code = 200
        cache_profile = None
    
        def __init__(self, iteratable, content_type='text/html; charset=UTF-8'):
            self.iteratable = iteratable
            self.headers = [('Content-Type', content_type),
                            ('Cache-Control', 'private')]
    
        def __call__(self, start_response):
            start_response('200 OK', self.headers)
            return self.iteratable
    

    There is streaming demo here.

  4. Barosl Lee reporter

    It would be convenient if the user can pass a str(unicode) generator directly, rather than passing a bytes generator. To do that, something like

    return (chunk.encode(self.encoding) for chunk in self.iteratable)
    

    would be appropriate.

    If we do this, however, we should add an option to distinguish str iterators from bytes iterators, as pre-knowing the type of the generated data from the iterators is impossible. For example, in Werkzeug, applying the str->bytes encoding iterator is the default, and the user must set direct_passthrough=True to prevent this behavior. Personally, I think this 'str'-first approach is right, as the user more likely generates dynamic text content, except for some cases involving bytes(sending a file, for example).

    Second, I think there would be some use cases to set cookies from HTTPStreamingResponse, though it might be a rare case.

    And finally, the term "iterable" is more common than "iteratable" in the Python world. See: http://docs.python.org/3/library/itertools.html

  5. Andriy Kornatskyy repo owner

    I believe text and binary are two very distinct use case and we should avoid to mix them. Here is an example for HTTPTextStreamingResponse:

    class HTTPTextStreamingResponse(object):
        status_code = 200
        cache_profile = None
    
        def __init__(self, iterable, content_type='text/html; charset=UTF-8',
                     encoding='UTF-8'):
            self.iterable = iterable
            self.encoding = encoding
            self.headers = [('Content-Type', content_type),
                            ('Cache-Control', 'private')]
    
        def __call__(self, start_response):
            """ WSGI call processing.
            """
            start_response('200 OK', self.headers)
            return (chunk.encode(self.encoding) for chunk in self.iterable)
    
  6. Log in to comment