Issue #1095 wontfix

Patches to improve large file upload performance

Anonymous created an issue

We have a server which frequently receives file/JSON upload POSTs with bodies of up to half a gigabyte. We've found that the server's performance in processing the body is limited, especially for multipart MIME bodies that lack a 'Content-Length' header on the individual parts.

Based on our profiling, there are at least two problems in CherryPy 3.1.2 we'd like to see fixed. We're attaching our patches to this ticket: one adds a test to measure the performance, and two fix the issues below.

The first issue is that the default network read size is small: 8 kB for the initial request line and headers, and 64 kB for reading a body with a Content-Length. For a file of several hundred megabytes, that is too small. We added a configuration option, request.body_io_size, to allow the size to be tuned; for large uploads, 128-512 kB seems to be a more reasonable value.
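As a rough illustration of the effect (not the patch itself), a copy loop with a tunable read size mirrors what the proposed request.body_io_size option controls; copy_body and its default value are hypothetical names for this sketch:

```python
import io

def copy_body(src, dst, body_io_size=256 * 1024):
    """Copy a request body in large chunks; a bigger read size means
    fewer read() calls and less per-call overhead for huge bodies."""
    total = 0
    while True:
        chunk = src.read(body_io_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

With an 8 kB read size a 500 MB body takes ~64,000 read() calls; at 512 kB it takes ~1,000, which is where the savings come from.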

The second issue is that the CP_fileobject read() and readline() implementations cause a very large amount of cStringIO object churn, creating a new object for every line of input. For some of the files we upload, this breaks the input into chunks of just a few hundred bytes to a kilobyte at a time. We found that returning slices from an internal buffer, sized by the body_io_size parameter, works much better. Performance for a body without a Content-Length header is still poor (~15 MB/s) compared to one with the header (up to 300 MB/s), but it's ~50% better than without the patch.
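A minimal sketch of that buffering idea (SliceBuffer is our name for this illustration, not code from the patch): readline() scans one internal buffer and returns slices of it, so no per-line cStringIO objects are created:

```python
import io

class SliceBuffer:
    """Serve readline() as slices of a single internal buffer instead of
    constructing a new string-IO object for every line read."""

    def __init__(self, raw, bufsize=256 * 1024):
        self.raw = raw          # underlying binary stream
        self.bufsize = bufsize  # analogous to the body_io_size option
        self.buf = b''
        self.pos = 0

    def _fill(self):
        # Refill the buffer only when it is fully consumed.
        if self.pos >= len(self.buf):
            self.buf = self.raw.read(self.bufsize)
            self.pos = 0
        return bool(self.buf)

    def readline(self):
        pieces = []
        while self._fill():
            nl = self.buf.find(b'\n', self.pos)
            if nl != -1:
                # Slice out of the existing buffer; no new buffer object.
                pieces.append(self.buf[self.pos:nl + 1])
                self.pos = nl + 1
                return b''.join(pieces)
            # Line spans the buffer boundary; keep the tail and refill.
            pieces.append(self.buf[self.pos:])
            self.pos = len(self.buf)
        return b''.join(pieces)
```

The join only matters when a line straddles a refill; the common case returns a single slice.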

We added a test which demonstrates the problems. You can run the basic test simply with 'python test/test_upload.py', or with the --server/--client options to run it between two machines. To run a basic single-host benchmark, run it like this, with and without the other patches, to measure the impact:

{{{
#!sh
for x in smart dumb; do
  T=$(for i in $(seq 1 10); do echo UploadTest.test_big_$x; done)
  (set -x; python test/test_upload.py $T)
done
}}}

Note that because FieldStorage spools bodies larger than ~1 kB to disk, the above will write 500 MB temporary files, so throughput may depend on how much RAM you have and on the device holding the temporary files.
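That spool-to-disk behaviour is the same pattern the stdlib exposes as tempfile.SpooledTemporaryFile: data stays in memory until a size threshold, then rolls over to a real temporary file. A small demonstration (the _rolled flag checked here is a CPython implementation detail, used only to illustrate):

```python
import tempfile

with tempfile.SpooledTemporaryFile(max_size=1024) as spool:
    spool.write(b"x" * 512)
    in_memory = not spool._rolled   # still below max_size: held in RAM
    spool.write(b"x" * 1024)        # total now exceeds max_size
    on_disk = spool._rolled         # rolled over to a disk-backed tempfile
```

This is why benchmark results for large bodies track the speed of the temp-file device as much as the network path.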

On a Linux server, the 10 'smart' uploads over localhost go from about 40 seconds and 150 MB/s to 20 seconds and 300 MB/s. The 10 'dumb' ones go from about 470 seconds and 10 MB/s to 350 seconds and 14 MB/s. On a Mac laptop the smart uploads run at about half that speed; the dumb ones are the same.

Reported by lat@cern.ch

Comments (2)

  1. Robert Brewer

    Thanks for the report. I highly recommend you look at CP 3.2, in which the entire request body processing model has been reworked, eliminating the use of `FieldStorage` and the stdlib `cgi` module completely. It may still have small chunk sizes (wink) but at the very least, you have the facility in 3.2+ to read() directly from the `wsgi.input` stream and handle it as you see fit.
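    In WSGI terms, the facility described here amounts to consuming the input stream yourself. A minimal stand-alone WSGI app (a sketch, not CherryPy code) that drains `wsgi.input` in large chunks looks like:

    ```python
    def upload_app(environ, start_response):
        """Minimal WSGI app that reads the request body itself in big
        chunks, instead of letting a framework parse it."""
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
        size = 0
        stream = environ["wsgi.input"]
        while remaining > 0:
            chunk = stream.read(min(256 * 1024, remaining))
            if not chunk:
                break
            size += len(chunk)
            remaining -= len(chunk)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [("received %d bytes" % size).encode("ascii")]
    ```

    Handling each chunk as it arrives (hashing, writing to storage) avoids ever materialising the whole body.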

  2. Anonymous

    Thanks! Unfortunately we don't plan to switch to CP 3.2 for the time being, but we'll look into this again when we do.
