Patches to improve large file upload performance
We have a server which frequently receives file / JSON upload POSTs with bodies of up to half a gigabyte. We've found that server performance in processing the body is limited, especially for a multipart MIME POST body whose individual parts have no 'content-length' header.
Based on our profiling, there are at least two problems in CherryPy 3.1.2 we'd like to see fixed. I'm attaching our patches to this ticket: one adds a test to measure the performance, and two fix the issues below.
The first issue is that the default network read size is small: 8 kB for the initial request line and headers, and 64 kB for reading a body with content-length. For a file of several hundred megabytes, that is too small. We added a configuration option, request.body_io_size, to allow the size to be tuned; for large uploads, 128 - 512 kB seems to be a more reasonable value.
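To illustrate why the read size matters, here is a rough stdlib-only sketch. It is not the code from our patch, and the names copy_body and io_size are made up for this example. It copies a body of known length in fixed-size reads and counts how many read() calls that takes.

```python
import io


def copy_body(src, dst, content_length, io_size=64 * 1024):
    """Copy content_length bytes from src to dst, at most io_size per read.

    Hypothetical helper for illustration only; returns the number of
    read() calls that returned data.
    """
    remaining = content_length
    reads = 0
    while remaining > 0:
        chunk = src.read(min(io_size, remaining))
        if not chunk:
            break  # stream ended before content_length bytes arrived
        dst.write(chunk)
        remaining -= len(chunk)
        reads += 1
    return reads


# A 4 MiB in-memory body stands in for a (much larger) upload.
body = bytes(4 * 1024 * 1024)
reads_64k = copy_body(io.BytesIO(body), io.BytesIO(), len(body), 64 * 1024)
reads_512k = copy_body(io.BytesIO(body), io.BytesIO(), len(body), 512 * 1024)
print(reads_64k, reads_512k)  # 64 reads versus 8 reads
```

The same ratio applies to a real upload: an eightfold larger read size means roughly an eighth as many reads, each with its own per-call overhead. On a real socket a read may return less than requested, so the actual counts would be higher than in this in-memory case.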
The second issue is that the CP_fileobject read() and readline() implementations cause a very large amount of cStringIO object churn, creating a new object for every line of input read. For some files we read, this typically breaks the input into pieces of just a couple of hundred bytes to a kilobyte at a time. We found that returning slices from an internal buffer, sized by the body_io_size parameter, works much better. Although performance for a body without a content-length header is still poor (~15 MB/s) compared to one with it (up to 300 MB/s), it's ~50% better than without the patch.
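The buffer-slicing idea can be sketched roughly as follows. This is a simplified stand-in written for this ticket, not the patched CP_fileobject itself; the class name BufferedLineReader and its io_size argument are hypothetical. It refills one internal buffer in io_size chunks and returns each line as a slice of that buffer, rather than building a fresh string-IO object per line.

```python
import io


class BufferedLineReader:
    """Hypothetical reader that hands out slices of one internal buffer."""

    def __init__(self, raw, io_size=128 * 1024):
        self._raw = raw          # underlying stream with a read(n) method
        self._io_size = io_size  # bytes requested per refill
        self._buf = b""
        self._pos = 0            # start of the unconsumed data in _buf

    def _fill(self):
        """Drop consumed bytes and append one more chunk; False at EOF."""
        chunk = self._raw.read(self._io_size)
        if not chunk:
            return False
        self._buf = self._buf[self._pos:] + chunk
        self._pos = 0
        return True

    def readline(self):
        """Return the next line including its newline, or b"" at EOF."""
        while True:
            nl = self._buf.find(b"\n", self._pos)
            if nl >= 0:
                line = self._buf[self._pos:nl + 1]
                self._pos = nl + 1
                return line
            if not self._fill():
                # EOF: return whatever is left, possibly nothing.
                line = self._buf[self._pos:]
                self._buf = b""
                self._pos = 0
                return line


# Tiny io_size so that lines straddle refills in this demo.
reader = BufferedLineReader(io.BytesIO(b"a\nbb\nccc"), io_size=2)
lines = [reader.readline() for _ in range(4)]
print(lines)
```

With a large io_size most readline() calls are served from data already in memory, and a refill happens only once per chunk. This sketch rescans the buffer after every refill, so a single very long line would be handled inefficiently; a production version would want to remember how far it has already searched.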
We added a test which demonstrates the problems. You can run the basic test with simply 'python test/test_upload.py', or with the --server/--client options to run it between machines. To run a basic single-host benchmark, please run it like this, with and without the other patches, to measure the impact:
{{{
# Note: $=T is zsh syntax for word-splitting $T; in bash or sh, use an unquoted $T instead.
for x in smart dumb; do
  T=$(for i in $(seq 1 10); do echo UploadTest.test_big_$x; done)
  (set -x; python test/test_upload.py $=T)
done
}}}
Note that because FieldStorage spools bodies larger than ~1 kB to disk, the above will write 500 MB temporary files to disk, so how fast it goes may depend on how much RAM you have and on the device used for temporary files.
On a Linux server we see 10 'smart' uploads go from about 40 seconds and 150 MB/s to 20 seconds and 300 MB/s over localhost. The 10 'dumb' ones go from about 470 seconds and 10 MB/s to 350 seconds and 14 MB/s. On a Mac laptop the smart uploads run at about half that speed, and the dumb ones at the same speed.
Reported by firstname.lastname@example.org