Render accuracy issues with moderate to high temporal samples on OpenCL

Issue #4 resolved
lx45803 created an issue

Take the attached edge.xml and sequence it with

embergenome --sequence=edge.xml --loops 0 --loopframes 0 --interploops 1 --interpframes=159 > sequenced.xml

Then render with

emberanimate --in=sequenced.xml --frame=0 --suffix=_gpu --opencl --sp
emberanimate --in=sequenced.xml --frame=0 --suffix=_cpu --sp

You'll notice that the OpenCL-rendered image has some glow in its darker areas. As you increase the number of temporal samples, the rest of the image desaturates and the error areas get brighter. It looks as though samples above some threshold are having their positions calculated incorrectly, and increasing the number of temporal samples widens some range so that more samples are pushed past that threshold. I don't have a detailed understanding of how the flame algorithm works, though, so this is a guess.

Some renderings I did with various --ts settings are shown here: http://imgur.com/a/fKECp

I don't see any notable differences between single- and double-precision renderings. I have no AMD hardware to test on.

OpenCL Info:
Platform 0: NVIDIA Corporation  NVIDIA CUDA  OpenCL 1.2 CUDA 8.0.0
Device 0: NVIDIA Corporation  GeForce GTX 960
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
CL_DEVICE_LOCAL_MEM_SIZE: 49,152
CL_DEVICE_LOCAL_MEM_TYPE: 1
CL_DEVICE_MAX_COMPUTE_UNITS: 8
CL_DEVICE_MAX_READ_IMAGE_ARGS: 256
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 16
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1,073,741,824
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: 2
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 128
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 131,072
CL_DEVICE_GLOBAL_MEM_SIZE: 4,294,967,296
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65,536
CL_DEVICE_MAX_CONSTANT_ARGS: 9
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1,024
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1,024, 1,024, 64

Comments (4)

  1. Matt Feemster repo owner

    Long story short, this is an artifact of using the GPU. Instead of 4 or 8 threads, each following an independent point trajectory as on the CPU, the GPU uses 32,768 independent threads. This can lead to the image looking more "spread out".

    One thing to keep in mind is that different parameters matter for different flames, so you have to spend time tweaking each one to get it just right. Further, what looks "right" is also subjective.

    For this case, even without animating, I can tell that the CPU and GPU outputs look different. For example, in the first flame in the file, on the CPU you can see a wide, darkish black line starting from the bottom left corner of the image. When using the GPU, it's much less pronounced because the points seem to be more "spread out" and less sharp.

    Here is a case where a parameter that usually makes no difference in the output actually matters: fuse count. It's normally 15, but for this case, set it to 1000. This will cause the image to be less spread out and sharper.

    Next, when you're rendering animations with the GPU, be sure to use a high quality, say 5000 or so.

    If you do those two, the animation should look much more similar to the CPU.

    Also, experiment with temporal samples 100, 200, 500 and 1000 to see what looks good to you. Once no slicing is visible, there's no need to go higher.

    Also, try changing the temporal filter type and width. You are using a width of 1.2 and a type of Box. Try Gaussian or Exp instead. It's easy to do this for all flames in the editor by checking "Apply All" when editing.
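As a rough picture of what temporal samples and the temporal filter do, the sketch below spreads `n` sample times across a window of the given width around the frame and gives each a normalized blend weight. This is a generic illustration, not Ember's actual filter code: the `temporal_weights` helper, the even spacing of offsets, and the Gaussian sigma of `width / 4` are assumptions, and the Exp filter is left out because its exact shape is not described here.

```python
import math


def temporal_weights(n, width, kind="box"):
    """Return (time_offset, weight) pairs for n temporal samples.

    Offsets are spread evenly over [-width / 2, width / 2] around the frame
    time (an assumed layout); weights are normalized to sum to 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [(0.0, 1.0)]
    offsets = [(i / (n - 1) - 0.5) * width for i in range(n)]
    if kind == "box":
        raw = [1.0] * n                      # every sub-frame counts equally
    elif kind == "gaussian":
        sigma = width / 4.0                  # assumed value, for illustration
        raw = [math.exp(-0.5 * (t / sigma) ** 2) for t in offsets]
    else:
        raise ValueError("unsupported filter kind: " + kind)
    total = sum(raw)
    return [(t, w / total) for t, w in zip(offsets, raw)]


if __name__ == "__main__":
    for kind in ("box", "gaussian"):
        print(kind, temporal_weights(5, 1.2, kind))
```

With few samples, each sub-frame carries a large weight and can show up as a distinct copy of the image, which is one way to read the "slicing" mentioned above; more samples, or a filter that down-weights the outer ones, blends them more smoothly.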

    BTW, I would recommend doing these tests with a small sample of frames, like 20, rather than the 159 you were using, just to make testing go quicker.

    Please try and let me know.

  2. lx45803 reporter

    Matt, is there a set of certain xforms or parameters that always render noticeably differently on GPU vs CPU? Or is it a more complex case like certain xforms with attributes outside this range in combination with this other parameter?

    Essentially, is there a way to determine which flames will render differently on GPU without actually rendering them to compare?

  3. Matt Feemster repo owner

    I've made every effort to make the results identical. The good news is that for 99% of cases, they are the same. There are two known variations that will for sure give different outputs on CPU/GPU:

    http://fractorium.com/?article=cpu-vs-gpu-results

    Also keep in mind that some variations look different between SP and DP. I don't have a comprehensive list of those, so you just have to try them out. But as long as you use the same precision on CPU and GPU, you should see the same results. Tips on using precision:

    http://fractorium.com/?article=precision

    As you mentioned, there are other cases where a particular variation looks the same most of the time but looks different in certain contexts. For those, you'll just have to use trial and error. If you need a second set of eyes on a particular set of parameters, feel free to post them as an issue here or email them to me. The good news is that such cases are quite rare.

    As for working on this, I've been very busy with other stuff so I have not had time to get started. I will take a look in a few weeks or so.
