Parallel rendering?

Issue #38 closed
Eric Wolf created an issue

I did some research before putting this up, and I don’t know enough about GPU/OpenCL programming, so this might not be possible, but I was wondering: is there a way to render multiple fractals at the same time?

Example-- I’m currently rendering an animation, and it takes 1.24 seconds to render each frame (about 200 million iterations), while my GPU is capable of 2 billion iterations per second. I imagine a lot of that time is the density filtering and final accumulation. It seems like there is plenty of room to speed up the iterating by running multiple renders at a time, and perhaps offloading some of the other work to a separate CPU thread?

Comments (7)

  1. Matt Feemster repo owner

    I don’t know enough about gpu/opencl programming

    I’ll explain why what you described isn’t possible or doesn’t apply.

Animations in Fractorium and the EmberAnimate command line program already support multi-GPU rendering. When animating, each frame gets its own GPU if you specify multiple devices.

    http://fractorium.com/?article=emberanimate

    http://fractorium.com/?page_id=434
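    As a rough illustration of that per-frame fan-out, here is a minimal sketch, assuming a hypothetical Device type and RenderFrame() function rather than the actual EmberAnimate internals:

    ```cpp
    // Minimal sketch of per-frame multi-GPU dispatch: one worker thread
    // per device, each pulling the next unrendered frame. Device and
    // RenderFrame() are hypothetical stand-ins, not the real Ember API.
    #include <atomic>
    #include <thread>
    #include <vector>

    struct Device { int platform, index; };

    void RenderFrame(const Device& dev, int frame)
    {
        // Placeholder: the real call would run iteration, density
        // filtering, and final accumulation for this frame on dev.
        (void)dev; (void)frame;
    }

    void RenderAnimation(const std::vector<Device>& devices, int totalFrames)
    {
        std::atomic<int> next{0};
        std::vector<std::thread> workers;

        for (const auto& dev : devices)
            workers.emplace_back([&next, totalFrames, dev]
            {
                // Pulling frames from a shared counter keeps every GPU
                // busy even when the devices differ in speed.
                for (int f = next++; f < totalFrames; f = next++)
                    RenderFrame(dev, f);
            });

        for (auto& w : workers)
            w.join();
    }
    ```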

    I imagine a lot of that time is the density filtering and final accumulation

    The inverse is true: most of the time is spent iterating. The second and third stages are fairly quick. The only time density filtering takes a long time is when you are using supersampling, a fairly wide filter (9+), and double precision.

    perhaps offload some of the other work to a separate cpu thread?

    The saving of the final image to disk does take some time, but I’ve already threaded that. The image data is copied to a separate buffer and passed to a thread which does the jpg/png compression and writes it to disk, so that happens while your next frame starts rendering. This is somewhat limited in that the thread must finish writing one image before it can start the next, so if you had super fast renders but very slow image writing, it could bottleneck. That would be very rare though: it would require large images that take a long time to save but render very fast, an unlikely scenario.
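    For illustration, here is a minimal sketch of that write-behind pattern, assuming a hypothetical SaveImage() rather than the actual writer:

    ```cpp
    // Sketch of the write-behind pattern: finished pixels are copied off
    // and saved on a worker while the next frame renders. SaveImage() is
    // a hypothetical stand-in for the real jpg/png writer.
    #include <cstdint>
    #include <future>
    #include <utility>
    #include <vector>

    void SaveImage(std::vector<uint8_t> pixels, int frame)
    {
        // Placeholder for compression and the disk write.
        (void)pixels; (void)frame;
    }

    void RenderLoop(int totalFrames)
    {
        std::future<void> pendingWrite;

        for (int f = 0; f < totalFrames; ++f)
        {
            std::vector<uint8_t> pixels(3840 * 2160 * 4); // rendered frame f

            // The single writer must finish the previous image before
            // starting the next one; this is the rare bottleneck case.
            if (pendingWrite.valid())
                pendingWrite.wait();

            pendingWrite = std::async(std::launch::async, SaveImage,
                                      std::move(pixels), f);
        }

        if (pendingWrite.valid())
            pendingWrite.wait();
    }
    ```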

    There is also some overhead with starting and stopping the render for each frame, so that will always take some time.

    As for offloading more work to the CPU, that wouldn’t help at all. The main work is iterating and saving to disk, and as you can see, those are already properly threaded.

    Last, there is OpenCL compilation. As you blend from one frame to another, several OpenCL recompiles will happen, and that will definitely lead to a slight pause. There is no way around this; I already limit recompiles to situations where it’s absolutely necessary.
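    For context, a common way to limit such rebuilds is to cache built programs keyed by the generated kernel source; this is only an illustrative sketch of that general technique, not Fractorium’s actual code:

    ```cpp
    // Sketch of caching built OpenCL programs by their generated source,
    // so a rebuild only happens when the kernel source actually changes.
    // Illustrative only; names are not Fractorium's internals.
    #include <string>
    #include <unordered_map>
    #include <CL/cl.h>

    class ProgramCache
    {
    public:
        cl_program Get(cl_context ctx, cl_device_id dev, const std::string& src)
        {
            auto it = m_cache.find(src);
            if (it != m_cache.end())
                return it->second; // unchanged source: no recompile, no pause

            const char* text = src.c_str();
            size_t len = src.size();
            cl_program prog = clCreateProgramWithSource(ctx, 1, &text, &len, nullptr);
            clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
            m_cache.emplace(src, prog);
            return prog;
        }

    private:
        std::unordered_map<std::string, cl_program> m_cache;
    };
    ```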

    What image size are you rendering at? Try increasing it, or increasing the quality; that way the overhead won’t have as much of a relative effect.

  2. Matt Feemster repo owner

    If this is really vexing you, you can provide me the animation sequence you’re trying to render, with a description/screenshot of the options used, and I can try to see where the time is being spent on my machine.

    What kind of GPU are you using?

  3. Eric Wolf reporter

    Thanks for the very detailed explanation! When I render animations, the quality of each frame isn’t a major concern, especially when I intend to play it back at 60fps. In this particular case, I’m rendering at 3840x2160 with quality set to 100. The GPU is an AMD 5700 XT.

    Flame file

  4. Matt Feemster repo owner

    Ok, this adds more clarity, thanks. A few points.

    Only use DP if the output on your keyframes looks different between DP and SP. If they look the same, then use SP; it’s much faster. Gaming cards completely cripple DP performance because they don’t need it.

    You should really use a higher quality. At least 1000, maybe 2000. If you’re doing a nice animation, you want it to look good.

    I could have sworn I explained the following somewhere, but I scoured my website and don’t see it, so I’ll explain it here:

    When each frame is rendered, it’s actually blended with the next frame. The number of blend steps is the Temporal Samples value. This means the total number of iterations required for your specified quality is divided by the number of Temporal Samples; a render at that new, lower quality is then performed Temporal Samples times, all of them are summed into the same histogram, and a single image is output. This is what gives the smooth look.

    The problem with this on the GPU is that you are starting and stopping Temporal Samples separate renders. While that doesn’t matter too much on the CPU, it incurs significant overhead on the GPU, because each render is a new kernel start/stop. This is where the extra overhead in your performance numbers is coming from. In your case of quality 100 and TS 100, each frame does 100 separate renders, each at quality 1 (see the sketch below).
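    A minimal sketch of that blending loop, with hypothetical names (Histogram, RenderAtTime()) standing in for the real Ember internals:

    ```cpp
    // Sketch of temporal blending: the frame's total quality is split
    // across Temporal Samples sub-renders at staggered time offsets, all
    // accumulated into one histogram. On the GPU, each sub-render is a
    // separate kernel start/stop. Names here are hypothetical.
    struct Histogram { /* accumulation buffer */ };

    void RenderAtTime(double time, double quality, Histogram& hist)
    {
        // Placeholder for one iteration pass (one kernel launch).
        (void)time; (void)quality; (void)hist;
    }

    void RenderBlendedFrame(double frameTime, double quality,
                            int temporalSamples, Histogram& hist)
    {
        // Q=100, TS=100: 100 sub-renders at quality 1 each.
        double subQuality = quality / temporalSamples;

        for (int s = 0; s < temporalSamples; ++s)
        {
            // Spread the samples across the interval to the next frame.
            double t = frameTime + double(s) / temporalSamples;
            RenderAtTime(t, subQuality, hist);
        }
        // Density filtering and final accumulation then run once on hist.
    }
    ```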

    One way you can reduce the relative overhead is to render at a higher quality. Each frame will still take longer, but the effect of the overhead will be less noticeable.
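    To make that concrete, here is a back-of-the-envelope sketch; the per-launch overhead and iteration times are made-up example numbers, not measurements:

    ```cpp
    // With TS fixed, kernel launch overhead is constant per frame while
    // iteration work grows with quality, so the overhead's share shrinks.
    // The 5 ms launch cost and 0.7 s iteration time are assumptions.
    #include <cstdio>

    int main()
    {
        const double launchOverheadSec = 0.005; // assumed cost per sub-render
        const double iterSecAtQ100 = 0.7;       // assumed iteration time at Q=100
        const int ts = 100;

        for (double qScale : {1.0, 5.0, 20.0})  // Q = 100, 500, 2000
        {
            double total = ts * launchOverheadSec + iterSecAtQ100 * qScale;
            std::printf("Q=%5.0f  overhead share: %4.1f%%\n", 100 * qScale,
                        100.0 * ts * launchOverheadSec / total);
        }
    }
    ```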

    Despite lowering performance, I would also suggest a higher Temporal Samples value, to avoid the occasional strobing effect that appears when you don’t use enough and each individual sample becomes visible, which looks bad.

    So try doing your animation with these different settings and compare the results:

    Q: 100 TS: 100 (what you have now)

    Q: 500 TS: 500

    Q: 1000 TS: 500

    Q: 2000 TS: 500

    Q: 2000 TS: 1000

    I think Electric Sheep always uses Q: 2000 and TS: 1000.

    For additional information, see here to understand the animation process: http://fractorium.com/?article=animation-overview
