CEF3: Support rendering to a hardware GL/D3D texture/surface provided by the client

Issue #1006 new
Marshall Greenblatt
created an issue

Original issue 1006 created by magreenblatt on 2013-06-27T17:50:13.000Z:

CEF3 off-screen rendering does not currently support hardware acceleration. This means that some features like 3D CSS which require hardware acceleration do not currently work when using off-screen rendering.

See issue #518 for related comments.

Comments (41)

  1. Anonymous

    Comment 2. originally posted by hele@splitmedialabs.com on 2014-02-14T05:30:14.000Z:

    Hi, I just wanted to follow up on this issue. I remember we had some good discussions back in issue #518 about a fairly simple approach to supporting GPU acceleration for off-screen rendering on Windows by using shared surfaces.

    Any idea on the timeline here?

  2. Anonymous

    Comment 3. originally posted by efrencd on 2014-02-19T19:11:57.000Z:


    Here is a very interesting article about the Desktop Window Manager in Windows Vista, 7 and 8.


    It seems in Windows Vista and beyond, all windows are actually textures that are rendered to an offscreen buffer and then put in a 3d surface (actually a D3D surface) representing that window.

    Taking that into account, I guess there must be an easy way to get the whole content (hardware accelerated) from CEF and use it as a texture so that we can embed it in our own apps (see awesomium.com).

    Probably this will only work in Windows Vista and beyond and not in Windows XP.
    I think CEF should support some of the features modern Windows versions have to offer, even if that breaks backwards compatibility with Windows XP. I know many people love XP, but we can't forget it is a 13-year-old OS and many advancements have occurred since its release.

  3. Anonymous

    Comment 6. originally posted by jarred.nicholls on 2014-04-08T19:38:14.000Z:

    Amazing re: the HW accelerated off screen rendering! It really helps my use case with integrating CEF into QtQuick/QML scene graph without compromising.

  4. Marshall Greenblatt reporter

    Comment 7. originally posted by magreenblatt on 2014-06-28T00:10:37.000Z:

    There are currently two use cases for hardware acceleration:

    1. Support for accelerated features like 3D CSS and WebGL which require GPU acceleration for rendering and compositing.

    2. Support for delivering the final composited result to a GL/D3D texture/surface provided by the client in order to minimize copies and CPU load.

    Use case 1 will be addressed by issue #1257, which re-implements off-screen rendering support in trunk using the delegated renderer. Use case 2 will be left to this issue.

  5. Anonymous

    Comment 12. originally posted by efrencd on 2015-03-01T13:39:33.000Z:

    Hello, are there any plans or an approximate date for when this feature will be added? Thanks.

  6. Anonymous

    Comment 13. originally posted by ameggs on 2015-03-02T22:55:56.000Z:

    My (possibly incorrect or outdated?) understanding is that Chromium renders to GPU-side 256x256 pixel tiles, rather than a single buffer that's the full size of the visible browser window. When something on the page changes, only tiles that are changed get redrawn. At display time, something called the "ubercompositor" takes the current list of tiles and merges it with browser-side UI things. The only time the entire web page is drawn into a single buffer is when the tiles are combined with that UI for the final render. I'm basing this off these two documents; if I'm wrong I welcome the correction:



    But if my understanding is correct, then implementing this as described would mean chromium or CEF would be rendering all its tiles to a buffer we provided, and then we'd render that buffer into our own view. But if the goal here is to reduce latency and copies, maybe the best way to achieve this isn't for CEF to render to a texture/surface that we provide. Instead, just expose the current tile set to us. We can send those tiles into our own 3D renderer as part of our own independently-running update/draw/present loop.

    This would be especially appreciated by game developers, because we're frequently GPU-limited, and anything that reduces the number of GPU-side copies would help.
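
    To make the tile idea concrete, here is a minimal C++ sketch of how a client might work out which 256x256 tiles a changed region touches. The Rect and TileIndex types and the TilesTouchedBy function are hypothetical names for illustration only; neither CEF nor Chromium exposes such an API, and the tile size is taken from the (possibly outdated) description above.

```cpp
#include <vector>

// Hypothetical types for illustration; not part of CEF or Chromium.
struct Rect { int x, y, width, height; };
struct TileIndex { int col, row; };

// Tile edge length, as described in the comment above (an assumption).
constexpr int kTileSize = 256;

// Returns the indices of all tiles that overlap `dirty`, in row-major order.
// Assumes non-negative coordinates; an empty rect touches no tiles.
std::vector<TileIndex> TilesTouchedBy(const Rect& dirty) {
  std::vector<TileIndex> tiles;
  if (dirty.width <= 0 || dirty.height <= 0) return tiles;

  const int first_col = dirty.x / kTileSize;
  const int first_row = dirty.y / kTileSize;
  // Subtract 1 so a rect ending exactly on a tile edge does not spill over.
  const int last_col = (dirty.x + dirty.width - 1) / kTileSize;
  const int last_row = (dirty.y + dirty.height - 1) / kTileSize;

  for (int row = first_row; row <= last_row; ++row) {
    for (int col = first_col; col <= last_col; ++col) {
      tiles.push_back({col, row});
    }
  }
  return tiles;
}
```

    With a helper like this, a renderer that received individual tiles would only need to re-upload the tiles returned for each changed region rather than the whole page.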

  7. Marshall Greenblatt reporter

    Samsung has (very indirectly) proposed a new API for hardware-accelerated off-screen rendering in CEF: http://blogs.s-osg.org/servo-the-countdown-to-your-next-browser-continues/ ("Composite Smarter, Not Harder" section).

    Also, using in-process-gpu in combination with a custom OutputSurface seems like a promising implementation approach. Related conversation: https://groups.google.com/a/chromium.org/d/msg/graphics-dev/cnr8n3mdRFc/2n7QrPAaAQAJ

  8. Cef Gluist

    Hi everybody.

    I've updated MGusarenko's patch to Cef rev 2272 / Chromium 41.

    It SHOULD work. A friend of mine is currently trying to build/integrate it and we should get more feedback soon. Otherwise just thought I'd drop this here, perhaps MGreenblatt or MGusarenko would be curious to have a quick look. Perhaps I've overlooked something.


  9. Emmanuel ROCHE

    Hello everyone,

    I've been watching this issue for quite a long time now, and I was hoping someone else would find the time to handle it and save me the trouble... But well, I guess that's not how life works ;-).

    In my company, we have been using CEF quite intensively for more than a year now, and we reached a point where we really needed to squeeze more performance out of this offscreen rendering mechanism.

    So a couple of months ago I decided I should stop waiting, try this myself and find a working solution (at least for us...). And now, I'm glad to announce that I finally made it (it was incredibly complex, sure, but that's OK, because I learned a lot in the process, and now...): I have a working mechanism to render on a DirectX surface directly on the GPU (i.e. without first copying the CEF rendered textures to a Skia bitmap on the CPU).

    Note that this work is based on the CEF branch 3163, which is not that old, so I think it should not be too hard to integrate it in the current version of CEF (if applicable ?).

    Of course, there are some limitations: the system I'm using is only designed for use on Windows (only tested with Windows 10 so far, and I'm most probably missing some platform-dependent constraints and checks), and only allows direct copy onto DirectX surfaces (tested with DirectX 9 so far as this is what we need, but I'm pretty sure it would work the same with DirectX 10 or 11... no idea about DX 12). So for those interested in Linux support, this is not good enough yet (but might be used as a source of inspiration?).

    For those interested in copying on a GL surface on Windows, then this might still help: because one can use the DirectX surface "layer" to bridge the gap between a given project/software process and the CEF GPU service process (in fact I'm not sure this could be done another way with GLES/OpenGL [except if you run the GPU service directly in the browser process ?]). And then maybe more easily share the DirectX surface with an OpenGL context from a single process (ie. your software process) ?

    Basically, the idea I'm using here has been around for a while, but as far as I know no one actually tried to implement it (?). It is simply using the fact that CEF is using ANGLE on Windows to convert the GLES layer to a DirectX layer. And ANGLE supports interop with DirectX out of the box (cf. for instance https://github.com/Microsoft/angle/wiki/Interop-with-other-DirectX-code). So from there:

    • In my software process I create a shared DirectX 9 surface,
    • Then I updated the CEF RenderHandler class with additional methods to be able to specify if the render handler should use a provided shared DirectX surface handle instead of trying to do the rendering with the "regular path".
    • if a shared handle is provided to the RenderHandler, then I updated the CefCopyFrameGenerator behavior (used in the CefRenderWidgetHostViewOSR) to simply bypass the "regular path" in that case (ie. will not try to create a SkBitmap and then make a query on the service process to read the pixels from the compositor output surface, OnPaint will never get called, etc), and instead:
    • I created a new "GL like" command that the CefCopyFrameGenerator object will call (from the GLHelper) to request a copy of the compositor texture onto the provided shared handle.
    • Then this goes on the Command Buffer, and reaches the GLES2Decoder implementation:
    • From there I perform a one time init operation, where I create a new EGL context (sharing resources with the regular decoder context), I init a Pbuffer from the received shared handle, and prepare some additional GL ES resources (shader program, vertex buffer, etc)
    • Then each time this function is called, I take the mapping of the client texture id to get the service_id, use my own context and setup all my resources to render a screen aligned quad on my pbuffer using the service texture id.

    -> And that's it! No need for additional synchronization, no need to send anything back to the client from the service process, the DirectX surface will continuously get the updated compositor surface with a simple quad rendering, and we don't have to copy anything on the CPU memory! :-)
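
    The flow described in the steps above can be sketched roughly as follows. This is a simplified model and not the actual patch: RenderHandlerSketch, GpuServiceSketch and GenerateFrame are hypothetical stand-ins for the modified render handler, the GLES2 decoder side and the frame generator, and the EGL context, pbuffer and quad drawing are reduced to counters so that the control flow is visible.

```cpp
// All names are hypothetical stand-ins, not the CEF or Chromium API.
using SharedHandle = void*;  // Stand-in for a Windows HANDLE to a shared surface.

struct RenderHandlerSketch {
  // Null means "no shared surface provided, use the regular path".
  SharedHandle shared_surface = nullptr;
};

// Models the GPU service side of the custom command.
struct GpuServiceSketch {
  SharedHandle bound_surface = nullptr;
  int init_count = 0;  // How many times the one-time setup ran.
  int draw_count = 0;  // How many quads were drawn onto the pbuffer.

  void CopyTextureToSharedHandle(SharedHandle handle) {
    if (bound_surface == nullptr) {
      // One-time init: in the real flow this is where the EGL context,
      // the pbuffer for `handle`, and the shader/vertex buffer are created.
      bound_surface = handle;
      ++init_count;
    }
    // Per call: draw a screen-aligned quad sampling the compositor texture.
    ++draw_count;
  }
};

enum class FramePath { kSharedSurface, kCpuReadback };

// Models the per-frame decision made by the frame generator.
FramePath GenerateFrame(const RenderHandlerSketch& handler,
                        GpuServiceSketch& gpu) {
  if (handler.shared_surface == nullptr) {
    // Regular path: SkBitmap allocation, pixel readback, then OnPaint.
    return FramePath::kCpuReadback;
  }
  // Shared-surface path: no bitmap, no readback, OnPaint is never called.
  gpu.CopyTextureToSharedHandle(handler.shared_surface);
  return FramePath::kSharedSurface;
}
```

    The point of the model is that setup happens once, each later frame costs a single draw on the GPU side, and nothing is sent back to the client.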

    Hmmm, and now that I think about it, there are also other minor limitations I should mention:

    • Currently, the size of the DirectX surface/pbuffer used in the GPU service process is hardcoded to 1920x1080 => we would probably need to pass this information from the client for proper init of the pbuffer.
    • And also, I'm not bothering with releasing the EGL resources I create in my init call, ooops ;-): we might need an additional custom command to do that... but that's currently not a priority for me.
    • And in fact, I completely disabled the regular Skia Bitmap copying path in my code.

    Anyway, I'm planning to spend a few more days cleaning/validating the current code, and then I will write an article about this work and post some initial version of the updated files here so that the community can have a look and see if this can be of any help to you. For now, I just wanted to post a "little" teaser to get you excited (hopefully) ;-)!

  10. Mikael Hermansson

    @Emmanuel ROCHE That's fantastic news! I've recently been looking into doing something similar (with only Windows and Direct3D in mind), but got stuck when trying to grasp the full pipeline of the ANGLE textures to CefCopyFrameGenerator.

    "but as far as I know no one actually tried to implement it (?)"

    There was a patch released back in 2014 that did something similar. We (where I worked at the time) made use of it back then and we ended up shipping with it eventually. We did have issues with the GPU blacklist inside of Chromium messing things up for certain users, resulting in a completely black texture for them. Being able to fall back on the software implementation is very useful in those kinds of scenarios, even if it's just an explicit flag that the user has to pass in to deactivate the hardware accelerated path.

    As mentioned in that thread, Chromium replaced their compositor shortly thereafter, so you were pretty much stuck with that revision of CEF if you wanted hardware acceleration.

    Anyway, I'd love to get my hands on some patch/diff files to try this out myself and give feedback, even with all the rough edges. Ideally it would be nice to have a GitHub or Bitbucket fork of CEF specifically for this, until it can get merged into CEF itself (if that's even possible with only Direct3D support).

  11. Emmanuel ROCHE

    Hi everyone,

    As mentioned in my previous note above, I can now provide some initial files you could use on your side to try to move forward with this issue.

    I've stored those files on the following minimal github repo: https://github.com/roche-emmanuel/cef_direct3d_offscreen_rendering

    Also I have a blog article describing the basics of how this patch is supposed to work: http://wiki.nervtech.org/doku.php?id=blog:2017:1130_cef_direct_copy_to_d3d

    And if you have any questions, you can still mention me here or reach me in any other way. On my side I keep testing and polishing to see where this will lead me...

  12. Mikael Hermansson

    @Emmanuel ROCHE Awesome, I'll be trying out your changes in the coming days. I'll let you know how it goes.

    Also, I'm not a lawyer or any sort of expert on open-source licensing, but you might want to look into bundling a LICENSE file with an appropriate license in your GitHub repo. Both for the sake of your own original code (shell file and whatnot) and the original CEF/Chromium code, since their copyright notice explicitly states that there should be one.

    If you don't assign a license of some form there might be legal issues with making use of your code in production. Somebody more well-versed in licensing might be able to shed some light on this though.

  13. Emmanuel ROCHE

    Hi @Mikael Hermansson

    Thanks for the tip on the licensing question... You're right this is something I should add on the repo: but I just have no idea yet what content I should put in there :-) I'll be looking around.

    (But of course I want to share this with everyone, for any kind of usage and/or modification, etc)

    And yes please, let me know how this works for you! I already noticed something wrong with this system when used in my own production software: currently, this is generating a lot of rendering passes each second (for the content I'm rendering at least), and in my client process I'm also using a regular "on-screen" CEF window with some other content. This on-screen window will then eventually report a "WebGL context lost" error and everything will freeze for a second or two (looks like a GPU service crash?)... I'm investigating this.

  14. Marshall Greenblatt reporter

    @Emmanuel ROCHE Thanks for working on this and sharing your findings. Would you mind submitting your CEF/Chromium changes as a PR against 3163 branch? That would make it easier to view, test and comment on your changes. The Chromium changes can be applied using a patch file (see cef/patch/README.txt for details). General PR creation docs are here: https://bitbucket.org/chromiumembedded/cef/wiki/ContributingWithGit.md#markdown-header-working-with-pull-requests

  15. Marshall Greenblatt reporter

    @Emmanuel ROCHE In cases where we're adding large amounts of new code in Chromium (e.g. the GLES2DecoderImpl::HandleNervCopyTextureToSharedHandle implementation) we should use the buildflag/feature capability instead of including that code directly in Chromium patch files. For example, add a new source file in CEF that provides the helper implementation, include that file from the "gles2_sources" target in gpu/command_buffer/service/BUILD.gn, and call that helper implementation from a minimal GLES2DecoderImpl::HandleNervCopyTextureToSharedHandle method implementation. For more info on this approach see the documentation at https://bitbucket.org/chromiumembedded/cef/src/master/libcef/features/BUILD.gn.

  16. Mark Petersen

    Very interested in these findings, as I am in somewhat the same situation and want it to run as an OpenGL surface. So thank you very much for all your work, and I am hoping you could, as Marshall Greenblatt suggested, make a PR for this so others can more easily test and contribute to it :)

  17. Ole Dittmann

    @emmanuel ROCHE: Just wanted to say that I am also very interested in this and your work is very much appreciated. We have exactly the same situation: a Direct3D application where we would like to integrate CEF. But we experienced bad performance, probably because of windowless rendering and the (theoretically superfluous) copying of frame data to system memory and back. My latest test was with cef 3.3239 and I noticed a substantial performance improvement compared to previous tests with version 3.2526. Cef seems to use hardware compositing now in windowless mode as well. But especially at higher resolutions (like 4k), performance is still bad compared to normal windowed mode. As far as I can see, cef always copies its frame data completely into a system memory bitmap before calling "OnPaint"; even if only a small area of the frame is updated, it makes no difference. Also, it seems that it does not properly re-use memory frames (you get many new addresses), which might prevent automatic optimization by graphics hardware for faster copying. So I think your approach with the shared texture might give a HUGE performance boost for our case. And we would very much like to see it integrated into cef some day!

  18. Isaac Richards

    This is still very WIP, but the attached diff iterates over the change @Emmanuel ROCHE posted with the following:

    • The new API for requesting the shared textures is extended to pass along similar information to the OnPaint callback, so multiple browsers/popups can be supported.

    • Added a 'done' callback w/ dirty rect so the cef client knows when copying to the destination surface has completed.

    • Reworked the chromium changes such that it's now just a pair of small extensions to map/unmap a shared handle to an ANGLE texture, so the existing texture copy code in GLHelper can be used (might switch to glCopyTextureCHROMIUM)

    • Removes the 1080p hardcoded size and other bugfixes.

    • Incorporates the osr/readback perf patch for better 60fps timing.

    • Needs to use DX11 keyed mutex shared textures. I was unable to get proper synchronization of the texture copies between the processes without this. Everything mostly works without the keyed mutex, but the proper data isn't rendered unless you wait a short period of time after everything theoretically should be done copying. This is only a noticeable problem for isolated frame updates, not continuous animations.


    • Update to a more current branch of CEF.

    • Look into mapping the texture earlier to see if the CopyOutputRequest can go directly to the destination texture.

    • Get the right #ifdefs in so it'll not break compilation outside of windows.

    • Provide sample client code.

    A proper PR is forthcoming, but I'll likely update everything to a newer branch first.

  19. Emmanuel ROCHE

    Hello @Marshall Greenblatt and all,

    Glad to see there is some interest in this issue :-) and I take note that you all expect a PR submission as well as a minimal sample on how to use this: I should be able to provide these in the coming weeks, but first I need to focus on the memory leak issue I already discussed with @Mikael Hermansson: the end of last year was a bit too crazy, so unfortunately I didn't get a chance to investigate that properly...

    Also, the implementation provided by @Isaac Richards above might work just fine for you, but it's significantly different from what I have on my side so I'll keep working on my own implementation for now, as it fits my needs more precisely and I still have the feeling I can improve on it a bit further. I will keep you informed soon.

  20. Isaac Richards

    @Emmanuel ROCHE - it's actually really similar, I just cut out the gl drawing code you reimplemented (it already exists both in gl_helper and the glCopyTextureCHROMIUM extension, I figured we should try to keep this change as small as we can in chromium), added some extra bits to map the shared handle to a texture so that that existing code could be used, and moved most of the logic to a new file.

    You can actually make my change behave functionally equivalent to yours (ie, no frame level synchronization for the paint updates) by passing in a single dx9 shared texture handle or a dx10/11 shared handle (the non-keyed mutex version) and just ignoring the 'done' callbacks at the client - it'll all work fine as long as you keep rendering frames and don't do single frame updates. The additional callback api is needed, however, if the client wants to handle individual paints and needs to know when the texture has actually updated with the proper data.

  21. Isaac Richards

    In case anyone is interested, I posted pull request #144 a few days ago against CEF 3239. See https://github.com/Microsoft/angle/wiki/Interop-with-other-DirectX-code for details on how to create a texture & shared handle for using the new API. It's recommended to use D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX, however - client code would be the same aside from the calls to IDXGIKeyedMutex::AcquireSync and ReleaseSync around rendering the texture. The chromium side code will attempt to lock and unlock the mutex with a key of '0' if there is a keyed mutex associated with the shared handle, otherwise it ignores it and uses the shared handle without synchronization.
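
    The locking protocol described above can be illustrated with a small single-threaded C++ model. This is not the DXGI API: KeyedMutexModel only mimics the acquire/release-by-key behaviour of a keyed mutex, with non-blocking TryAcquire in place of a timed AcquireSync, and SharedTextureModel, ProducerCopyFrame and the frame_id field are hypothetical names. The model assumes, as a stated assumption, that a fresh keyed mutex is acquirable with key 0.

```cpp
#include <cstdint>

// Single-threaded model of keyed-mutex semantics; not the DXGI API.
class KeyedMutexModel {
 public:
  // Succeeds only if the mutex is free and was last released with `key`.
  bool TryAcquire(std::uint64_t key) {
    if (held_ || released_key_ != key) return false;
    held_ = true;
    return true;
  }

  // Releases the mutex; the next acquire must ask for the same `key`.
  void Release(std::uint64_t key) {
    held_ = false;
    released_key_ = key;
  }

 private:
  bool held_ = false;
  std::uint64_t released_key_ = 0;  // Assumed initial state: released, key 0.
};

struct SharedTextureModel {
  KeyedMutexModel mutex;
  int frame_id = 0;  // Stand-in for the texture contents.
};

// Producer side (the Chromium side in the comment above): lock with key 0,
// write the frame, unlock with key 0. Returns false if the client holds it.
bool ProducerCopyFrame(SharedTextureModel& texture, int frame_id) {
  if (!texture.mutex.TryAcquire(0)) return false;
  texture.frame_id = frame_id;
  texture.mutex.Release(0);
  return true;
}
```

    On the client side, the matching pattern is to acquire with key 0, render with the texture, and release with key 0, so the producer can never overwrite a frame while it is being sampled.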

  22. wesselsga

    @Isaac Richards - I'm very interested in your pull request and have been trying to get a build of 3239 w/ your changes. I can successfully build 3239 for x64 on Win 10 - however the rendering results of any page are not correct. (using cefclient.exe - the page is garbage basically). I have a pending forum topic on the question and I get the rendering issues with or without your pull request integrated.

    Any chance you can share your binary distribution for testing? I already have a client application setup using your API changes for shared textures. But since I don't have a valid build - the textures always seem to be transparent (nothing rendered). Turning the gpu off for OSR and bypassing the shared texture path - I get the same invalid rendering results in my application that I see when using cefclient.exe in direct rendering mode.

    Excellent work to both you and @Emmanuel ROCHE for putting these changes together!

  23. wesselsga

    @Isaac Richards you can disregard the request for your binary distribution. We were able to get a build put together and have been testing the shared texture rendering with very promising results so far. Thanks again for this pull request - hopefully it will get approved and merged into master.

  24. Mark Petersen

    @Isaac Richards @wesselsga Thank you both so much for your contributions to this. We have also been building a test of this and have yet to test it. I was wondering, though, whether any of you have tested this with WebGL content, and if so, what sort of performance you have seen? That is what we need this for, and our problem is that we need this to run on pretty outdated hardware.

  25. Marcin

    @wesselsga, could you please share the binary distribution for testing? I don't have VS2017 yet and cannot build cef myself. I'd like to test how it improves the performance. I already have DX11 code with the texture created based on the shared bits. I render using a deferred context and wonder how this would improve with the shared texture.

  26. wesselsga

    Here is another associated pull request similar to the one from @Isaac Richards. The main difference is utilizing the OffscreenBrowserCompositorOutputSurface already in Chromium to render directly to a shared d3d11 texture rather than issuing a frame copy from the default GL frame buffer.

  27. Marshall Greenblatt reporter

    Support for Windows (D3D11) added in master revisions 713eebc and 1e6b870.

    This issue will be left open as the tracking bug for adding Linux and macOS support. Please create new issues for any new/remaining problems related to the Windows implementation.
