EGL support, headless rendering

Issue #219 resolved
Former user created an issue

Hello!

I've used Pyglet to implement an OpenAI Gym environment for training reinforcement learning agents. A common use case is for this to run headless on a cluster. Unfortunately, in order to run this headless, I need to use xvfb. I've run into multiple problems with this, including stale X servers left behind on the cluster after other users' sessions crashed. Not to mention that xvfb can only run my code with software rendering, which is slow.

I would really like it if Pyglet implemented support for the EGL API, so that it can run headless with 3D acceleration, and without needing an X server or xvfb.

Comments (32)

  1. Benjamin Moran

    EGL support is interesting, and would be nice to have. This would also be nice for potentially supporting Wayland as a backend. No one is working on this at the moment. If someone wants to give this a try, I'm happy to provide support.

  2. Florian Golemo

    I agree with the author of this ticket: This is super important for the machine learning community. I don't have the necessary skills to pull this off, but I'd buy a coffee for anyone who did. Headless rendering is a must-have in our field and, to the best of my knowledge, there aren't any good alternatives out there.

  3. Benjamin Moran

    I got the ball rolling by generating egl bindings using pyglet's ctypes generation tools. I don't have time to work on this at the moment myself, unfortunately.

    @fgolemo, out of curiosity, what distribution do ML folks tend to use?

  4. Florian Golemo

    @HigashiNoKaze thanks so much already. :)

    For your question about distributions: in which context?

  5. Benjamin Moran

    I just meant which Linux distribution. I was wondering what the EGL support looks like, but I suppose I was only thinking of the Mesa stack. If you're using Nvidia cards, then you'll have EGL 1.5 support there. Recent Mesa also supports EGL 1.5, so it looks reasonable to target 1.5.

  6. Florian Golemo

    Ah. 😆 I was honestly thinking "Gaussian distribution, uniform distribution,... ?".

    We're mostly running Debian and Ubuntu, I'd say. I've seen some sparse CentOS too.

    And yeah, the vast majority of machine learning happens on NVIDIA cards.

  7. Benjamin Moran

    That makes sense. Thank you both for the information.

    It looks like this is a good article describing the creation of an OpenGL context on EGL: https://devblogs.nvidia.com/egl-eye-opengl-visualization-without-x-server/

    Creating a context is the first step. Contexts and Windows usually go hand in hand, so a dummy Window class would also be necessary (but should be simple to write). There wouldn't be any input from mouse/keyboard, but I guess that goes without saying when you're running headless.

  8. Benjamin Moran

    I had some time, so I poked around at this a bit. I used the bindings I generated earlier, and followed along with this page: https://devblogs.nvidia.com/egl-eye-opengl-visualization-without-x-server/

    I was able to successfully create a display, surface, and standard GL context (as far as I can tell at least). The code seems to run correctly, at least on the Intel machine I tried it on. If anyone wants to try it, give the following code a try. It doesn't do anything yet except print out the number of available configs.

    import egl
    from egl import *
    
    
    # Initialize a display:
    display = egl.EGLNativeDisplayType()
    display_connection = egl.eglGetDisplay(display)
    result = egl.eglInitialize(display_connection, None, None)
    assert result == 1, "EGL Initialization Failed"
    
    # Get the number of configs:
    num_configs = egl.EGLint()
    config_size = egl.EGLint()
    egl.eglGetConfigs(display_connection, None, config_size, num_configs)
    assert result == 1, "Failed to query Configs"
    
    print("Number of configs available: ", num_configs.value)
    
    # Choose a config:
    config_attribs = (EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
                      EGL_BLUE_SIZE, 8,
                      EGL_GREEN_SIZE, 8,
                      EGL_RED_SIZE, 8,
                      EGL_DEPTH_SIZE, 8,
                      EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
                      EGL_NONE)
    config_attrib_array = (egl.EGLint * len(config_attribs))(*config_attribs)
    egl_config = egl.EGLConfig()
    egl.eglChooseConfig(display_connection, config_attrib_array, egl_config, 1, num_configs)
    
    # Create a surface:
    pbufferwidth = 9
    pbufferheight = 9
    pbuffer_attribs = (EGL_WIDTH, pbufferwidth, EGL_HEIGHT, pbufferheight, EGL_NONE)
    pbuffer_attrib_array = (egl.EGLint * len(pbuffer_attribs))(*pbuffer_attribs)
    surface = egl.eglCreatePbufferSurface(display_connection, egl_config, pbuffer_attrib_array)
    
    # Bind the API:
    egl.eglBindAPI(egl.EGL_OPENGL_API)
    assert result == 1, "Failed to bind EGL_OPENGL_API"
    
    # Create a context:
    context = egl.eglCreateContext(display_connection, egl_config, None, None)
    
    # Make context current:
    egl.eglMakeCurrent(display_connection, surface, surface, context)
    assert result == 1, "Failed to make display current"
    
    # Terminate EGL:
    egl.eglTerminate(display_connection)
    
  9. Maxime Chevalier-Boisvert

    Thanks for doing this Benjamin!

    It works on my machine. Ubuntu 18.04 laptop with an nvidia GeForce 940MX GPU: Number of configs available: 70

    What's the next step?

  10. Benjamin Moran

    Great! If you can try again with the following changes to the "Initialize a Display" section, we can confirm if your setup supports EGL 1.5 or not. It turns out my Intel integrated graphics machine only supports EGL 1.4. I will test it on my AMD machines later. I'm not sure if the versions matter yet.

    # Initialize a display:
    display = egl.EGLNativeDisplayType()
    display_connection = egl.eglGetDisplay(display)
    
    majorver = egl.EGLint()
    minorver = egl.EGLint()
    result = egl.eglInitialize(display_connection, majorver, minorver)
    assert result == 1, "EGL Initialization Failed"
    egl_version = majorver.value, minorver.value
    print(f"EGL version: {egl_version}")
    

    For a next step, I'm going to stub out some pyglet Config/Canvas/Window classes. The platform is automatically detected when using pyglet, so a way to request a headless mode is necessary. Maybe a pyglet.options['headless'] = True would do the trick, and use the EGL codepaths.

    By the way, what is the general headless use case? As in, how are you getting data back from your simulations? Are there textures that are created and then saved as images, or something different?
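
As a rough illustration of the `pyglet.options['headless']` idea above, backend selection could look something like the sketch below. Everything here is hypothetical: the `options` dict, `select_backend`, and the backend names are stand-ins for illustration, not pyglet's actual internals.

```python
import sys

# Hypothetical options dict, mirroring the proposed pyglet.options['headless'] flag.
options = {"headless": False}


def select_backend(opts, platform=sys.platform):
    """Return the name of the windowing backend to use (illustrative only)."""
    # An explicit headless request takes priority over platform detection.
    if opts.get("headless"):
        return "egl-headless"
    if platform.startswith("linux"):
        return "xlib"
    if platform == "darwin":
        return "cocoa"
    if platform in ("win32", "cygwin"):
        return "win32"
    raise RuntimeError(f"Unsupported platform: {platform}")
```

Checking the flag before any platform detection means a headless request never touches the X server code path, which is the point of the proposal.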

  11. Maxime Chevalier-Boisvert

    EGL version is 1.4 here too

    Right now I'm allocating frame buffers, one multisampled frame buffer to render into, and a second frame buffer that the first one gets resolved into. Then I read the pixels out into a numpy array using glReadPixels. I also read out the depth map from the Z buffer using glReadPixels.

    I'm currently working on this simulator: https://github.com/maximecb/gym-miniworld The frame buffer readout happens here: https://github.com/maximecb/gym-miniworld/blob/master/gym_miniworld/opengl.py#L277
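
One detail worth noting for the readback approach described above: OpenGL's window-space origin is the bottom-left corner, so `glReadPixels` returns rows from bottom to top, and images usually need a vertical flip before being saved or handed to an agent. Below is a minimal sketch of that flip on a raw byte buffer, using only the standard library. The function name and the tightly packed RGBA layout are assumptions made for this example; with numpy the same effect is usually achieved with a reversed slice.

```python
def flip_rows(raw, width, height, channels=4):
    """Reverse the row order of a tightly packed pixel buffer (bottom-up to top-down)."""
    stride = width * channels  # bytes per row, assuming no row padding
    if len(raw) != stride * height:
        raise ValueError("buffer size does not match the given dimensions")
    rows = [raw[y * stride:(y + 1) * stride] for y in range(height)]
    return b"".join(reversed(rows))


# A 1x2 RGBA image with the bottom row first, in the order glReadPixels would return it:
bottom_up = bytes([10, 20, 30, 255, 40, 50, 60, 255])
top_down = flip_rows(bottom_up, width=1, height=2)
```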

  12. Christopher Hesse

    This would be great to have for some of our OpenAI gym environments! We have a few that use pyglet for rendering, and it's a pain to configure+run an X server for this purpose.

  13. Maxime Chevalier-Boisvert

    You are a hero Benjamin. Let me know if you need help testing a beta version of this.

  14. Matthew Matl

    This is awesome, happy to help with this in any way I can. I was implementing a pyglet-backed EGL platform for offscreen rendering here: https://github.com/mmatl/pyrender/blob/d7c61a86b90311e3dcc1b043e6a27470093aa474/pyrender/platforms.py#L80. I'm able to get a modern OpenGL context, but my shaders (GLSL 3.3) won't compile and I can't get any error messages out of glGetProgramInfoLog, for example. Going to keep plugging away and try to figure out why EGL doesn't seem to want to compile modern shaders!

  15. Benjamin Moran

    Thanks for the information, everyone. Matthew, I'll have a look through the links. Those should help a lot. I don't have a lot of time to work on this right now, but I'll see what I can come up with. It shouldn't be too difficult; I just have to find the time to write the code.

  16. Benjamin Moran

    Sorry for the lack of reply. I've been a bit busy, but I plan to work on this in the near future when I have some time.

    Mainly, I will just need some testing once I have a little bit more fleshed out. I went through and stubbed out all of the appropriate classes already, but there is a bit more work to go.

  17. Benjamin Moran

    Ha! I’ve actually run into a bit of an issue with creating the “display”. I only have access to AMD/Intel hardware, where GBM is used for the backing buffers. Nvidia uses PBuffers in their examples (eglCreatePbufferSurface), which do not work on Linux with AMD/Intel. If someone has some time, it would be useful to confirm whether PBuffers do indeed work on Nvidia. My example above would mostly do for that, but as you might have noticed, the “result” variable only captures the return value of the first call at the top, so none of the later assertion checks are actually doing anything 🙂

    A next step for me would be to figure out how to create GBM buffers properly, and then continue at least for AMD/Intel hardware for the time being, and revisit Nvidia after I get that working.
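
The pitfall mentioned above (asserting on a stale `result`) can be avoided by routing every call through a small checking helper. The sketch below is illustrative only: `checked` and the stand-in functions are hypothetical, and it assumes the wrapped call returns `EGL_TRUE` (1) on success and `EGL_FALSE` (0) on failure, as the boolean-returning EGL entry points do.

```python
EGL_TRUE = 1
EGL_FALSE = 0


def checked(func, *args, message="EGL call failed"):
    """Call func(*args) and raise if it returns EGL_FALSE."""
    result = func(*args)
    if result == EGL_FALSE:
        raise RuntimeError(f"{message} ({func.__name__} returned {result})")
    return result


# Stand-ins for real bindings, so the sketch runs without an EGL library.
def fake_bind_api(api):
    return EGL_TRUE


def fake_make_current(display, draw, read, context):
    return EGL_FALSE


checked(fake_bind_api, 0x30A2, message="Failed to bind EGL_OPENGL_API")
try:
    checked(fake_make_current, None, None, None, None,
            message="Failed to make context current")
except RuntimeError as exc:
    failure = str(exc)
```

Functions that return handles rather than booleans (for example `eglCreateContext`, which returns `EGL_NO_CONTEXT` on failure) would need their own check.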

  18. Maxime Chevalier-Boisvert

    The typical use case of training reinforcement learning agents relies on nvidia hardware pretty much exclusively, so nvidia support is pretty important. Do you have a piece of code I can run to test PBuffers on nvidia?

    Would it be helpful to you if we could mail you a PCI-E nvidia GPU for testing? Probably not state of the art, but I’ll try to find something and mail at my own expense if it helps 🙂

  19. Benjamin Moran

    I do appreciate the offer, and apologize for my slow response on this. I can probably pick up a cheap Nvidia card locally, but I need to find the time to throw a spare PC together.

    Interestingly, however, my Intel-based notebook somehow suddenly seems to create valid contexts with Pbuffers. Maybe it was a recent driver update, since I’m on a rolling distro. In any case, I can poke at this a bit more now. If you can, please give this code a test on your Nvidia machines as well. It should print out a few simple stats:

    import egl
    from egl import *
    
    
    _buffer_types = {EGL_SINGLE_BUFFER: "EGL_SINGLE_BUFFER",
                     EGL_BACK_BUFFER: "EGL_BACK_BUFFER",
                     EGL_NONE: "EGL_NONE"}
    
    _api_types = {EGL_OPENGL_API: "EGL_OPENGL_API",
                  EGL_OPENGL_ES_API: "EGL_OPENGL_ES_API",
                  EGL_NONE: "EGL_NONE"}
    
    # Initialize a display:
    display = egl.EGLNativeDisplayType()
    display_connection = egl.eglGetDisplay(display)
    
    majorver = egl.EGLint()
    minorver = egl.EGLint()
    result = egl.eglInitialize(display_connection, majorver, minorver)
    assert result == 1, "EGL Initialization Failed"
    egl_version = majorver.value, minorver.value
    print(f"EGL version: {egl_version}")
    
    # Get the number of configs:
    num_configs = egl.EGLint()
    config_size = egl.EGLint()
    result = egl.eglGetConfigs(display_connection, None, config_size, num_configs)
    assert result == 1, "Failed to query Configs"
    
    print("Number of configs available: ", num_configs.value)
    
    # Choose a config:
    config_attribs = (EGL_SURFACE_TYPE, EGL_PBUFFER_BIT,
                      EGL_BLUE_SIZE, 8,
                      EGL_GREEN_SIZE, 8,
                      EGL_RED_SIZE, 8,
                      EGL_DEPTH_SIZE, 8,
                      EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT,
                      EGL_NONE)
    config_attrib_array = (egl.EGLint * len(config_attribs))(*config_attribs)
    egl_config = egl.EGLConfig()
    egl.eglChooseConfig(display_connection, config_attrib_array, egl_config, 1, num_configs)
    
    # Create a surface:
    pbufferwidth = 1
    pbufferheight = 1
    pbuffer_attribs = (EGL_WIDTH, pbufferwidth, EGL_HEIGHT, pbufferheight, EGL_NONE)
    pbuffer_attrib_array = (egl.EGLint * len(pbuffer_attribs))(*pbuffer_attribs)
    surface = egl.eglCreatePbufferSurface(display_connection, egl_config, pbuffer_attrib_array)
    
    # Bind the API:
    result = egl.eglBindAPI(egl.EGL_OPENGL_API)
    assert result == 1, "Failed to bind EGL_OPENGL_API"
    
    # Create a context:
    context = egl.eglCreateContext(display_connection, egl_config, None, None)
    
    # Make context current:
    result = egl.eglMakeCurrent(display_connection, surface, surface, context)
    assert result == 1, "Failed to make context current"
    
    error_code = egl.eglGetError()
    assert error_code == EGL_SUCCESS, "EGL Error code {} returned".format(error_code)
    
    # Print some context details:
    buffer_type = egl.EGLint()
    egl.eglQueryContext(display_connection, context, EGL_RENDER_BUFFER, buffer_type)
    print("Buffer type: ", _buffer_types.get(buffer_type.value, "Unknown"))
    print("API type: ", _api_types.get(egl.eglQueryAPI(), "Unknown"))
    
    # Terminate EGL:
    egl.eglTerminate(display_connection)
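
The numeric code printed by the `eglGetError` assertion above is easier to act on when translated to its symbolic name. Here is a small lookup sketch using the error values defined in the Khronos `egl.h` header; the helper name `egl_error_name` is made up for this example.

```python
# EGL error codes as defined in the Khronos egl.h header.
EGL_ERROR_NAMES = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def egl_error_name(code):
    """Translate an eglGetError() return value into its symbolic name."""
    return EGL_ERROR_NAMES.get(code, f"unknown EGL error 0x{code:04X}")
```

With the bindings used in the script, the assertion message could then be built from `egl_error_name(error_code)` instead of the raw integer.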
    
  20. Maxime Chevalier-Boisvert

    I ran the above on an nvidia card and got:

    EGL version: (1, 4)
    Number of configs available:  70
    Buffer type:  EGL_BACK_BUFFER
    API type:  EGL_OPENGL_API
    
  21. Benjamin Moran

    Great! Thanks for the confirmation. It seems to be OK so far. I’ll keep plugging away on it and see how far I can get.

  22. Zack Polizzi

    Agree that this would be super valuable for the OpenAI Gym headless use case (Ubuntu 16.04, Nvidia V100 GPU is my go-to).

    Here’s what I get when I run the above script:

    % python egl_test.py
    EGL version: (1, 5)
    Number of configs available:  65
    Buffer type:  EGL_BACK_BUFFER
    API type:  EGL_OPENGL_API
    
  23. Danijar Hafner

    I would love to see this feature as well. There are a couple of nice Gym environments that render via pyglet but we can’t use them because they don’t support headless rendering.

  24. Benjamin Moran

    Hi everyone,

    Sorry for the lack of updates. As some of you might know, development of pyglet has migrated to GitHub and Git. There is a new issue over there tracking EGL support:

    https://github.com/pyglet/pyglet/issues/51

    Bitbucket also announced they will be closing all Mercurial (HG) based repositories next year, so I’ll be closing most issues here before then. For now I’ll leave this open for comments.

    Thanks!
