ML-ImageSynthesis / Implementation

How does it work

Output images

  • Image segmentation - color-encoded 'InstanceID', a unique per-object identifier
  • Object categorization - color-encoded object Layer (or optionally Tag)
  • Optical flow - based on Unity's per-pixel motion vectors, with values encoded to fit into a pair of unsigned 8-bit channels of the PNG image (see the encoding sketch after this list)
  • Depth - based on the per-pixel distance to the camera, encoded to better fit into the 8-bit channels of the PNG image
  • Normals - based on surface orientation relative to the camera
  • ... and more in the future
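
The precise conversions live in the project's shaders and its ColorEncoding class; the snippet below is only a minimal C# sketch of the general idea behind the flow and depth encodings. The function names and curves are illustrative assumptions, not the project's actual code.

```csharp
// Minimal sketch only - the real conversions happen in the project's shaders.
using UnityEngine;

public static class EncodingSketch
{
    // A signed motion-vector component (roughly in [-1, 1]) is remapped to [0, 1]
    // so it can be stored in an unsigned 8-bit PNG channel.
    public static float EncodeMotionComponent(float v)
    {
        return Mathf.Clamp01(v * 0.5f + 0.5f);
    }

    // Linear eye-space depth normalized by the far clip plane before 8-bit
    // quantization; the project's shaders may use a different curve.
    public static float EncodeDepth01(float eyeDepth, float farPlane)
    {
        return Mathf.Clamp01(eyeDepth / farPlane);
    }
}
```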

Implementation details

First of all, ImageSynthesis.OnSceneChange() calls the ColorEncoding class to encode each object's unique identifier and layer as RGB colors. These colors are stored in a MaterialPropertyBlock for each object and are automatically passed into the shaders when rendering.
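
A minimal sketch of that step follows. The property names "_ObjectColor" and "_CategoryColor" and the simple byte-packing are assumptions for illustration; the project's ColorEncoding class and replacement shaders define the real mapping.

```csharp
// Minimal sketch: tag every renderer with color-encoded id and layer values.
using UnityEngine;

public class ObjectColorTagger : MonoBehaviour
{
    // Illustrative packing of a small integer into an RGB color, one byte per channel.
    static Color32 EncodeIdAsColor(int id)
    {
        return new Color32((byte)(id & 0xFF), (byte)((id >> 8) & 0xFF), (byte)((id >> 16) & 0xFF), 255);
    }

    void Start()
    {
        var block = new MaterialPropertyBlock();
        foreach (var r in FindObjectsOfType<Renderer>())
        {
            // Encode the unique instance id and the layer as colors...
            Color idColor    = EncodeIdAsColor(r.gameObject.GetInstanceID() & 0xFFFFFF);
            Color layerColor = EncodeIdAsColor(r.gameObject.layer);

            // ...and attach them to the renderer; they are handed to the
            // replacement shaders automatically when the object is rendered.
            block.SetColor("_ObjectColor", idColor);      // assumed property name
            block.SetColor("_CategoryColor", layerColor); // assumed property name
            r.SetPropertyBlock(block);
        }
    }
}
```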

Upon start, the ImageSynthesis component creates a hidden camera for every pass of output data (image segmentation, optical flow, depth, etc.). These cameras override the usual rendering of the scene and instead use custom shaders to generate the output. Each camera is attached to a different display via the Camera.targetDisplay property, which is handy for previewing the passes in the Editor.
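
A sketch of such a setup, assuming the component sits on the main camera (the object and method names here are illustrative, not the project's exact code):

```csharp
// Minimal sketch: one hidden helper camera per output pass.
using UnityEngine;

public class CapturePassSetup : MonoBehaviour
{
    Camera CreateHiddenCamera(string name, int display)
    {
        var go = new GameObject(name, typeof(Camera));
        go.hideFlags = HideFlags.HideAndDontSave;   // keep it out of the hierarchy and scene file
        go.transform.parent = transform;            // follow the main camera

        var cam = go.GetComponent<Camera>();
        cam.CopyFrom(GetComponent<Camera>());       // inherit FOV, clipping planes, etc.
        cam.targetDisplay = display;                // preview each pass on its own display in the Editor
        return cam;
    }

    void Start()
    {
        CreateHiddenCamera("SegmentationCamera", 1);
        CreateHiddenCamera("OpticalFlowCamera", 2);
        CreateHiddenCamera("DepthCamera", 3);
    }
}
```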

For the image segmentation and object categorization passes, a special replacement shader is set with Camera.SetReplacementShader(). It overrides the shaders that would otherwise be used for rendering and instead outputs the encoded object id or layer.
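
Configuring such a camera might look roughly like this; the replacement shader itself ships with the project and is simply passed in here, and the empty replacement tag tells Unity to replace every shader unconditionally:

```csharp
// Minimal sketch of a segmentation/categorization camera setup.
using UnityEngine;

public static class SegmentationPass
{
    public static void Setup(Camera cam, Shader replacement)
    {
        cam.clearFlags = CameraClearFlags.SolidColor;
        cam.backgroundColor = Color.black;         // background pixels decode to "no object"
        cam.SetReplacementShader(replacement, ""); // empty tag: replace all shaders
    }
}
```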

The optical flow and depth pass cameras request additional data to be rendered with the DepthTextureMode.Depth and DepthTextureMode.MotionVectors flags. Rendering with these cameras is followed by drawing a full-screen quad via CommandBuffer.Blit() with custom shaders that convert the 24/16-bit-per-channel data into the 8-bit RGB encoding.
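
A sketch of that wiring is below; the conversion material wraps one of the project's encoding shaders and is passed in here as an assumption:

```csharp
// Minimal sketch: request extra per-pixel data and re-encode it after rendering.
using UnityEngine;
using UnityEngine.Rendering;

public static class FlowAndDepthPass
{
    public static void Setup(Camera cam, Material encodeMaterial)
    {
        // Ask Unity to render per-pixel depth and motion vectors for this camera.
        cam.depthTextureMode = DepthTextureMode.Depth | DepthTextureMode.MotionVectors;

        // After the camera finishes, draw a full-screen quad that converts the
        // high-precision data into the 8-bit RGB encoding in the camera target.
        var cb = new CommandBuffer { name = "Encode depth / motion vectors" };
        cb.Blit(null, BuiltinRenderTextureType.CurrentActive, encodeMaterial);
        cam.AddCommandBuffer(CameraEvent.AfterEverything, cb);
    }
}
```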

Finally, the images are read back from the GPU with Texture2D.ReadPixels(), compressed to the PNG format with Texture2D.EncodeToPNG(), and stored on disk.
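
A minimal sketch of that final step, assuming one capture camera and a target path (the resolution, texture format, and state handling here are illustrative):

```csharp
// Minimal sketch: render a pass into a RenderTexture, read it back, save as PNG.
using System.IO;
using UnityEngine;

public static class ImageSaver
{
    public static void Save(Camera cam, int width, int height, string path)
    {
        var rt  = RenderTexture.GetTemporary(width, height, 24);
        var tex = new Texture2D(width, height, TextureFormat.RGB24, false);

        var prevTarget = cam.targetTexture;
        var prevActive = RenderTexture.active;

        cam.targetTexture = rt;
        cam.Render();                                        // render this pass into the RenderTexture

        RenderTexture.active = rt;
        tex.ReadPixels(new Rect(0, 0, width, height), 0, 0); // GPU -> CPU readback
        tex.Apply();

        File.WriteAllBytes(path, tex.EncodeToPNG());         // compress to PNG and store on disk

        // Restore previous state and release temporaries.
        cam.targetTexture = prevTarget;
        RenderTexture.active = prevActive;
        RenderTexture.ReleaseTemporary(rt);
        Object.Destroy(tex);
    }
}
```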
