ML-ImageSynthesis / Implementation
How does it work
Output images
- Image segmentation - color-encoded 'InstanceID', a unique per-object identifier
- Object categorization - color-encoded object Layer (or optionally Tag)
- Optical flow - based on Unity's per-pixel Motion Vectors, with colors encoded to fit into a pair of unsigned 8-bit channels of the .PNG image
- Depth - based on per-pixel distance to the camera, encoded to better fit into the 8-bit channels of the .PNG image
- Normals - based on surface orientation relative to the camera
- ... and more in the future
Implementation details
First of all, ImageSynthesis.OnSceneChange() calls the ColorEncoding class to encode each object's unique identifier and layer as an RGB color. These colors are stored in a MaterialPropertyBlock for each object and are automatically passed into the shaders when rendering.
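This per-object encoding step can be sketched as follows. This is a minimal illustration, not the project's actual code; the shader property names `_ObjectColor` and `_CategoryColor` are assumptions.

```csharp
using UnityEngine;

public static class ColorEncodingSketch
{
    // Pack a 24-bit integer id into an RGB color, 8 bits per channel.
    public static Color EncodeIdAsColor(int id)
    {
        float r = ((id >> 16) & 0xFF) / 255f;
        float g = ((id >> 8) & 0xFF) / 255f;
        float b = (id & 0xFF) / 255f;
        return new Color(r, g, b, 1f);
    }

    // Store the encoded colors on the renderer; shaders pick them up
    // automatically at render time via the MaterialPropertyBlock.
    public static void ApplyToRenderer(Renderer renderer)
    {
        var mpb = new MaterialPropertyBlock();
        // Property names below are illustrative assumptions.
        mpb.SetColor("_ObjectColor", EncodeIdAsColor(renderer.gameObject.GetInstanceID()));
        mpb.SetColor("_CategoryColor", EncodeIdAsColor(renderer.gameObject.layer));
        renderer.SetPropertyBlock(mpb);
    }
}
```

Using MaterialPropertyBlock instead of per-object material instances avoids breaking batching and material sharing while still giving every renderer its own color.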
Upon start, the ImageSynthesis component creates a hidden camera for every pass of output data (image segmentation, optical flow, depth, etc.). These cameras override the usual rendering of the scene and instead use custom shaders to generate the output. They are attached to different displays via the Camera.targetDisplay property - handy for preview in the Editor.
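Creating such hidden capture cameras might look roughly like this. The pass names and count are illustrative assumptions, not the project's actual setup:

```csharp
using UnityEngine;

[RequireComponent(typeof(Camera))]
public class HiddenCaptureCameras : MonoBehaviour
{
    // One hidden camera per output pass; names here are illustrative.
    static readonly string[] passNames = { "_ObjectId", "_Layer", "_Depth", "_Flow" };

    void Start()
    {
        var mainCam = GetComponent<Camera>();
        for (int i = 0; i < passNames.Length; i++)
        {
            var go = new GameObject(mainCam.name + passNames[i]);
            go.hideFlags = HideFlags.HideAndDontSave; // keep it out of the hierarchy
            var cam = go.AddComponent<Camera>();
            cam.CopyFrom(mainCam);     // match view, projection and culling settings
            cam.targetDisplay = i + 1; // each pass previews on its own display
        }
    }
}
```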
For the Image segmentation and Object categorization passes, a special replacement shader is set with Camera.SetReplacementShader(). It overrides the shaders that would otherwise be used for rendering and instead outputs the encoded object id or layer.
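A minimal sketch of wiring up such a camera, assuming a replacement shader that reads the per-object color set earlier:

```csharp
using UnityEngine;

[RequireComponent(typeof(Camera))]
public class SegmentationCamera : MonoBehaviour
{
    // Replacement shader that outputs the encoded per-object color; illustrative.
    public Shader segmentationShader;

    void Start()
    {
        var cam = GetComponent<Camera>();
        // An empty replacement tag replaces every shader regardless of RenderType.
        cam.SetReplacementShader(segmentationShader, "");
        // Solid background gives unrendered pixels a known "no object" color.
        cam.clearFlags = CameraClearFlags.SolidColor;
        cam.backgroundColor = Color.black;
    }
}
```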
The Optical flow and Depth pass cameras request additional data to be rendered via the DepthTextureMode.Depth and DepthTextureMode.MotionVectors flags. Rendering of these cameras is followed by drawing a full-screen quad with CommandBuffer.Blit() using custom shaders that convert the 24/16-bit-per-channel data into the 8-bit RGB encoding.
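The request-then-convert flow can be sketched as below. For brevity this uses Graphics.Blit in OnRenderImage rather than a CommandBuffer, and the encoding material is an assumed placeholder:

```csharp
using UnityEngine;

[RequireComponent(typeof(Camera))]
public class DepthCaptureCamera : MonoBehaviour
{
    // Material whose shader re-encodes the high-precision depth/motion data
    // into 8-bit RGB; illustrative assumption.
    public Material encodeMaterial;

    void Start()
    {
        // Ask Unity to render the extra per-pixel data for this camera.
        GetComponent<Camera>().depthTextureMode =
            DepthTextureMode.Depth | DepthTextureMode.MotionVectors;
    }

    // After the camera renders, draw a full-screen quad that converts
    // the 24/16-bit-per-channel data into the 8-bit encoding.
    void OnRenderImage(RenderTexture src, RenderTexture dest)
    {
        Graphics.Blit(src, dest, encodeMaterial);
    }
}
```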
Finally, images are read back from the GPU with Texture2D.ReadPixels(), compressed to the PNG format with Texture2D.EncodeToPNG(), and stored on disk.
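The readback-and-save step could look like this minimal sketch; the texture format and path handling are illustrative assumptions:

```csharp
using System.IO;
using UnityEngine;

public static class CaptureToPng
{
    // Read a camera's render target back from the GPU and save it as a PNG.
    public static void Save(RenderTexture rt, string path)
    {
        var prev = RenderTexture.active;
        RenderTexture.active = rt;
        var tex = new Texture2D(rt.width, rt.height, TextureFormat.RGB24, false);
        tex.ReadPixels(new Rect(0, 0, rt.width, rt.height), 0, 0); // GPU -> CPU readback
        tex.Apply();
        RenderTexture.active = prev;
        File.WriteAllBytes(path, tex.EncodeToPNG()); // compress to PNG, write to disk
        Object.Destroy(tex);
    }
}
```

Note that ReadPixels is a synchronous GPU stall, so in practice captures are usually batched at the end of the frame.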
Related Unity documentation
For more information see Unity's documentation on MaterialPropertyBlock, Camera.targetDisplay, Camera.SetReplacementShader, DepthTextureMode, CommandBuffer.Blit, Texture2D.ReadPixels, and Texture2D.EncodeToPNG.