I have finally got back to this and finished it. I have removed basically everything that’s not related to morph animation, since I’ve got other things going on in my v2-1 fork, so I think this is a pretty clean PR. I have implemented it in Metal, GLSL and D3D11, and have tested on macOS, Ubuntu and Windows (not in a VM). You can use Sample_MorphAnimations for testing.
Pose animation data is put into the worldMatBuf. If the mesh has skeletal animation, the pose animation data comes right after the last bone matrix; otherwise it starts at the offset given in the first 23 bits of worldMaterialIdx[drawId].x. The first vec4 holds the number of vertices and the base vertex id, then come the pose weights in the following vec4s; then, if the mesh doesn’t also have skeletal animation, the worldMat and worldView matrices come right after.
The vertex offsets are stored in a single static texbuffer. Half precision is used by default.
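A minimal CPU-side sketch of the bit-packing described above (the helper names here are hypothetical, not the actual PR code): the pose-data offset occupies the low 23 bits of worldMaterialIdx[drawId].x, leaving the upper 9 bits for whatever else that word carries.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the 23-bit offset packing described above.
constexpr uint32_t kPoseOffsetBits = 23u;
constexpr uint32_t kPoseOffsetMask = ( 1u << kPoseOffsetBits ) - 1u; // 0x007FFFFF

uint32_t packWorldMaterialIdxX( uint32_t poseDataOffset, uint32_t upperBits )
{
    assert( poseDataOffset <= kPoseOffsetMask ); // offset must fit in 23 bits
    return ( upperBits << kPoseOffsetBits ) | poseDataOffset;
}

uint32_t extractPoseDataOffset( uint32_t worldMaterialIdxX )
{
    return worldMaterialIdxX & kPoseOffsetMask;
}
```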
Check extra notes on the Diff below.
Damn it looks good. I will have to take a look and test it before merging it. Ping me periodically if I forget (I am doing so many things.... :( )
I was hoping you would go the compute approach, but perhaps that is still salvageable by using the "just one pose path".
The main problem with the vertex-shader-only approach is that if you have 1 million vertices and the pose modifies just one vertex, this approach still needs to perform 1 million operations (and waste a lot of VRAM). If you want 8 poses simultaneously, that's 8 million ops (8 per vertex), regardless of how many vertices actually need to be touched.
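Spelled out, the cost argument is just that the vertex-shader path scales with vertex count times active poses, never with the number of vertices a pose actually displaces:

```cpp
#include <cstdint>

// Back-of-the-envelope cost of the vertex-shader-only path: per-frame work
// is (vertex count × active poses), independent of how many vertices each
// pose actually moves.
uint64_t vertexPathOps( uint64_t numVertices, uint64_t numPoses )
{
    return numVertices * numPoses;
}
// e.g. vertexPathOps( 1000000, 8 ) is 8 million, even if a pose moves 1 vertex.
```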
A compute shader can perform fewer operations more intelligently. Looking at your code, I think this approach is still feasible: when (e.g.) running 8 poses simultaneously, a compute shader can process those 8 poses and generate a pose buffer with 1 million entries (most of them already default-initialized to 0) for the vertex shader to consume.
The vertex shader would then run and perform 1 million operations, as if there were only one pose running (by grabbing the pose buffer generated by the CS).
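A CPU-side model of what that compute pass would do (hypothetical names and layout; the real thing would be a compute shader writing to an SSBO/UAV). Each pose only touches the vertices it actually displaces, accumulating into a zero-initialized per-vertex pose buffer; the vertex shader then needs just one fetch per vertex:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sparse-pose representation: a pose stores only the vertices
// it displaces, plus one xyz offset per touched vertex.
struct SparsePose
{
    std::vector<size_t> vertexIndices; // vertices this pose displaces
    std::vector<float>  offsets;       // flattened xyz offsets, one per index
    float               weight;        // current animation weight
};

// Model of the compute pass: scatter weighted offsets into a dense,
// zero-initialized buffer of numVertices xyz entries.
void buildPoseBuffer( const std::vector<SparsePose> &poses, size_t numVertices,
                      std::vector<float> &poseBuffer )
{
    poseBuffer.assign( numVertices * 3u, 0.0f ); // "default initialized to 0"
    for( const SparsePose &pose : poses )
    {
        for( size_t i = 0u; i < pose.vertexIndices.size(); ++i )
        {
            const size_t v = pose.vertexIndices[i];
            poseBuffer[v * 3u + 0u] += pose.weight * pose.offsets[i * 3u + 0u];
            poseBuffer[v * 3u + 1u] += pose.weight * pose.offsets[i * 3u + 1u];
            poseBuffer[v * 3u + 2u] += pose.weight * pose.offsets[i * 3u + 2u];
        }
    }
}
```

The work done here is proportional to the touched vertices across all poses, not (vertices × poses), which is the whole point of the CS pre-pass.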
I chose not to go down the compute shader path because it looked too complicated to me. I guessed it would take more time (and sanity) to get working, because I believed it would require messing around with Ogre's internals even more.
If you could give me some extra guidance on the compute shader approach, I might be able to pull it off.
I was talking out loud (and also leaving a reminder for myself that this work is not incompatible with a compute-shader-based approach, which I initially thought it was).
Definitely, starting with what you know is best. I am currently more interested in porting this code to the Ogre 2.2 branch. The shader files have changed a lot, but actually haven't really changed much. What happened is that we were writing three files (one for each backend: GLSL, HLSL, Metal) and we merged all of that work into a single file, with minor divergences in the shading language abstracted via macros, and major divergences kept in separate files.
You have experienced that yourself already: you wrote 3 implementations that were near identical save for a few minor differences.
Basically porting to Ogre 2.2 is mostly a cut-and-paste job (but has to be done by hand because in Mercurial's eyes, the files have "changed significantly").
As for compute, I've been doing a lot of compute lately due to VCT (Voxel Cone Tracing) and as a result several bugfixes and enhancements have been made to compute in 2.2, which is why I think you did well in not rushing to a Compute approach.
If you're up to it, I could guide you to writing a compute shader implementation. But first I need to review this PR :D
I was already thinking of adapting this to 2.2, so I think it would be better to merge this into 2.1 the way it is and implement the compute approach in 2.2. But I need to familiarize myself with 2.2 first.
Now it would be nice if it could be merged into 2.2.
I could try to do it myself, as it doesn't look like much work, but I think it would be good if you got familiar with it.
The idea would be to centralize your Pose code into an "Any" file (look at all the files in Samples/Media/Hlms/Pbs/Any) so that you don't have three almost-exact copies of each.
Additionally, using "@insertpiece( input_vertex )" feels a little overkill now in 2.2, as we're using macros.
I would love to look into implementing this in 2.2 using compute shaders. I guess the idea is to have an index buffer and an offsets buffer, where the index buffer contains the indices of the vertices changed by that pose and the offsets buffer contains the corresponding offsets (for position, normal and hopefully tangents as well). The shader then takes these plus the vertex buffer as inputs and outputs a transformed vertex buffer. I'm just not sure how this output buffer will be used as the vertex buffer for the draw call; that's where I'll need some help.
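For the record, here's a CPU sketch of the dispatch I have in mind (all names hypothetical; positions only, and the part where the output buffer gets bound as the draw call's vertex buffer is exactly the Ogre-side plumbing I'm unsure about, so it's not shown):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the proposed compute dispatch: one "thread" per
// entry of the pose's index buffer, scattering a weighted offset onto a
// copy of the original vertex positions. Untouched vertices pass through.
void applyPoseSparse( const std::vector<float> &basePositions,  // xyz per vertex
                      const std::vector<size_t> &poseIndices,   // touched vertices
                      const std::vector<float> &poseOffsets,    // xyz per touched vertex
                      float weight,
                      std::vector<float> &outPositions )
{
    outPositions = basePositions; // start from the untransformed vertex buffer
    for( size_t i = 0u; i < poseIndices.size(); ++i )
    {
        const size_t v = poseIndices[i];
        outPositions[v * 3u + 0u] += weight * poseOffsets[i * 3u + 0u];
        outPositions[v * 3u + 1u] += weight * poseOffsets[i * 3u + 1u];
        outPositions[v * 3u + 2u] += weight * poseOffsets[i * 3u + 2u];
    }
}
```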
Oh! First get this version working, then we'll worry about Compute Shaders.
The compute shader is something to add to your current code, rather than a replacement for it.
I believe a lot of things will be different. All the morph animation code will be removed from the HlmsPbs shaders and moved into the compute shader, and no morph data will be passed via the worldMatBuf anymore. Anyway, I will create a topic on the forums when I start looking into this.