Deteriorated stability of simulations.

Issue #51 resolved

Bram Stolk created an issue 2018-10-21

Non trivial OpenDE simulations have become unstable a few years back, when constraint-reordering was changed.

Random constraint reordering used to reorder all constraints. Now, it does it in two batches that do not intermingle.

demo_feedback and demo_convex can be used to observe the difference.

Here is a video of first the new, and then the old behaviour of OpenDE. Notice how the old simulation is stable, whereas the new one is not. https://youtu.be/ifDLqCP9_nw

USENET thread describing this issue can be found here: https://groups.google.com/forum/#!search/Random$20constraint$20reordering/ode-users/vVUcf-JHX6M/QcmhDR3WESgJ

At the very least, OpenDE should add a ./configure flag to revert back to the old style reordering system, so that old behaviour can be reinstated.

Or, alternatively, always reorder fully, skipping the need for a new flag.

Comments (27)

Oleh Derevenko Account Deactivated
Bram, I have committed a fix to QuickStep. Please try it to see if it improves things for you. The problem was not in the separate constraint randomization. Rather the problem was that originally the constraints were mostly solved in their natural order as generated and only each 8th iteration was randomized. With my changes in #1898 the constraints started to be randomized but never reverted back to the normal order on subsequent iterations. And that, obviously, broke the algorithm. Also, an interesting thing is that in the original implementation, during dependent constraints separation, just due to implementation convenience, those were stored from the tail to the array middle (in reverse order). Somehow, that reversed dependent constraint order is important. If you place them in straight order the simulation becomes worse.
- 2018-11-05T12:25:27+00:00
Bram Stolk reporter
Good catch Oleh,

Unfortunately, it is only a marginal improvement.

0.13 and 0.12 are still vastly better than the current branch, with latest fix.
- 2018-11-05T17:40:37+00:00
Bram Stolk reporter
https://youtu.be/uJltd1pDBSs

Whereas this is rock solid on old ODE's.
- 2018-11-05T17:56:09+00:00

Oleh Derevenko Account Deactivated

Can you provide something simpler I could see difference with? Demo_convex seems to be fixed with the latest changes. If you want to try the complete re-shuffling you can easily have it by replacing:

            if (ThrsafeExchange(&stage4CallContext->m_SOR_reorderHeadTaken, 1) == 0) {
                // Process the head
                const dxQuickStepperLocalContext *localContext = stage4CallContext->m_localContext;
                ConstraintsReorderingHelper()(stage4CallContext, 0, localContext->m_m - localContext->m_valid_findices);
            }

            if (ThrsafeExchange(&stage4CallContext->m_SOR_reorderTailTaken, 1) == 0) {
                // Process the tail
                const dxQuickStepperLocalContext *localContext = stage4CallContext->m_localContext;
                ConstraintsReorderingHelper()(stage4CallContext, localContext->m_m - localContext->m_valid_findices, localContext->m_valid_findices);
            }

with single

            if (ThrsafeExchange(&stage4CallContext->m_SOR_reorderHeadTaken, 1) == 0) {
                // Process the head
                const dxQuickStepperLocalContext *localContext = stage4CallContext->m_localContext;
                ConstraintsReorderingHelper()(stage4CallContext, 0, localContext->m_m);
            }

in lines [2468-2478]

2018-11-05T19:17:06+00:00

Bram Stolk reporter
See my pull request for demo_feedback changes, which better show stability.

In a properly functioning solver, the bridge should not break. I'll send videos.
- 2018-11-05T20:10:24+00:00
Bram Stolk reporter
How it should work, and how it works now.

Notice the completely unnatural buckling upwards, before it breaks.

https://youtu.be/2rGVURwfgXs
- 2018-11-05T20:14:49+00:00
Bram Stolk reporter
I also tried replacing the code as suggested above: results are still not good.
- 2018-11-05T20:23:34+00:00
Oleh Derevenko Account Deactivated
I have strong impression that demo_feedback depends on CPU speed and behaves differently on slower and faster machines even if you run the same code.
- 2018-11-05T20:25:23+00:00
Oleh Derevenko Account Deactivated
Can you point a revision in the source control at which you believe demo_feedback was working?
- 2018-11-05T20:27:14+00:00
Bram Stolk reporter
Nope, fixed 2ms step, always.

Also, did you see the youtube video I linked? Day and night difference, each time I run it.

If you stack weights on a chain, it should never buckle upwards like that. Completely broken.
- 2018-11-05T20:28:37+00:00
Bram Stolk reporter
It broke with the very patch that changed the random reordering.

In major versions:

0.12 on sourceforge works. 0.13 on sourceforge is unstable, but is easily fixed by doing the full reorder.

everything else is broken.
- 2018-11-05T20:37:47+00:00
Bram Stolk reporter
This is the very patch that broke it: https://sourceforge.net/p/opende/code/1936/

And ODE has never worked properly ever since.

I'm puzzled by the fact that nobody else complained about this.
- 2018-11-05T20:40:23+00:00
Bram Stolk reporter
To reduce it ever further: even with 0 boxes on the bridge (STACKCNT=0) the bridge will explode in current ODE tree, even with full reorder change you suggest. ODE 0.12 is fine.

I've reexamined the demo_feedback code: I see no obvious error in it. It's just a bunch of masses connected together, with a slider on both ends to suspend it in-air. Why can't ODE simulate this without exploding?

Do you see the explode as well? It starts to shake, first slowly, then violently, and it explodes. I build on linux, CPU is a vanilla Haswell Intel CPU. I tried disabling MT but same result.
- 2018-11-05T21:06:22+00:00
Bram Stolk reporter
And another datapoint to consider:

demo_feedback is rock solid using dWorldStep().

This is a QuickStep specific issue.
- 2018-11-05T21:31:58+00:00
Oleh Derevenko Account Deactivated
So, I updated to 0.12 in Hg (revision #1741) and configured vs2008 project with the Premake. Then I imported that into Visual Studio 2013 and built. The bridge breaks in all the build configurations (Single/Double, x86/x64). Also I tried building that with MinGW GCC 5.3.0 (Release+Double) and the bridge breaks with that as well. The only thing I must admit, it breaks consistently in the same way across all the debug builds and consistently across x86/x64 for the release single and release double (though release single and release double differ a bit).
- 2018-11-09T00:20:00+00:00
Bram Stolk reporter
You need to set maxforce from 2000 to 6000 to avoid breakage. See my pull rq.

Or better yet: STaCKCNT to zero.... an empty bridge will break.
- 2018-11-09T00:59:25+00:00
Oleh Derevenko Account Deactivated
All right Bram, I've fixed demo_feedback but the demo_convex still has the fragment popping up.

Then changes may not work with MT operation mode yet - I still need to review it. Please use the single threaded mode for your testing at this time.
- 2018-11-12T00:51:08+00:00
Bram Stolk reporter
Thank you, Oleh, I will test it.

Be advised: the demo_convex pop is a different issue, unrelated to this. ODE 0.12 just happened to be more forgiving to the error in the convex code.

I'm pretty sure the convex-vs-convex collision is suspect, and occasionally produces contacts that are far too deep.
- 2018-11-12T02:25:35+00:00
Bram Stolk reporter
Good job. The demo_feedback is well-behaved now. I also tried my crane simulator, and that one is stable now too (first time since many years again, with current dev branch.)
- 2018-11-12T02:59:26+00:00

Oleh Derevenko Account Deactivated

Hi Bram,

I was investigating the issue with single thread working bad while re-routed into multi threaded path and I have found that with independent constraints placed at end for the first 8 iterations it works well in all the cases.

Could you try this patch with your crane simulator and tell me whether it makes any difference?

diff -r b802a0b726fa ode/src/quickstep.cpp
--- a/ode/src/quickstep.cpp Wed Nov 21 02:06:09 2018 +0200
+++ b/ode/src/quickstep.cpp Sun Dec 02 22:03:27 2018 +0200
@@ -2231,12 +2231,12 @@

     {
         // make sure constraints with findex < 0 come first.
-        IndexError *orderhead = order, *ordertail = order + (m - valid_findices);
+        IndexError *orderhead = order, *ordertail = order + (/*m - */valid_findices);
         const int *findex = localContext->m_findex;

         // Fill the array from both ends
         for (unsigned int i = 0; i != m; ++i) {
-            if (findex[i] == -1) {
+            if (findex[i] != -1) {
                 orderhead->index = i; // Place them at the front
                 ++orderhead;
             } else {
@@ -2244,7 +2244,7 @@
                 ++ordertail;
             }
         }
-        dIASSERT(orderhead == order + (m - valid_findices));
+        dIASSERT(orderhead == order + (/*m - */valid_findices));
         dIASSERT(ordertail == order + m);
     }
 }

2018-12-02T20:07:53+00:00

Oleh Derevenko Account Deactivated
@bram_stolk, in particular, I 'm also interested in multi-threaded mode if you would be so kind.
- 2018-12-04T23:48:46+00:00
Bram Stolk reporter
I don't have much time a.t.m, I tried, but my quickstep.cpp looks totally different at line 2231. I'm struggling with hg.

$ hg status gives me:

changeset: 2126:ac56b6ed3333 tag: tip user: oleh_derevenko date: Tue Dec 04 03:00:39 2018 +0200 summary: Changed: Some optimizations to dxJointContact::getInfo1/2
- 2018-12-05T01:20:45+00:00
Oleh Derevenko Account Deactivated
@bram_stolk , no problem. Take your time.

Your changeset looks correct. But if automatic merge fails you could easily make those three changes manually.
- 2018-12-05T12:15:48+00:00
Bram Stolk reporter
I don't have an MT version, but ST works well for sim, with those changes.
- 2018-12-05T20:50:02+00:00
Oleh Derevenko Account Deactivated
To have multiple solver threads you only need to frame your simulation loop with thread pool allocation and release like the demos do. It's very easy.
- 2018-12-05T20:53:58+00:00
Oleh Derevenko Account Deactivated
Never mind. I did some quantitative estimations on the solver output quality and I do not see any improvements with the reversed order. So I'll leave it as it was.
- 2018-12-09T18:00:32+00:00
Oleh Derevenko Account Deactivated
- changed status to resolved
- 2018-12-09T18:00:50+00:00
Log in to comment

Assignee: Oleh Derevenko

Type: bug

Priority: major

Status: resolved

Version: development

Votes: 0

Watchers: 2