A significant and much needed cleanup of the avatar physics code. It resolves numerous issue with and greatly simplifies the physx character controller implementation.
While there is still room for improvement, this is a good starting point.
Avatars now jump consistently, regardless of cpu speed.
Fixes unexpected camera transitions with entering/exiting some subworlds. (Teledahn elevator)
Fixes a jitter in animated subworlds. (Er'cana harvester)
Dynamic controllers are correctly positioned on the ground. (Jalak)
Much improved interaction with dynamic physicals.
Other physics related issues may or may not be addressed.
This sounds great, both for the in-game performance reasons and the code-cleanup. I'll definitely try to pull this and look through it ASAP!
Explored several ages, kicked around some kickables and clicked some clickables. Jumped and ran a bit. No regressions found.
I also tested the subworld motion in Er'cana and Gahreesen. It's now //very// smooth, as it should be!
Tested the foundrybuild and a H'uru port. Both offer noticeable improvements in the simulation. Inclusion is highly recommended.
I can’t review the code, but I have tried it and found big improvements in many areas. Kickables are more kickable (even Kadish leaves!). Er’cana harvester ride is rock solid. Standing jump is consistent. On PhysX 2.6.4, camera regions now work while climbing ladders, and stairs with mismatched collision ramps are easier to climb. Good job!
It breaks combo-jumps for me – may or may not do that for others. This is a good thing from a bug-fixing point of view, but expect some flak from the skydiver community. We should make sure to involve them in the testing from the start to avoid a sudden outcry like last time we introduced a change to jumping behavior that was minuscule compared to this.
While testing whether it also fixes the too low clamp of physics time per frame (it doesn’t, so I will try to see if my previous solution for that still applies), I found something interesting: previously, on my very old PC, it would take me 2:39 to run to flag 3125 in Minkata and 3:20 back (at, rough guess, 5 to 10 FPS). Now, it takes 3:07 there and 4:01 back. That probably means that either the frame rate is lower, the clamp is even lower, or the running speed is lower. Does any of these sound explainable to you?
The horizontal mouse-look fix is perfect. This was an over-eager optimization on my part but the PR has now been updated. Thanks!
As for your observations in Minkata -
The frame rate shouldn't be lower, I did quite a lot of profiling during development and these changes result in a small but measurable win as far as the simulation is concerned.
Avatar run speed is perfectly in line with the animation velocity (which I've verified), until we start clamping the physics step at which point it begins to slow. I assume this is the clamp that is too low that you are referring to?
I haven't touched the kDefaultMaxDelta value Cyan used, but I would be interested to see the effect on both your frame rate and Minkata run times with the cap increased. However, I'd be surprised if this was the source of the discrepancy as the same clamp should have been applied previously.
It's a little puzzling as everything is behaving as I'd expect, so I can't definitively explain these observations without running some comparative tests with the old code.
I assume this is the clamp that is too low that you are referring to?
Exactly. There are two parts to it, to be exact - one is the kDefaultMaxDelta that I increased from 0.1 to 0.25 in my experiments, and the other is that the piece of simulation time that is clamped off was simply discarded, while in my opinion it should be kept on some accumulator (up to another, higher clamp value), so that we can catch up on it during further, quicker frames. This is also the source of the typical Uru "lag" feeling - being unable to move when the frame rate is low because too much is going on around one.
Anyway, don't spend too much effort on it, it's not really important as long as it only affects old machines with such low frame rates (and frequent frame rate hiccups). Maybe I'll do some deeper investigation.
Hmm, that should just be a catch for rare extremely long frames.
But if it's coming into play almost every frame on slower pc's, the correct fix for me is to just increase the threshold to allow for that. At such low frame rates I'd imagine it's unlikely you'll have the opportunity to catch up in reasonable time.
It's probably worth noting that the current value is 2 full steps below the PhysX default, so I think there is room for a signficant increase.
I have gathered some data. I haven’t drawn any deep conclusions from it yet, but I thought I’d throw it out there already in case you’re interested.
While running out to flag 3125 and back to the cage twice, with the old code and the new code, I recorded the delSecs passed to every call of plSimulationMgr::Advance() (on the assumption that it is called exactly once per drawn frame, which I haven’t actually checked).
On the long run, the deltas seem to match wall clock time fairly well – e.g. on one run that I timed to 151 seconds by stopwatch, they add up to 150.219 s. Viewed up close, I have some doubts, however:
These spikes look suspiciously high. I don’t recall any second-long stalls, or at least not that many. I do expect some spikes, though, since this machine is starved for RAM and swaps a lot.
The valleys look suspiciously deep. I can’t imagine this machine being capable of rendering a frame in 0.007 s when it usually takes 0.1 s.
The areas of a spike and the following valley look suspiciously similar, as if a fixed amount of time is somehow being redistributed here (also reflects the observation that the total sum (integral) is accurate).
I’ll have to examine where these deltas come from, maybe there’s something askew there.
It’s apparent from the second plot that the average frame rate is slightly lower with the new code, but the majority of the difference comes from the fact that it simply appears to take more physics time to run the distance (3rd plot), i.e. the avatar velocity in physics time seems to be lower. I’m also surprised that running back seems to take a bit more physics time than running out (already with the old code, but more pronouncedly with the new one), I’d have expected both to take the same amount.
Stop watch represents the actual time that has passed.
Total frame delta is the total amount of time that was passed to PhysX.
Total time skipped is the amount of time that was discarded by clamping.
Avg simulation time is the average amount of time spent in plSimulationMgr::Advance each frame.
Tables 1 and 2, show that my run times are consistent with the original and updated code in both directions. So good news there.
In order to try and replicate your results I stalled the main loop to simulate a longer render time. The results for these runs are in Tables 3 and 4.
The actual simulation time remains consistent, the big blowout is the amount of time that is being discarded. Which corresponds exactly with the difference in run times.
By increasing the max delta, I was able to bring those times back to 130 seconds. What kind of results did you see when increasing that?
I've found the source the additional time loss with this update.
When fAccumulator is clamped to 0.1, initializing numSubSteps as done above results in a value of 5, not 6. So if running at ~10 Hz, you're effectively losing an additional step on top of the default clamp every frame.
To summarize my findings...
Changing that to line to - int numSubSteps = (int)(float(fAccumulator / fStepSize));
Should make run times closley match the original implementation.
For comparison with my data in tables 3 and 4.
Run to flag.
Stop watch - 147
Total frame delta - 131
Time skipped - 15.18938
Avg sim time - 0.000963
Frame count - 1310
Return from flag.
Stop watch - 163
Total frame delta - 132
Time skipped - 32.052408
Avg sim time - 0.000955
Frame count - 1320
Finally, increasing kDefaultMaxDelta to 0.15f should allow machines running as low as 7.5fps to run at the same speed as everyone else. Exactly how low we want to go with that however is debatable.
Ah, bingo! Good find! So it does turn out to be a hidden case of “the clamp is even lower” from my initial suspicions.
I have reprocessed my original data, replacing my crude cumsum(min(delta, 0.1)) to calculate total physics time by a more accurate model of the accumulating/clamping process, yielding a new figure 3, and lo and behold, we’re back at the expected 131 seconds of physics time. Case closed!
I don’t see what (int)(float(fAccumulator / fStepSize)) would achieve, you probably mean (int)roundf(…) or (int)floorf(… + 0.5f). For even better accuracy though, the round-off error should be kept on the accumulator and taken into account in the next frame. Then we could even stay with the (int)(…). (It actually is kept on the accumulator already, but likely to be clamped off again in the next iteration before it has a chance of being taken into account.) My proposed solution will automatically take care of that, so there is probably no need to do anything just yet.
Here is what I have done in the meantime:
Done the same measurement on a faster computer that runs at a steady 15 FPS and never runs into the clamp. Results are a consistent 131 seconds (wall clock time and physics time) to run out and run back, with new code and old code, confirming that there is no regression in this case. (Now that we understand where the regression came from, we already know that.)
Raised kDefaultMaxDelta to 3.0 on the slow computer, effectively disabling the clamping. Results: figure 1, figure 2, figure 3, suggesting that the regression is gone – the differences between before and after are within the variation inside before and after.
I am convinced now that the deltas are inaccurate. Paying closer attention, the behavior I was seeing during these runs exactly matches what I would expect from these spikes and valleys in the deltas combined with a relatively constant visual frame rate that does not have these spikes and valleys: The camera jolts ahead for one frame, then almost comes to a stop, then slowly accelerates until it reaches the normal speed again after about a second. Interestingly, the avatar seems to lag behind the camera by one frame: when the camera leaps ahead, it almost overtakes the avatar, making him almost drop out of the picture, until he jumps back to the center in the next frame.
Raising the clamp as high as 3 seconds is not what I would do as a real solution – 3 seconds of uncontrolled motion at full speed are likely to make you run over a cliff or something. For now I’m sticking to my original guesstimate of clamping the per-frame delta to 0.25 s and the accumulator to 1.5 s, but I’m reserving final judgment until I figure out what’s the matter with the deltas.
I don’t see what (int)(float(fAccumulator / fStepSize)) would achieve, you probably mean (int)roundf(…) or (int)floorf(… + 0.5f).
No, truncation is what we're after. To illustrate the problem...
Further investigation of those peaks and valleys is definitely warranted. I've only skimmed the timer code, so can't provide any useful input at this time.
For now, I'd like to commit the above mentioned change and increase the kDefaultMaxDelta to .15f before we merge to minkata. Any objections?
Hmm, you are right, it outputs 5, 6, 6 when compiled with MSVC10. I wouldn’t have expected a difference from looking at the C++ code, since the division result should be a float already even without the explicit float(), but I guess that’s something outside of the C++ specification that depends on compiler and compiler options. Looking at the assembly code, in the first case, it keeps the division result in an 80-bit floating point register and truncates it from there, whereas in the second and third case, the number is first stored to memory as a 32-bit float and then read again from there. Compiled with GCC on Mac OS X, it outputs 6, 6, 6. With gcc -mno-sse, it’s 5, 5, 5.
So, I don’t think this is something we should be relying on. I’m not sure what the best way would be of obtaining a more reliable result in the case of clamping, though. Round to nearest, as I suggested? Truncate, but with a small bias – (int)(… + 0.000001f)? Make the clamp slightly larger than a whole multiple of the step size, e.g. 0.100001f or 0.150001f?
For now, I'd like to commit the above mentioned change and increase the kDefaultMaxDelta to .15f before we merge to minkata. Any objections?
No objections as far as the goal of this pull request goes, as an intermediate result. We have a regression in a very specific and hopefully rare edge case, in that something that was bad to begin with got slightly worse. No big deal. Whatever we want do do about it as a real solution can happen in a separate pull request and doesn’t need to block this one. I consider this an intermediate band-aid, and whether such one is applied now or not makes little difference to me.
As far as my further goal goes, 0.15 is still too low in my opinion. I still consider the game very playable at 7 FPS. I feel that players who run at such low frame rates (even if only temporarily in a “lag” situation), because they have slow computers and/or value visual quality higher than frame rate, are unnecessarily put at a disadvantage, and my goal is fixing that. From my current point of view, I would propose 0.25, as 4 FPS is about where I would put the limit of playability, as a conservative guess, and uncontrolled flight of 0.25 s seems tolerable.
Pull request has been updated.
I went with truncate, but with a small bias. The problem occurs with any multiple of fStepSize and compensating in the clamp wouldn't handle the other (admittedly rare) cases. Rounding would have required additional changes.
I have reservations about lifting the clamp too far. 0.25 is a huge step size for the character controller to be taking. We would probably need to do that in 2 passes to keep things stable. Before heading down that path, I'm eager to see your solution to better handle lower frame rates and we can take it from there.
To summarize our wall of text. :)
Users running above 10 FPS should see no change to the avatar run speed.
Between 7 and 10 FPS you should now see a significant improvement over the original code (Speed should be comparable with those above).
At frame rates below 7, some improvement will be evident, but the avatar will still run slightly slower than everyone else (which Christian is working on).
I have reservations about lifting the clamp too far. 0.25 is a huge step size for the character controller to be taking.
OK, I haven’t thought about that. Agreed, let’s wait to see what I come up with.
I have tried the latest code on the slow PC, and I now run out in 140 s and back in 149 s, which is a lot better than with Cyan’s code and pretty close to the ideal 131, so I’m a happy camper.
So in summary, everyone is going to need to recalibrate their Minkata runs if they do it by time, right?
I’m not sure at this point if it’s everyone or just everyone who has a significant fraction of frames taking longer than 0.1s. Probably the latter. I guess I could also do the measurement on a faster computer to check.
My ultimate goal (second half of the reason why I’m measuring this, apart from investigating the difference between before and after Skoader’s change) is to fix that so that everyone runs at the same speed, no matter what their frame rate is.
Pull request has been update with a couple of fixes for some issues uncovered during testing.
The subworld transition fix should address both the Gahreesen gear ride and the "sticky floor" in Er'cana. The other should prevent panic linking on the way to sphere 4 in Ahnonay.
Some additional notes on that commit -
An ideal solution would involve determing the reason the avatar has been moved and making a decision based on that rather than the displacement threshold I've used here. While the code appears to have done something along those lines at some point in the past, the reasons have become ambiguous and can't currently be relied on.
Tried it (together with #22) and can confirm that the sticky floor in Er'cana and the overhead camera in Ahnonay are fixed. I didn’t have any problems with quabs so couldn’t test that.
A small regression I’m seeing on my own build: When entering and leaving the Relto hut, the camera sometimes quickly cuts back and forth between the inside and outside view a few times. I’m not sure why this change would cause that, is there a subworld in the Relto hut? I haven’t seen it on Minkata with build 66 so far, but that’s on a different computer, different PhysX, different build, different server, haven’t checked what makes the difference yet.
I would consider build 66 fit for the shard. If anything major breaks we can always add another update.
That is caused by a convex hull/capsule collision bug in PhysX 2.6. It's worse in 2.6.4 but is evident even in the MOULa client. Cyan went to great lengths to work around it with limited success. I did some work to improve on those workarounds in a different fork, but ultimately opted to move to a newer version of PhysX where the bug has been fixed.
The bug causes multiple enter and exit events to fire as a capsule shape enters or exits a convex hull trigger shape. Depending on how fast they come through, it can result in undesired camera pushes/pops as you noticed, and can at times even break sound effect detectors.
I've committed some more changes to address the remaining issues that have been reported.
Controlled flight fix should prevent wiggle flying.
The windmill animation problem noted by janaba, revealed an issue with updating disabled avatars in between physics steps. This should now be fixed.
Clearing the avatars velocity after each physics update (as Cyan originally did) should fix the avatar shudder associated with shell portals.
I've created a moulscript-ou PR to address the vogondola panic link reported by D'Lanor. I haven't fixed the avatar floating above the chair as noted in the supplied video - This appears to be an unrelated animation bug that's beyond the scope of this update (I can reproduce it on my fast gaming machine with the standard MOULa client).
Aside from fixing the panic link, the script change coupled with these physics updates provide a significant improvement to avatar stability while riding the vogondola.
(I’m a bit behind, seeing as rarified has already put this on the shard, and testers are beginning to report the same things, but for completeness:)
I have tested this in my own build (together with moulscript #17 and #16) and can confirm:
Wigglefly is gone (So long, it was fun while it lasted :'( ;) )
I had no issues with the cleft windmill before or after
Avatar shudder in shell portals is gone
Vogondola panic link is fixed
Camera flicker when entering/leaving Relto hut is gone
Fixed avatar appearing to jump after dismounting from a ladder.
Fixed landing behaviour incorrecting firing after leaving the water. (This bug is currently on MOULa).
Reworked the PhysX collision flutter workaround to better handle erroneus trigger events under the new timing method. The cleanup will also make the removal of these workarounds trivial if we ever move to a version of the PhysX SDK where the bug has been fixed.
Ensure height correction is only applied to human avatars. (This should have no visible effect).
Sounds good... if Christian doesn't beat me to it this morning, I'll merge it in this afternoon. I was planning on an update tonight anyway to push some scripting changes to Minkata. Thanks!
Go ahead, Rarified – with dinner and TMP Funday the rest of my Sunday is more or less filled, so I probably wouldn’t be able to finish testing it anyway.
The diff size of d85a727 looks a bit scary – Skoader, are you confident that this isn’t something that we would have to repeat the whole Minkata testing exercise for?
Shouldn't have to no. It's not as scary as it looks and if there is a problem, it will manifest itself rather obviously - Just about all detectors would be affected.
Regarding the triviality of removing the workarounds should we move to a newer PhysX SDK, are there comments to point the way? Or can you include a comment section somewhere that helps other devs understand the necessary differences? Great work, Skoader!
I haven't left any comments. For this particular workaround, it's now just a matter of removing the code in the USE_PHYSX_COLLISION_FLUTTER_WORKAROUND blocks. There is room for further optimization if necessary.
There are a couple of other workarounds that aren't quite as straight forward, I'll tackle these at a later date.
I left a few anal line comments. Nothing major, just a few general good Plasma practices.
The lines you commented on were part of the original code, but what you've said makes sense. I'll keep it in mind for the future. Thanks very much for the input!
That doesn't surprise me in the least bit... I find that the stuff we pick out the most in pull requests (on github, anyway) is moved Cyan code. ;)