Sims are no longer deterministic when multi-threading is enabled and rigidbodies overlap

I’ve recently noticed an issue with some of my PhysX simulations. If I have a group of rigidbodies placed closely together (overlapping/interpenetrating) and run a simulation on them with threading enabled (anywhere from 2 to 16 workers), the results occasionally differ between successive re-runs; overall the sim is not deterministic.

Some other info:

- I’m using the latest PhysX repo.
- All rigidbodies have the same shape (sphere collider), mass (1) and starting orientation.
- The sim has a single ground collider plane.
- CCD is not enabled.
- This is a CPU sim, not a GPU sim.
- The problem occurs no matter what the sim substeps are set to.
- The fewer the rigidbodies, the less often the problem reproduces. With 50 rigidbodies the jitter only happens on roughly every 10th re-run; with hundreds of rigidbodies in the same setup, the sim is never deterministic.
- The problem only seems to occur when the rigidbodies are interpenetrating at frame 0. The more interpenetrations there are, the easier the problem is to reproduce. I’m not sure this is the only cause, but it’s the only way I’ve been able to trigger the error.
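
In case it helps, a trimmed-down sketch of my setup looks roughly like this (illustrative only, not my actual plugin code; names and error handling stripped, teardown omitted):

```cpp
#include <PxPhysicsAPI.h>
using namespace physx;

static PxDefaultAllocator gAllocator;
static PxDefaultErrorCallback gErrorCallback;

void runSim(int numBodies, int numSteps)
{
    PxFoundation* foundation = PxCreateFoundation(PX_PHYSICS_VERSION, gAllocator, gErrorCallback);
    PxPhysics* physics = PxCreatePhysics(PX_PHYSICS_VERSION, *foundation, PxTolerancesScale());

    PxSceneDesc sceneDesc(physics->getTolerancesScale());
    sceneDesc.gravity = PxVec3(0.0f, -9.81f, 0.0f);
    sceneDesc.cpuDispatcher = PxDefaultCpuDispatcherCreate(4);   // >1 worker -> problem appears
    sceneDesc.filterShader = PxDefaultSimulationFilterShader;
    PxScene* scene = physics->createScene(sceneDesc);

    PxMaterial* material = physics->createMaterial(0.5f, 0.5f, 0.1f);

    // single ground collider plane
    PxRigidStatic* ground = PxCreatePlane(*physics, PxPlane(0.0f, 1.0f, 0.0f, 0.0f), *material);
    scene->addActor(*ground);

    // identical spheres packed closer than their diameter, so they interpenetrate at frame 0
    for (int i = 0; i < numBodies; ++i)
    {
        PxTransform pose(PxVec3((i % 10) * 0.5f, 1.0f + (i / 10) * 0.5f, 0.0f));
        PxRigidDynamic* body = PxCreateDynamic(*physics, pose, PxSphereGeometry(0.5f), *material, 1.0f);
        PxRigidBodyExt::setMassAndUpdateInertia(*body, 1.0f);    // mass 1 for every body
        scene->addActor(*body);
    }

    // fixed time-step stepping
    for (int step = 0; step < numSteps; ++step)
    {
        scene->simulate(1.0f / 60.0f);
        scene->fetchResults(true);
    }
}
```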

Is non-determinism a known limitation of multithreaded sims?

We have quite a few determinism tests, including some complex cases. It is possible you’ve found an edge case, but PhysX should be deterministic regardless of the number of threads being used.

Could you confirm how you are testing this?

Determinism requires a few things:

(1) The application itself using PhysX has to be deterministic (fixed time-steps, the exact same set of actors, actors instantiated in the same order, events occurring on the same frame, etc.)
(2) You have to destroy everything and re-create it, e.g. scenes, actors, shapes, etc. The reason is that there are internal pools that optimize allocations by recycling structures such as interactions, and that re-use can change the order in which interactions are processed.
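
In practice, each run should follow a pattern roughly like the sketch below, where the helpers are hypothetical placeholders standing in for your own setup/teardown code:

```cpp
// Hedged sketch of the "destroy and re-create everything per run" requirement.
// createSceneAndActors() and releaseActorsAndScene() are hypothetical helpers.
void runOnePass(physx::PxPhysics& physics, int numSteps, float fixedDt)
{
    physx::PxScene* scene = createSceneAndActors(physics);  // build scene, bodies and shapes in a fixed order

    for (int step = 0; step < numSteps; ++step)
    {
        scene->simulate(fixedDt);    // identical fixed time-step every frame
        scene->fetchResults(true);   // block until the step completes
    }

    releaseActorsAndScene(scene);    // release every actor/shape and then the scene, so the next
                                     // pass does not inherit recycled internal structures
}
```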

There is also PxSceneFlag::eENABLE_ENHANCED_DETERMINISM, which is required to get deterministic results for a set of bodies when simulated either individually or as part of a larger scene (where the other actors in the larger scene do not interact with the set of bodies that were part of the smaller scene). This flag disables some batching optimizations that can produce different results if constraints are batched together differently.
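
Setting it is just a scene-descriptor flag at scene creation time, e.g. (assuming an otherwise standard scene setup with `physics` being your PxPhysics instance; the relevant line is the flag):

```cpp
physx::PxSceneDesc sceneDesc(physics->getTolerancesScale());
sceneDesc.gravity = physx::PxVec3(0.0f, -9.81f, 0.0f);
sceneDesc.cpuDispatcher = physx::PxDefaultCpuDispatcherCreate(4);
sceneDesc.filterShader = physx::PxDefaultSimulationFilterShader;
sceneDesc.flags |= physx::PxSceneFlag::eENABLE_ENHANCED_DETERMINISM;  // disable the batching shortcuts mentioned above
physx::PxScene* scene = physics->createScene(sceneDesc);
```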

You mentioned that this issue only seems to occur when bodies are interpenetrating on the first frame. This may be an edge case, although I can’t think of an obvious reason why it would cause non-determinism in the simulation. Is there any chance you could provide a PVD capture of your case so I can use it to produce a repro?

Could you try enabling enhanced determinism to see if this has any influence on your results?
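
In case you haven’t wired up PVD before, the connection boils down to something like this rough sketch (assumes `using namespace physx;`, with `foundation` and `scene` being your existing objects; socket transport to a locally running PVD instance on the default port):

```cpp
PxPvd* pvd = PxCreatePvd(*foundation);
PxPvdTransport* transport = PxDefaultPvdSocketTransportCreate("127.0.0.1", 5425, 10);  // host, port, timeout (ms)
pvd->connect(*transport, PxPvdInstrumentationFlag::eALL);

// Pass the PVD instance in when creating the SDK...
PxPhysics* physics = PxCreatePhysics(PX_PHYSICS_VERSION, *foundation, PxTolerancesScale(), true, pvd);

// ...and after creating the scene, enable the per-scene transmission flags.
if (PxPvdSceneClient* pvdClient = scene->getScenePvdClient())
{
    pvdClient->setScenePvdFlag(PxPvdSceneFlag::eTRANSMIT_CONSTRAINTS, true);
    pvdClient->setScenePvdFlag(PxPvdSceneFlag::eTRANSMIT_CONTACTS, true);
    pvdClient->setScenePvdFlag(PxPvdSceneFlag::eTRANSMIT_SCENEQUERIES, true);
}
```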

Thank you for the quick response!

  1. I used fixed time steps, same actors, same order, etc.

  2. Everything does get created and destroyed properly (at least, to my knowledge based on my review of my code…nothing is re-used).

  3. Enabling enhanced determinism does not solve the problem.

I haven’t used PVD yet, nor am I sure it’s compatible with my use of PhysX (I’m making a plugin for 3ds Max, not a standalone application). I’ll look further into the capture you suggested.

I’ve also done my own custom data dump, outputting all vel/pos/tm info about my rigidbodies at each simulation step to a file. A comparison of the data shows that all values are identical between subsequent simulations, up until a point where PhysX starts returning divergent velocity values on rigidbodies. This does not occur with multithreading off.
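
For reference, the dump is essentially this per step (simplified, not my actual code; assumes `using namespace physx;`):

```cpp
#include <fstream>
#include <vector>

void dumpState(PxScene& scene, std::ofstream& out, int step)
{
    PxU32 count = scene.getNbActors(PxActorTypeFlag::eRIGID_DYNAMIC);
    std::vector<PxActor*> actors(count);
    scene.getActors(PxActorTypeFlag::eRIGID_DYNAMIC, actors.data(), count);

    for (PxU32 i = 0; i < count; ++i)
    {
        PxRigidDynamic* body = static_cast<PxRigidDynamic*>(actors[i]);
        const PxTransform tm = body->getGlobalPose();     // position + orientation
        const PxVec3 v = body->getLinearVelocity();

        out << step << ' ' << i << ' '
            << tm.p.x << ' ' << tm.p.y << ' ' << tm.p.z << ' '
            << tm.q.x << ' ' << tm.q.y << ' ' << tm.q.z << ' ' << tm.q.w << ' '
            << v.x << ' ' << v.y << ' ' << v.z << '\n';
    }
}
```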

Since this isn’t a known limitation, I’ll try digging into the PhysX code myself to see if I can figure out why this is happening…are there any places in particular you know of that might be the culprit? I’m not really familiar with how PhysX does its threading, but I am familiar with threading practices in general.

Deterministic results basically depend on the initial state being consistent (let’s assume that this is the case), on the exact same contacts being generated, and on the constraints (both contacts and joints) being processed by the solver in the same order.

Please note that even a tiny 1-bit difference in a value can snowball into visibly divergent results over time. These differences are often lost if you instrument using printf, so you might need to reinterpret float values as integers and log those to identify 1-bit differences in the bit patterns.
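
Something along these lines is enough (a memcpy-based reinterpretation avoids any rounding in the formatting):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Log the exact bit pattern of a float so even a 1-bit difference survives the round trip
// (plain %f/%g formatting can round it away).
static uint32_t floatBits(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));   // well-defined way to reinterpret the bytes
    return bits;
}

// e.g. for a velocity:
// std::printf("%08X %08X %08X\n", floatBits(v.x), floatBits(v.y), floatBits(v.z));
```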

Good point! When I get a chance I’ll re-run my logger doing exactly that and post the results. Since everything is floats, I’m definitely open to the possibility of some floating-point imprecision happening somewhere in my code.

Everything came back a-ok on my end again, and I just realized…it can’t be my code, because everything is perfectly deterministic when PhysX multithreading is off. It’s only when I set the number of workers to more than one that things become non-deterministic.

I’ll keep digging around in the physx code and report back if I find anything.

Thought I would update this thread:

The issue turned out to be user error on my part.

After a lot of debugging I finally figured out what was going on: I was modifying a hashtable in my contact filter code but forgot to put it inside a lock guard, which occasionally caused a race condition. That also explains why the inconsistent behavior only occurred when collisions were happening between rigidbodies.
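
For anyone who hits something similar, the fix boiled down to this (names made up for illustration; the filter code can be called from multiple worker threads at once):

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical, simplified version of the offending code and its fix.
std::mutex gPairMapMutex;
std::unordered_map<uint64_t, int> gPairMap;   // shared bookkeeping touched by the filter code

void onContactFilter(uint64_t pairKey)        // invoked concurrently by PhysX worker threads
{
    std::lock_guard<std::mutex> lock(gPairMapMutex);   // this guard was the missing piece
    ++gPairMap[pairKey];
}
```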