Issue training with gear joints (replicator?)



I am training a robot which has a lot of gear joints. I have 2 models, one with the gear joints and one without them (normal joints instead).

The training goes well with the model with normal joints, but it crashes at startup with gear joints with the following error:

2023-05-25 14:30:44 [14,043ms] [Warning] [omni.physx.plugin] PhysX warning: PxSerializationRegistry::getSerializer: failed to find PxSerializer instance for type 263, FILE /buildAgent/work/f25a4639a4b1bdc1/source/physxextensions/src/serialization/SnSerializationRegistry.cpp, LINE 205
/home/me/.local/share/ov/pkg/isaac_sim-2022.2.0/ line 41: 454867 Segmentation fault      (core dumped) $python_exe "$@" $args
There was an error running python

What I have found is that it happens when more than one robot is instantiated. When I set the number of envs to 1, it works.

Do you have any idea why this happens? Is it something deep in the replicator code? Any idea how to fix it?


This might indicate an issue with physics replication, which may not support the gear joint. I will try to look at that. I think in IsaacSim you can disable physics replication; can you please try with physics replication disabled?

So, it's technically working, but:

  • the training takes a LOT more time to start

  • I had to increase gpu_found_lost_aggregate_pairs_capacity to 46426123 for some reason.

  • the training generates a lot of NaNs after a few epochs…

What do you think?

a) Yes, this is expected: instead of cloning the env, physics parses the whole stage, which takes significantly more time. I am working on the cloning fix for gear joints; it should be fixed in the next IsaacSim release.
b) That might indicate some other problem. Each articulation topology ends up in an aggregate, so this suggests that a lot of collisions involve that aggregate. Maybe some collision filtering is not set up correctly?
c) Not sure offhand; it seems to indicate some simulation issues. If you can, feel free to send me a repro through a DM, and I will take a look at what we can do to improve the simulation quality.

There are two things you can already try:

  1. Check how the simulation looks on CPU
  2. Change the solver from TGS to PGS
    This can help identify whether the problem is in a particular code path.
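
To keep the two experiments separate, it may help to derive one config variant per change from the same baseline, so that any difference in NaN behaviour can be attributed to a single setting. The sketch below is only an illustration in plain Python: the key names (`use_gpu_pipeline`, `physx.use_gpu`, `physx.solver_type`) and the convention 0 = PGS, 1 = TGS are assumptions modelled on an OmniIsaacGymEnvs-style task config, and `make_diagnostic_variants` is a hypothetical helper, so check both against your own setup.

```python
import copy


def make_diagnostic_variants(sim_cfg):
    """Return named copies of sim_cfg, each changing one setting only.

    Hypothetical helper: key names are assumed, not taken from the thread.
    """
    variants = {}

    # Variant 1: run physics on the CPU instead of the GPU.
    cpu = copy.deepcopy(sim_cfg)
    cpu["use_gpu_pipeline"] = False
    cpu["physx"]["use_gpu"] = False
    variants["cpu"] = cpu

    # Variant 2: keep the device, switch the solver from TGS to PGS.
    pgs = copy.deepcopy(sim_cfg)
    pgs["physx"]["solver_type"] = 0  # assumed convention: 0 = PGS, 1 = TGS
    variants["pgs"] = pgs

    return variants


# Baseline with assumed defaults (GPU pipeline, TGS solver).
baseline = {
    "use_gpu_pipeline": True,
    "physx": {"use_gpu": True, "solver_type": 1},
}

variants = make_diagnostic_variants(baseline)
for name, cfg in variants.items():
    print(name, cfg)
```

If the NaNs disappear in only one of the two variants, that points at the GPU code path or the TGS solver respectively; if they persist in both, the model setup itself (for example the collision filtering) is the more likely suspect.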