Running exactly the same simulation in two environments yields different results. If the environment spacing is set to zero (lower and upper bounds passed to create_env set to 0), the differences disappear. With large environment spacing, robots clearly drift even on a flat plane with no actuation. The drift increases with the environment's position in the environment grid.
PhysX in Isaac Gym uses FP32 data for internal calculations. This means there is no way to avoid small differences between environments that are spaced apart from one another, due to floating-point precision errors.
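To illustrate the FP32 point above, here is a minimal standard-library sketch. It does not touch PhysX or Isaac Gym; it only assumes that positions are stored as FP32 world coordinates, and the helper names (`to_f32`, `f32_spacing`, `step_error`) are made up for this illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_spacing(x: float) -> float:
    """Gap between x (rounded to FP32) and the next larger FP32 value.

    Only valid for positive, finite x: incrementing the bit pattern
    of a positive float yields the next representable value.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_value = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_value - to_f32(x)


def step_error(origin: float, local: float, delta: float) -> float:
    """Error made when adding a small displacement `delta` to an FP32
    world coordinate located at `origin + local`.

    `origin` stands in for the environment offset (hypothetical setup),
    `local` for the robot position inside that environment.
    """
    world = to_f32(origin + local)
    moved = to_f32(world + delta)
    return abs((moved - world) - delta)


if __name__ == "__main__":
    for origin in (0.0, 10.0, 100.0, 1000.0):
        print(origin, f32_spacing(origin + 0.5), step_error(origin, 0.5, 1e-4))
```

The gap between neighbouring FP32 values is 2^-23 (about 1.2e-7) near 1.0 but 2^-14 (about 6.1e-5) near 1000.0, so the same small displacement is rounded much more coarsely in an environment placed far from the world origin than in one placed at it.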
For RL, small deviations like this should typically not be a problem when training a policy in the loop, since each environment will generally be doing something different in one way or another as your agents learn.
That said, I would not expect drifting behavior with no actuation assuming that there is some friction set up. We don’t see anything similar in our examples.
I understand your explanation of why gym works like this. Sadly, that makes gym unfit for our use case, as we need fully reproducible evaluations for our research. As you said, not spacing out individuals causes them to collide often under the hood, which overall means that it is faster to use separate simple CPU sims (such as MuJoCo), one per individual, as they have less startup overhead than gym on the GPU.
Regarding the drifting, I will try to make a minimal example.
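For such a minimal example, one possible way to quantify the drift is to compare robot positions in environment-local coordinates. The sketch below is plain Python and assumes the world positions and environment origins have already been read out of the simulator as (x, y, z) tuples; the function names are hypothetical and not part of any Isaac Gym API.

```python
from math import dist


def local_positions(world_positions, env_origins):
    """Subtract each environment's origin from its robot's world position."""
    return [
        tuple(p - o for p, o in zip(position, origin))
        for position, origin in zip(world_positions, env_origins)
    ]


def max_deviation_from_reference(world_positions, env_origins, reference=0):
    """Largest Euclidean distance between any environment's local robot
    position and the local position in the reference environment."""
    local = local_positions(world_positions, env_origins)
    ref = local[reference]
    return max(dist(p, ref) for p in local)


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not simulator output.
    origins = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    positions = [(0.1, 0.0, 0.5), (10.4, 0.0, 0.5)]
    print(max_deviation_from_reference(positions, origins))
```

Logging this value at every step for an unactuated robot would show whether the deviation stays at rounding-error scale or grows over time.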