I created a simulation with 512 nv_ant.xml robots (one robot per environment). At each time step, I applied the same action to all of them, but they perform differently.
Is there a way to make them all end up in the same state after performing the same action?
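Here is roughly how I detect the divergence (an illustrative sketch, not my full script; it assumes root_states is the (num_envs, 13) view of the actor root state tensor):

import torch

def first_divergent_env(root_states):
    # all envs received the same action, so any mismatch with env 0
    # means the simulation itself has diverged
    ref = root_states[0]
    for i in range(1, root_states.shape[0]):
        if not torch.equal(root_states[i], ref):
            return i
    return None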
Are you sure that they are not colliding with each other?
Can you trace where the randomness starts?
In my experience, there is indeed some nondeterminism in IsaacGym, but not in the states.
Note that if you're processing observations or actions with PyTorch, you should preface your scripts with the following for reproducibility:
import os
import torch

seed = 0
# must be set before the first cuBLAS call for deterministic GPU matmuls
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
# stop cuDNN from benchmarking and picking potentially different kernels per run
torch.backends.cudnn.benchmark = False
# raise an error on ops that have no deterministic implementation
torch.use_deterministic_algorithms(True)
# seed the CPU and all CUDA generators
torch.manual_seed(seed)
For completeness, you can also set the seeds or RNG states for numpy and standard Python:
import random
import numpy as np

seed = 0
random.seed(seed)     # standard-library RNG
np.random.seed(seed)  # legacy NumPy global RNG
See the PyTorch docs for more information on this topic.
Going back to your problem: the nondeterminism I was referring to stems from the net contact forces on the GPU.
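For context, here is roughly how the per-step sums in the table below are collected (a minimal sketch, not my exact script; it assumes the usual Isaac Gym Preview tensor-API setup, with gym and sim coming from that setup, which is omitted here):

from isaacgym import gymtorch

# acquire GPU tensor views once; refresh them after every physics step
root_states = gymtorch.wrap_tensor(gym.acquire_actor_root_state_tensor(sim))
contact_forces = gymtorch.wrap_tensor(gym.acquire_net_contact_force_tensor(sim))

for step in range(25):
    gym.simulate(sim)
    gym.fetch_results(sim, True)
    gym.refresh_actor_root_state_tensor(sim)
    gym.refresh_net_contact_force_tensor(sim)
    print(step, root_states.sum().item(), contact_forces.sum().item())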
Below are the results I get for the first second of simulation for two repeated runs of my script:
Step | Actor state sum (1st run) | Actor state sum (2nd run) | Net contact sum (1st run) | Net contact sum (2nd run) | Differs? |
---|---|---|---|---|---|
0 | 2102.695556640625 | 2102.695556640625 | 15055.205078125 | 15055.205078125 | |
1 | 2100.20166015625 | 2100.20166015625 | 14990.115234375 | 14990.1162109375 | x |
2 | 2087.267822265625 | 2087.267822265625 | 15164.1875 | 15164.1875 | |
3 | 2087.260009765625 | 2087.260009765625 | 15098.4248046875 | 15098.423828125 | x |
4 | 2105.942626953125 | 2105.942626953125 | 15400.44921875 | 15400.44921875 | |
5 | 2106.02978515625 | 2106.02978515625 | 15456.9765625 | 15456.9765625 | |
6 | 2100.53662109375 | 2100.53662109375 | 15464.087890625 | 15464.0888671875 | x |
7 | 2096.452880859375 | 2096.452880859375 | 14730.203125 | 14730.203125 | |
8 | 2098.6845703125 | 2098.6845703125 | 15038.4541015625 | 15036.212890625 | x |
9 | 2102.801513671875 | 2102.801513671875 | 15766.615234375 | 15766.615234375 | |
10 | 2097.928466796875 | 2097.928466796875 | 15373.068359375 | 15382.4794921875 | x |
11 | 2098.5498046875 | 2098.5498046875 | 15409.013671875 | 15409.0146484375 | x |
12 | 2104.79443359375 | 2104.79443359375 | 15588.66015625 | 15584.486328125 | x |
13 | 2101.228515625 | 2101.228515625 | 15661.18359375 | 15661.18359375 | |
14 | 2102.44140625 | 2102.44140625 | 14947.404296875 | 14947.404296875 | |
15 | 2096.845703125 | 2096.845703125 | 15455.953125 | 15455.9521484375 | x |
16 | 2101.01123046875 | 2101.01123046875 | 15408.615234375 | 15408.615234375 | |
17 | 2110.00732421875 | 2110.00732421875 | 15084.0888671875 | 15084.087890625 | x |
18 | 2106.254150390625 | 2106.254150390625 | 15433.94921875 | 15433.94921875 | |
19 | 2099.318115234375 | 2099.318115234375 | 15446.5634765625 | 15446.5634765625 | |
20 | 2109.6884765625 | 2109.6884765625 | 15391.0859375 | 15391.0859375 | |
21 | 2098.9775390625 | 2098.9775390625 | 15654.236328125 | 15654.236328125 | |
22 | 2100.4775390625 | 2100.4775390625 | 15398.6064453125 | 15398.6064453125 | |
23 | 2109.431640625 | 2109.431640625 | 15019.9619140625 | 15019.962890625 | x |
24 | 2103.654296875 | 2103.654296875 | 15123.583984375 | 15122.5498046875 | x |
Note that the actor states are consistent across runs, while the net contact forces sometimes match, but often don't.
I need the contact forces for collision detection, and this discrepancy is particularly troubling, since it affects my agents' rewards, and therefore model optimisation, as well.
It would be great to have this behaviour confirmed, and to know whether there is a feasible solution.
- The randomness starts at the first step. I generate an action for a single robot and repeat it num_envs times (see the sketch after this list). After they are stepped one frame, there is already a tiny difference between the robots; the error accumulates over the steps and produces a totally different situation.
- I have fixed all the random seeds, and I don't think the randomness of torch or numpy is the problem, as all the actors receive the same actions.
- Could this be a problem with Preview 3? I ran into it after moving from 2 to 3.
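To illustrate the first point, this is the kind of tiling I mean (a minimal sketch; the 8 DOFs are an assumption about the nv_ant model):

import torch

num_envs = 512
num_dofs = 8  # assumed number of actuated joints in nv_ant.xml
single_action = torch.randn(num_dofs, device="cuda")
actions = single_action.unsqueeze(0).repeat(num_envs, 1)  # shape (512, 8)
# every row is bit-identical, so any divergence must come from the simulator
assert bool((actions == actions[0]).all())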
One reason that comes to my mind is that maybe the actions are not very precisely synchronized; even a nanosecond difference between updating different environments might lead to some randomness-like behaviour.
I agree. There is a chance that an internal synchronization problem leads to this situation.
As I am working on MPC, I would hope that the robots can have the same behaviour when performing the optimal action at each timestep.