"Rewards" in repetitive tests are clearly different

Hi

The “rewards” in repeated tests are clearly different.
I am using a 6-axis robot, and the trajectory is different on every run despite identical conditions.
The robot assembles parts on a production line, so positional accuracy is required.
Do you think this could also be due to a PyTorch issue?

In this state, even if training finds a good pattern, it does not seem to carry over to good results in the test.
I am using torch 1.8.2 + cu111.
torch_deterministic: False

In the video below, the robot and the target are hidden. The moving lines belong to the robot and the stationary lines to the target. Even though the environment is the same, the robot's behavior and its rewards change from run to run. The training environment and the test environment are identical.

After setting “torch_deterministic” to “True”, the problem almost disappeared.

RunningMeanStd: (18,)
=> loading checkpoint ‘/home/s/wsp/isaac_3/IsaacGymEnvs-main/isaacgymenvs/runs/hor_1024_bs_65536_det_true_0/nn/hor_1024_bs_65536_det_true_0.pth’
reward: 132989056.0 steps: 510.0
reward: 137869056.0 steps: 511.0
reward: 137862208.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0
reward: 137862304.0 steps: 511.0
reward: 137861968.0 steps: 511.0

However, I’m a little concerned that the values keep alternating between two numbers, but for now there is no problem.
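
For reference, below is a rough sketch (not the exact IsaacGymEnvs code) of the kind of seeding and determinism setup that a torch_deterministic: True flag typically enables in a PyTorch training stack; the function name here is illustrative.

import os
import random

import numpy as np
import torch

def seed_everything_deterministic(seed: int = 42):
    # Seed every RNG the training stack may touch.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

    # Prefer deterministic kernels. The CUBLAS workspace setting is required
    # by some cuBLAS ops when deterministic algorithms are enforced.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.use_deterministic_algorithms(True)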

The problem has recurred.
It appears to be a pipeline issue.
torch_deterministic is True.
Could you please look into it?

  • pipeline: “gpu”

RunningMeanStd: (18,)
=> loading checkpoint ‘/home/sa/wsp/isaac_3/IsaacGymEnvs-main/isaacgymenvs/runs/hor_1024_bs_65536_det_true_0/nn/hor_1024_bs_65536_det_true_0.pth’
reward: 168993888.0 steps: 510.0
reward: 344287648.0 steps: 511.0
reward: 172024928.0 steps: 511.0
reward: 344307872.0 steps: 511.0
reward: 886274496.0 steps: 511.0
reward: 173607184.0 steps: 511.0
reward: 925329728.0 steps: 511.0
reward: 173758464.0 steps: 511.0
reward: 924707264.0 steps: 511.0
reward: 173756368.0 steps: 511.0
reward: 344272928.0 steps: 511.0
reward: 886049280.0 steps: 511.0
reward: 173611088.0 steps: 511.0
reward: 925518208.0 steps: 511.0
reward: 173758720.0 steps: 511.0
reward: 344275744.0 steps: 511.0
reward: 347699360.0 steps: 511.0

  • pipeline: “cpu”

RunningMeanStd: (18,)
=> loading checkpoint ‘/home/sa/wsp/isaac_3/IsaacGymEnvs-main/isaacgymenvs/runs/hor_1024_bs_65536_det_true_0/nn/hor_1024_bs_65536_det_true_0.pth’
reward: 168993632.0 steps: 510.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0
reward: 173245376.0 steps: 511.0

Driver Version: 495.44, CUDA Version: 11.5
GPU: RTX A6000
torch 1.8.2 + cu111

Hi @DDPG7 ,

Just to make sure, have you set the seed argument for your runs? We also have some documentation on determinism and reproducibility here: IsaacGymEnvs/reproducibility.md at main · NVIDIA-Omniverse/IsaacGymEnvs · GitHub. Hopefully that can help provide some more insight.

No.
It is set in the config.yaml below, so I thought that was sufficient.

# Task name - used to pick the class to load
task_name: ${task.name}
# experiment name. defaults to name of training config
experiment: ""

# if set to positive integer, overrides the default number of environments
num_envs: ""

# seed - set to -1 to choose random seed
seed: 42
# set to True for deterministic performance
torch_deterministic: True

# set the maximum number of learning iterations to train for. overrides default per-environment setting
max_iterations: ""

## Device config
#  'physx' or 'flex'
physics_engine: "physx"
# whether to use cpu or gpu pipeline
pipeline: "gpu"
# device for running physics simulation
sim_device: "cuda:0"
# device to run RL
rl_device: "cuda:0"
graphics_device_id: 0

## PhysX arguments
num_threads: 10 # Number of worker threads per scene used by PhysX - for CPU PhysX only.
solver_type: 1 # 0: pgs, 1: tgs
num_subscenes: 4 # Splits the simulation into N physics scenes and runs each one in a separate thread

# RLGames Arguments
# test - if set, run policy in inference mode (requires setting checkpoint to load)
test: False
# used to set checkpoint path
checkpoint: ""
# set to True to use multi-gpu horovod training
multi_gpu: False

# disables rendering
headless: False

# set default task and default training config based on task
defaults:
  - task: TASK_NAME
  - train: ${task}PPO
  - hydra/job_logging: disabled

# set the directory where the output files get saved
hydra:
  output_subdir: null
  run:
    dir: .

Then I explicitly set the seed argument to ‘42’ at runtime, and reproducibility was obtained.

RunningMeanStd: (18,)
=> loading checkpoint ‘/home/sat001/wsp/isaac_3/IsaacGymEnvs-main/isaacgymenvs/runs/hor_1024_bs_65536_gpu_0/nn/hor_1024_bs_65536_gpu_0.pth’
reward: 252166624.0 steps: 510.0
reward: 313811808.0 steps: 511.0
reward: 313775776.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0
reward: 313775456.0 steps: 511.0

The “seed” in “config.yaml” is already set to ‘42’; does that mean it also needs to be explicitly passed as the seed argument at runtime?

The seed in the config should be parsed. If not, then it’s likely a bug. Do you have another seed value overwriting the main config somewhere in the training config?

The “seed” was irrelevant. It’s just that the “rewards” exceed 2^24 (16,777,216), beyond which float32 can no longer represent every integer exactly. That is why reproducibility appears or disappears depending on the pattern.
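
The integer precision limit of float32 is easy to verify directly; this is a generic NumPy check, not Isaac Gym code:

import numpy as np

a = np.float32(2 ** 24)                    # 16,777,216: last point where every integer is representable
print(a + np.float32(1) == a)              # True: adding 1 is lost to rounding
print(np.spacing(np.float32(137861968)))   # 16.0: at reward magnitudes like those logged above,
                                           # neighbouring float32 values are 16 apart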

I changed the reward function so that the “rewards” and other calculated values do not exceed this limit.
If the problem comes back, I will reply.


The above was wrong.

For example, the reward of the task I created varies greatly, even though I run the test with the same pattern each time. Also, with the same pattern, about 3 or 4 distinct patterns of motion can be observed.
I can’t imagine why this happens.
Can you guess what is causing it?

Maybe this is normal in IsaacGym?
If so, there is no problem at all.

The lack of reproducibility is caused by the GPU pipeline.
Reproducibility was obtained after switching to the CPU pipeline.
The GPU pipeline is not only non-reproducible, it also behaves abnormally.
Please check whether the GPU pipeline has reproducibility and behavior issues.

That is quite possible. As we outlined in the docs, there are many factors around the GPU that can make things non-deterministic. We’ve tested for determinism in our examples, but it’s likely there’s a specific API or setup that we missed.
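
As one generic illustration of such a factor (standalone PyTorch, not Isaac Gym code): float32 addition is not associative, so a parallel reduction that happens to accumulate the same values in a different order can produce a slightly different result on each run.

import torch

torch.manual_seed(0)
x = torch.randn(1_000_000, dtype=torch.float32)

s1 = x.sum()                                # one accumulation order
s2 = x[torch.randperm(x.numel())].sum()     # same values, different order
print(s1.item(), s2.item())                 # usually differ in the low-order digits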

Got it!

We look forward to great improvements in Preview 4.
Please keep up the good work!

Hi @DDPG7,

If you are able to narrow down the non-determinism you’re seeing in the GPU pipeline by excluding your runtime environment changes, that would help us investigate further. Unfortunately, non-determinism issues can be very hard to track down without explicit use cases. If you’re using domain randomizations, try applying the setup_only: True flag to them in your task configuration for comparison.

Take care,
-Gav