Completely different results when evaluating on CPU vs GPU

Hi @vmakoviychuk

I’ve trained an RL policy for a quadruped robot to jump in place. When evaluating with --sim_device=cpu the jumps are more regular and look more realistic. When I run it on the GPU the quadruped jumps almost a metre higher, which seems very unrealistic, and the contact forces are almost doubled. Interestingly, if I evaluate on trimesh instead of plane, the performance is more similar between CPU and GPU (although there is still a ~20 cm difference in height).

These are my sim params:

    class sim:
        dt = 0.005
        substeps = 1

        class physx:
            num_threads = 10
            solver_type = 1  # 0: pgs, 1: tgs
            num_position_iterations = 4
            num_velocity_iterations = 0
            contact_offset = 0.01  # [m]
            rest_offset = 0.0  # [m]
            bounce_threshold_velocity = 0.5  # [m/s]
            max_depenetration_velocity = 1.0
            max_gpu_contact_pairs = 2**23  # 2**24 needed for 8000+ envs
            default_buffer_size_multiplier = 5
            contact_collection = 2  # 0: never, 1: last sub-step, 2: all sub-steps (default = 2)
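
For completeness, here is a rough sketch of how I believe these values map onto the raw Isaac Gym API when creating the sim (field names are the standard gymapi.SimParams ones; the use_gpu / use_gpu_pipeline pair at the end is the only thing I change between the CPU and GPU runs):

    from isaacgym import gymapi

    gym = gymapi.acquire_gym()

    sim_params = gymapi.SimParams()
    sim_params.dt = 0.005
    sim_params.substeps = 1

    sim_params.physx.num_threads = 10
    sim_params.physx.solver_type = 1  # 0: pgs, 1: tgs
    sim_params.physx.num_position_iterations = 4
    sim_params.physx.num_velocity_iterations = 0
    sim_params.physx.contact_offset = 0.01
    sim_params.physx.rest_offset = 0.0
    sim_params.physx.bounce_threshold_velocity = 0.5
    sim_params.physx.max_depenetration_velocity = 1.0
    sim_params.physx.max_gpu_contact_pairs = 2**23
    sim_params.physx.default_buffer_size_multiplier = 5
    sim_params.physx.contact_collection = gymapi.ContactCollection.CC_ALL_SUBSTEPS

    use_gpu = True  # False for the CPU runs
    sim_params.physx.use_gpu = use_gpu     # run the PhysX solver on the GPU
    sim_params.use_gpu_pipeline = use_gpu  # keep state tensors on the GPU

    sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)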

Are there any sim parameters I can tune to bring the behaviour closer? Such drastic differences would be really bad for any sim2real deployment. I’ve seen a few threads that bring this problem up, but there’s been no update from Nvidia in the last few months.

I’ve noticed a few things:

Running on the CPU (on trimesh or plane) with the tgs solver results in the same behaviour as running on the GPU on trimesh with the pgs solver.

| Device | Solver | Terrain | Jump height |
| --- | --- | --- | --- |
| CPU | tgs | plane | 0.59 m |
| CPU | tgs | trimesh | 0.59 m |
| CPU | pgs | plane | 0.71 m |
| CPU | pgs | trimesh | 0.71 m |
| GPU | tgs | plane | 1.00 m |
| GPU | tgs | trimesh | 0.80 m |
| GPU | pgs | plane | 0.92 m |
| GPU | pgs | trimesh | 0.59 m |

It seems that the GPU pipeline is much more sensitive to the type of terrain and the solver type.
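
For anyone trying to reproduce these numbers: one way to measure jump height like the values above is to track the peak base height from the root-state tensor. A sketch, assuming the standard tensor API; gym, sim and the step loop come from the usual setup, and num_envs / num_steps are placeholders:

    import torch
    from isaacgym import gymtorch

    num_envs, num_steps = 1, 500  # placeholders

    # After gym.prepare_sim(sim): wrap the root-state buffer once.
    # One actor per env, 13 values per actor (pos, quat, lin vel, ang vel).
    root_states = gymtorch.wrap_tensor(gym.acquire_actor_root_state_tensor(sim))

    peak_height = torch.zeros(num_envs, device=root_states.device)
    for _ in range(num_steps):
        # ... run the policy and step the sim here ...
        gym.refresh_actor_root_state_tensor(sim)
        peak_height = torch.maximum(peak_height, root_states[:, 2])  # base z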

Hi @vassil17,

Thank you for the investigation! I’ll share this with the PhysX team and get back to you when we know more. A quick question to test: does the difference between CPU and GPU stay the same if you increase the number of position iterations to 6 or 8?

Hi @vmakoviychuk ,

Thanks for your reply! Changing the position iterations only slightly affects the GPU, reducing the height by 1-2 cm. On the other hand, it affects the CPU a lot more: at 8 position iterations both CPU and GPU seem to converge to about the same jumping height.

Interestingly, decreasing the simulation step size to 0.002 has a similar effect: the jump is higher overall, but CPU and GPU produce a very similar outcome.
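
In config terms, the two changes that brought CPU and GPU closest together for me look like this (same nesting as the sim params above; everything else unchanged):

    class sim:
        dt = 0.002  # down from 0.005

        class physx:
            num_position_iterations = 8  # up from 4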

When deploying these policies on the hardware, the realised height is much closer to that of the original CPU tgs run (slightly more than 0.50 m). I would therefore assume that CPU tgs produces the most accurate and realistic simulation; however, the sim2real discrepancy could of course also be due to other unmodelled dynamics.

Hi @vmakoviychuk ,
I would like to share that we see the same discrepancies between GPU and CPU modelling.
In our case, we are focused on the friction coefficients between rigid shapes and the ground plane. On the CPU we are able to produce more slippery behaviour with low friction values, while on the GPU there seems to be a minimum effective friction below which lowering the coefficient has no further effect.
Our sim params are similar to those of @vassil17.
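
For context, this is roughly how we set the friction on both sides of the contact (standard Isaac Gym calls; gym, sim, env and actor_handle come from the usual setup, and 0.05 is just an example of a "low" coefficient):

    from isaacgym import gymapi

    low_friction = 0.05  # example "low" value

    # Friction of the ground plane
    plane_params = gymapi.PlaneParams()
    plane_params.normal = gymapi.Vec3(0.0, 0.0, 1.0)
    plane_params.static_friction = low_friction
    plane_params.dynamic_friction = low_friction
    gym.add_ground(sim, plane_params)

    # Per-shape friction on the actor
    shape_props = gym.get_actor_rigid_shape_properties(env, actor_handle)
    for prop in shape_props:
        prop.friction = low_friction
    gym.set_actor_rigid_shape_properties(env, actor_handle, shape_props)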
Thanks.

I have also noticed different results on CPU vs GPU, with GPU mode producing less realistic behaviour. I documented some findings here:

In my internal research code the discrepancy is much larger; the linked thread is just a smaller test case meant to isolate the CPU-GPU differences.

Last I checked, NVIDIA was still investigating this difference.