Inverse Kinematics has different results on the CPU and GPU [full MWE provided]

I am testing Inverse Kinematics code and I notice that there is a discrepancy between CPU and GPU mode. Here is a full minimum working example on a straightforward IK problem.

Installation and Setup

I’m using Ubuntu 18.04 with an NVIDIA 3090 GPU. I create a conda environment following the Isaac Gym installation instructions. I’m using version 1.0rc4 for isaacgym.

To get all of the data files and the python script use this zip file: - Google Drive This will contain everything you need, including the script ( Unzip it and you should see:

(rlgpu) seita@machine:~/MWE_UR5_IK$ ls -lh 
total 20K
drwxrwxr-x 3 seita seita 4.0K Nov 28 14:06 assets
-rw-rw-r-- 1 seita seita  14K Nov 28 14:06
(rlgpu) seita@machine:~/MWE_UR5_IK$ ls -lh assets/
total 16K
drwxrwxr-x 2 seita seita 4.0K Nov 28 14:06 meshes
-rw-rw-r-- 1 seita seita  12K Nov 28 14:06 ur5.urdf
(rlgpu) seita@machine:~/MWE_UR5_IK$ ls -lh assets/meshes/
total 8.8M
-rw-rw-r-- 1 seita seita 153K Nov 28 14:06 base.dae
-rw-rw-r-- 1 seita seita  29K Nov 28 14:06 base.stl
-rw-rw-r-- 1 seita seita 2.1M Nov 28 14:06 ee3_coarse_edited_.stl
-rw-rw-r-- 1 seita seita 1.5M Nov 28 14:06 forearm.dae
-rw-rw-r-- 1 seita seita  52K Nov 28 14:06 forearm.stl
-rw-rw-r-- 1 seita seita 988K Nov 28 14:06 shoulder.dae
-rw-rw-r-- 1 seita seita  33K Nov 28 14:06 shoulder.stl
-rw-rw-r-- 1 seita seita 2.0M Nov 28 14:06 upperarm.dae
-rw-rw-r-- 1 seita seita  58K Nov 28 14:06 upperarm.stl
-rw-rw-r-- 1 seita seita 931K Nov 28 14:06 wrist1.dae
-rw-rw-r-- 1 seita seita  35K Nov 28 14:06 wrist1.stl
-rw-rw-r-- 1 seita seita 929K Nov 28 14:06 wrist2.dae
-rw-rw-r-- 1 seita seita  35K Nov 28 14:06 wrist2.stl
-rw-rw-r-- 1 seita seita 125K Nov 28 14:06 wrist3.dae
-rw-rw-r-- 1 seita seita  22K Nov 28 14:06 wrist3.stl
(rlgpu) seita@machine:~/MWE_UR5_IK$ 

The python script is about 340 lines, though a lot is for boiler plate code and comments. The summary is that we have a UR5 robot on a linear slider, and I create two spheres (with no collisions for both) one to represent the tip of the UR5 end-effector, and the other to represent the target. The objective is to use IK so that the robot’s EE tip (yellow sphere) reaches the target (blue sphere). We also enforce an orientation so that IK should make the EE tip point upwards.

The Inverse Kinematics code is straight from the example provided in Isaac Gym when I download it: ~/isaacgym/python/examples/ which uses inverse kinematics control with Jacobian matrices and damped least squares. I copied that IK code into the MWE script I linked to (see the .zip file from earlier).

Test CPU vs GPU mode

Then test CPU and GPU mode:

python --pipeline=cpu
python --pipeline=gpu

Observe that the results are different, with CPU mode resulting in better performance compared to GPU mode.

On the CPU the yellow/blue target spheres almost coincide (good) and this is what usually happens:

On the GPU, however, the spheres do not coincide.

I am not sure why this happens. IK will produce a recommended change in the DOF position targets while the robot is in the steady state on the GPU mode, but for some reason the robot isn’t moving or applying those DOFs. (Of course the bigger issue is why CPU and GPU performance differ at all.)

A few follow-up points:

  • CPU mode will print out:
    Physics Engine: PhysX
    Physics Device: cpu
    GPU Pipeline: disabled
  • GPU mode will print out:
    +++ Using GPU PhysX
    Physics Engine: PhysX
    Physics Device: cuda:0
    GPU Pipeline: enabled
  • From the above, it seems like CPU and GPU modes are using CPU and GPU versions of PhysX, respectively, so that could be one difference?

A possibly relevant thread:

Thanks to anyone who might be able to offer advice.


I am just wondering if anyone had the chance to take a look at this?

Hello, I am wondering if anyone had the chance to perhaps try out the minimum working example and to check what might cause the CPU vs GPU difference? Thanks.

Update: never mind, I figured out the issue in this post, it’s explained here:

In this piece of text:

Below is a copy of the original post I made. Note that it still doesn’t resolve my first post about the simulation difference in CPU or GPU but it does resolve a confusion I had about use_gpu.

Following up on this, I have been doing more testing I noticed there is actually a third variant of “CPU vs GPU mode”. If we comment out this line in the MWE:

#args.use_gpu = args.use_gpu_pipeline

Then the strange part is, running the MWE will print:

(rlgpu) seita@takeshi:~/Downloads/MWE_UR5_IK$ python --pipeline=cpu
Importing module 'gym_37' (/home/seita/isaacgym/python/isaacgym/_bindings/linux-x86_64/
Setting GYM_USD_PLUG_INFO_PATH to /home/seita/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 1.12.1
Device count 2
Using /home/seita/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Emitting ninja build file /home/seita/.cache/torch_extensions/py37_cu113/gymtorch/
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
args: Namespace(compute_device_id=0, flex=False, graphics_device_id=0, num_envs=16, num_threads=0, physics_engine=SimType.SIM_PHYSX, physx=False, pipeline='cpu', seed=10, sim_device='cuda:0', sim_device_type='cuda', slices=0, subscenes=0, use_gpu=True, use_gpu_pipeline=False)
device: cpu
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: disabled

The relevant part is the last few lines. It shows the GPU pipeline is “disabled” … but also that the physics device is cuda:0 and that it prints +++ Using GPU PhysX.

and the debugging test shows that this mode will result in the same strange behavior as python --pipeline=gpu. In other words, it produces the IK behavior which doesn’t get the EE tip to coincide with the target.

So, I’m also somewhat confused about what running in “CPU mode” vs “GPU mode” means, since there are 3 variants I’m observing.

  • The best-performing variant is “CPU mode via --pipeline=cpu but with physics device as cpu”.
  • The two other variants perform the same but don’t seem to be that well, they are “CPU mode via --pipeline=cpu but with physics device as cuda” and “GPU mode via --pipeline-gpu with physics device as cuda”.

I see where args.use_gpu is set in the Isaac Gym code (it’s in isaacgym/ though.

I am just wondering if anyone had the chance to take a look at this?

Hi @seita, I started investigating the CPU/GPU IK behavior difference.

Hi @vmakoviychuk Thank you very much for investigating. I will be curious about what you find.

Hi @seita,

PhysX team is investigating the difference in cpu/gpu behavior. I’ll get back to you when have any updates.

Hi @vmakoviychuk as a quick debugging step I did this to change the solver type. In the below method I made one change, I set sim_params.physx.solver_type=0. (The default was 1.)

def get_sim_params(args):
    """Start up a common set of simulation parameters.

    Based on `` provided by Isaac Gym authors.
    sim_params = gymapi.SimParams()
    sim_params.up_axis = gymapi.UP_AXIS_Z
    sim_params.gravity = gymapi.Vec3(0.0, 0.0, -9.8)
    sim_params.dt = 1.0 / 60.0
    sim_params.substeps = 2 
    sim_params.use_gpu_pipeline = args.use_gpu_pipeline
    if args.physics_engine == gymapi.SIM_PHYSX:
        sim_params.physx.solver_type = 0 
        sim_params.physx.num_position_iterations = 8 
        sim_params.physx.num_velocity_iterations = 1 
        sim_params.physx.rest_offset = 0.0 
        sim_params.physx.contact_offset = 0.001
        sim_params.physx.friction_offset_threshold = 0.001
        sim_params.physx.friction_correlation_distance = 0.0005
        sim_params.physx.num_threads = args.num_threads
        sim_params.physx.use_gpu = args.use_gpu
        raise Exception("This example can only be used with PhysX")
    return sim_params

In the example, from GPU mode, now the spheres are coinciding with the targets which was not the case earlier:

Suddenly that seems to have reduced the discrepancy, at least with this particular scenario. I will do some additional testing to see if this fixes the case. My question to you and NVIDIA is if you think this change is something that meaningfully reduces the gap between CPU and GPU performance differences?