Optimization issues - physx.num_threads

Hi! I wanna start by saying thank you all for bringing us this simulator!

I made a simple environment using the Franka Panda arm and ran some tests on CPU/GPU utilisation. I have found that increasing sim_params.physx.num_threads gives worse results. All results below were gathered using the cProfile library.

Using num_threads=6:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2500    9.291    0.004   14.499    0.006 base_sim.py:270(step)

CPU usage: 25% on each core

Using num_threads=0:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2500    7.963    0.003   12.330    0.005 base_sim.py:270(step)

CPU usage: 100% on one core

Using num_threads=24:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2500   30.227    0.012   34.571    0.014 base_sim.py:270(step)

CPU usage: 85% on each core
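
To make the three runs easier to compare, the cumulative times above can be converted into calls to step() per second. This is only a rough sketch: it uses the ncalls and cumtime values from the cProfile output above, includes profiler overhead, and counts calls to step() rather than per-environment steps.

```python
# (ncalls, cumtime in seconds) for base_sim.py:270(step),
# copied from the cProfile output above, keyed by num_threads.
profiles = {
    0: (2500, 12.330),
    6: (2500, 14.499),
    24: (2500, 34.571),
}


def steps_per_second(ncalls, cumtime):
    """Average number of step() calls completed per wall-clock second."""
    return ncalls / cumtime


rates = {threads: steps_per_second(*row) for threads, row in profiles.items()}

for threads, rate in sorted(rates.items()):
    print(f"num_threads={threads}: {rate:.1f} step() calls per second")
```

On these numbers num_threads=0 is the fastest of the three and num_threads=24 is the slowest by a wide margin.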

Even when running the rl_games example on the FrankaCabinet, I only get 25% CPU utilization. Any idea how I can improve this?

Please let me know if you need more details. Thanks in advance!


I’ve realised maybe the sim_params I’ve used would help:

    from isaacgym import gymapi

    sim_params = gymapi.SimParams()
    sim_params.dt = 1. / 60.
    sim_params.substeps = 2
    sim_params.up_axis = gymapi.UP_AXIS_Z
    sim_params.gravity = gymapi.Vec3(0.0, 0.0, -9.81)
    # sim_params.num_client_threads = 0

    sim_params.physx.use_gpu = True
    sim_params.physx.solver_type = 1
    sim_params.physx.num_position_iterations = 8
    sim_params.physx.num_velocity_iterations = 0
    sim_params.physx.contact_offset = 0.005
    sim_params.physx.rest_offset = 0.0
    sim_params.physx.bounce_threshold_velocity = 0.5
    sim_params.physx.max_depenetration_velocity = 1000.0
    sim_params.physx.default_buffer_size_multiplier = 5.0
    sim_params.physx.always_use_articulations = False

    # sim_params.physx.num_subscenes = 0
    sim_params.physx.num_threads = 6
    sim_params.physx.max_gpu_contact_pairs = 8 * 1024 * 1024
    sim_params.use_gpu_pipeline = True

Hi @mihai.anca13,

According to your config, you are running the simulation on the GPU, and in this case there is no benefit in using many threads. Setting num_threads=0 lets PhysX choose the CPU core to run its work on. The 100% load shows that this variant is CPU bound. Our usual recommendation to start with, which is also the default value, is num_threads=4, so that GPU simulation performance is not bounded by the CPU. But I’d say the main metrics should be GPU simulation performance and total FPS (number of environment steps per second). CPU utilization, in this case, is only a helper metric: it lets us find out whether performance is CPU bound or, on the contrary, many more CPU cores are being used than are required for scheduling the GPU work and some contact-related work.
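
As a minimal sketch of how total FPS could be measured, the helper below times a number of steps and multiplies by the number of environments. The names measure_env_steps_per_second, step_fn and dummy_step are hypothetical and not part of Isaac Gym; step_fn stands in for whatever advances all environments by one step in your own code.

```python
import time


def measure_env_steps_per_second(step_fn, num_envs, num_steps,
                                 timer=time.perf_counter):
    """Return environment steps per wall-clock second.

    step_fn   -- hypothetical callable that advances all environments once
    num_envs  -- number of environments advanced by each call to step_fn
    num_steps -- number of calls to time
    timer     -- clock function, injectable so the helper can be tested
    """
    start = timer()
    for _ in range(num_steps):
        step_fn()
    elapsed = timer() - start
    return num_envs * num_steps / elapsed


def dummy_step():
    """Placeholder workload; replace with the real simulation step."""
    sum(range(1000))


fps = measure_env_steps_per_second(dummy_step, num_envs=1, num_steps=1000)
print(f"{fps:.0f} environment steps per second")
```

With a GPU pipeline, work may be queued asynchronously, so it is safer to time a long run of steps than a handful.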

Thank you for your reply!

I’m not sure I understand how to increase the frame rate in that case. If neither the GPU nor the CPU is used at 100%, how can I increase the speed at which steps are taken?

There are a few ways of increasing the frame rate (number of environment steps per second) when simulating on a GPU:

  • The simplest one: increase the number of environments.
  • Increase the timestep or decrease the number of position iterations, up to the point where the simulation is still stable and contact/grasping behaviour works as expected.
  • Simplify collision shapes for the robot and manipulated objects.
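
One way to act on the first suggestion is to benchmark a few environment counts and rank them by throughput. The sketch below is only an illustration: rank_env_counts and fake_measure are hypothetical names, and fake_measure uses a made-up cost model (a fixed per-step overhead plus a small per-environment cost) in place of a real benchmark of your simulation.

```python
from typing import Callable, List, Tuple


def rank_env_counts(candidates: List[int],
                    measure: Callable[[int], float]) -> List[Tuple[int, float]]:
    """Return (num_envs, env_steps_per_second) pairs, fastest first."""
    results = [(n, measure(n)) for n in candidates]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


def fake_measure(num_envs: int) -> float:
    """Stand-in for a real benchmark; the numbers are invented."""
    step_time = 0.005 + 0.00001 * num_envs  # seconds per step (hypothetical)
    return num_envs / step_time


ranking = rank_env_counts([64, 256, 1024], fake_measure)
for num_envs, rate in ranking:
    print(f"num_envs={num_envs}: {rate:.0f} environment steps per second")
```

In a real sweep, measure would build the simulation with the given number of environments and time it. The same loop could be repeated for the timestep and the number of position iterations, checking each time that the simulation remains stable.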

I was experimenting in the meantime with the things you just mentioned and managed to increase the performance to the level I needed. Thank you!