Performance reference for training Cartpole?

Training the Cartpole example on my RTX 2070 takes around 40s (“Total time: 39.42s”) for 200 iterations in headless mode.
The documentation mentions Cartpole “should train in less than 5 seconds in the headless mode”, but doesn’t give the reference hardware.

I’m wondering how I should go about verifying my installation. Does 40s for Cartpole on an RTX 2070 seem right?

                       Learning iteration 199/200                       

                       Computation: 44257 steps/s (collection: 0.125s, learning 0.060s)
               Value function loss: 47.2320
                    Surrogate loss: 0.0005
             Mean action noise std: 0.27
                       Mean reward: 496.97
               Mean episode length: 500.00
                  Mean reward/step: 0.97
       Mean episode length/episode: 31.27
--------------------------------------------------------------------------------
                   Total timesteps: 1638400
                    Iteration time: 0.19s
                        Total time: 39.42s
                               ETA: 0.2s

Hi @turbobasic,

You don’t need to run the full 200 iterations to get complete training. The reward gets close to 500 (and the pole stops falling over) at around 55-60 iterations.

Since Cartpole is so simple, GPU hardware doesn’t make all that much difference - it doesn’t use enough parallel environments. It’s more CPU-bound, really.

Still, the docs may be a tad optimistic - I just hit >450 reward in around 11 seconds on a GA100 GPU with a 3.7GHz i7-8700K CPU:

                       Learning iteration 57/200                        

                       Computation: 48190 steps/s (collection: 0.115s, learning 0.055s)
               Value function loss: 99.1446
                    Surrogate loss: 0.0014
             Mean action noise std: 0.29
                       Mean reward: 461.63
               Mean episode length: 472.89
                  Mean reward/step: 0.94
       Mean episode length/episode: 29.57
--------------------------------------------------------------------------------
                   Total timesteps: 475136
                    Iteration time: 0.17s
                        Total time: 10.98s
                               ETA: 27.1s

Take care,
-Gav

Hi @turbobasic,

For me, training Cartpole usually takes a few seconds even with rendering enabled. I ran it with the rl_games RL framework: python rlg_train.py --task Cartpole

I think less than 5 sec is the expected training time on pretty much any GPU, as the Cartpole task is very far from utilizing all the GPU resources and it uses only 256 environments. I can second Gavriel - solving the task happens much earlier than when the iteration limit is hit; the limit is pretty conservative. When the reward is above 400, the task is already solved and the cartpole is balancing; after that the policy just becomes more optimal.

So I’d recommend training with rendering on to verify when it has trained, then switching back to headless mode if you need maximum speed.
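As a quick sanity check on the numbers in these logs - a sketch, assuming the 256 parallel environments mentioned above; the per-environment horizon length is inferred from the logs, not read from any config:

```python
# Sanity-check the per-iteration batch size implied by the training logs.
# Assumes 256 parallel environments (as stated above); the horizon length
# (steps collected per env per iteration) is inferred, not a config value.
num_envs = 256

# 200-iteration run: 1,638,400 total timesteps.
steps_per_iteration = 1_638_400 // 200          # 8192 steps per iteration
horizon = steps_per_iteration // num_envs       # 32 steps per env per iteration

# Cross-check against the run above that stopped at iteration 57
# (iterations 0..57 = 58 iterations):
assert 58 * steps_per_iteration == 475_136

print(steps_per_iteration, horizon)  # 8192 32
```

With only 8192 steps collected per iteration, the GPU is mostly idle, which is why faster cards barely change the wall-clock time.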

Thank you for your answers.

Training with rl_games seems to reach -some- skill level in 5s even with visualisation enabled, so I take it my installation works properly.

I didn’t get how to run the saved model though. Terminal output, trimmed:

fps step: 55260.6 fps total: 31899.2
fps step: 58459.6 fps total: 33752.3
fps step: 60069.7 fps total: 33560.3
fps step: 48284.1 fps total: 29745.2
fps step: 55449.3 fps total: 32057.0
fps step: 59594.0 fps total: 32795.4
fps step: 60913.7 fps total: 33034.5
fps step: 58998.8 fps total: 33453.4
fps step: 62343.7 fps total: 34228.2
saving next best rewards:  494.54855
=> saving checkpoint './nn/Base.pth'

Running: python rlg_train.py --task Cartpole --play terminates with

FileNotFoundError: [Errno 2] No such file or directory: ''

(from the docs: --test works with train.py, --play is for rlg_train.py, right?)

How do I specify a model to use in a test? For the standard train.py I ended up renaming the most advanced model’s file to model_0.pt, and I’m pretty sure there’s an easier way to do it.

Either --play or --test (similar to train.py) will work. You just need to provide a path to the saved weights:

rlg_train.py --task Cartpole --test --checkpoint nn/Base.pth
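If you want to confirm the checkpoint file itself is readable before passing it to --checkpoint, note that a .pth file like ./nn/Base.pth is an ordinary PyTorch file, so a torch.save/torch.load round-trip works. A sketch - the dict contents below are illustrative stand-ins, not the real rl_games checkpoint format:

```python
import torch

# Sketch: checkpoints such as ./nn/Base.pth are regular PyTorch files.
# The keys below are illustrative only, not the actual rl_games layout.
state = {'model': {'w': torch.zeros(3)}, 'epoch': 57}
torch.save(state, '/tmp/example_checkpoint.pth')

# Loading it back lets you inspect what was saved.
restored = torch.load('/tmp/example_checkpoint.pth')
assert restored['epoch'] == 57
print(sorted(restored.keys()))  # ['epoch', 'model']
```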

I, too, am suspicious about the Cartpole results I’m getting. 200 iterations of training takes about 70 seconds on a computer with the configuration described below. Does that seem to be correct? I’m trying to determine whether I’ve installed Isaac Gym properly.

Command: python train.py --task=Cartpole --physx_gpu --device GPU --headless
CPU: Intel Core i5-2500K CPU @ 3.30GHz (four cores)
Memory: 8 GB
Graphics card: 2080 Ti
CPU utilization: 60%
GPU utilization: 13%
GPU memory used: 31%

               Learning iteration 199/200                       

               Computation: 24956 steps/s (collection: 0.185s, learning 0.143s)
       Value function loss: 8.8141
            Surrogate loss: 0.0030
     Mean action noise std: 0.27
               Mean reward: 495.35
       Mean episode length: 500.00
          Mean reward/step: 0.99
Mean episode length/episode: 31.75
--------------------------------------------------------------------------------
           Total timesteps: 1638400
            Iteration time: 0.33s
                Total time: 69.28s
                       ETA: 0.3s

Hi @jim.rothrock,

Did you train with rendering enabled? If so, this looks like a reasonable time to finish all the iterations, but the training results should be achieved much earlier - in less than 50 iterations. In headless mode, training should be much, much faster.

Also, for comparison, you can try training with rl_games - it is usually much faster than rl-pytorch.

Did you train with rendering enabled?

No, I specified --headless.

the training results should be achieved much earlier - in less than 50 iterations

I did not specify a number of iterations; I just ran this command:

python train.py --task=Cartpole --physx_gpu --device GPU --headless

It appears that the 200 iterations are specified in pytorch_ppo_cartpole.yaml (max_iterations: 200).

In headless mode, the training should be much much faster.

I have no explanation for the lack of speed.
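The steps/s lines in the logs do at least line up with the wall-clock difference. A quick check, using only figures from the logs in this thread:

```python
# Compare the two headless 200-iteration runs using numbers from the logs above.
steps = 1_638_400                     # total timesteps, identical for both runs

rtx2070_time = 39.42                  # seconds (first post)
i5_2080ti_time = 69.28                # seconds (this machine)

print(round(steps / rtx2070_time))    # 41563 steps/s overall
print(round(steps / i5_2080ti_time))  # 23649 steps/s overall
# Both overall rates sit below the logged per-iteration rates
# (44257 and 24956 steps/s), since total time also includes startup
# and per-iteration overhead.
```

The roughly 1.8x gap in overall throughput matches the 1.8x gap in total time, so the slowdown is consistent across the whole run rather than a one-off stall - which points at the older CPU, given how CPU-bound this task is.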