Training the Cartpole example on my RTX 2070 takes around 40 s (“Total time: 39.42s”) for 200 iterations in headless mode.
The documentation says Cartpole “should train in less than 5 seconds in the headless mode”, but doesn’t give the reference hardware.
How should I go about verifying my installation? Does 40 s for Cartpole on an RTX 2070 seem right?
Learning iteration 199/200
Computation: 44257 steps/s (collection: 0.125s, learning 0.060s)
Value function loss: 47.2320
Surrogate loss: 0.0005
Mean action noise std: 0.27
Mean reward: 496.97
Mean episode length: 500.00
Mean reward/step: 0.97
Mean episode length/episode: 31.27
--------------------------------------------------------------------------------
Total timesteps: 1638400
Iteration time: 0.19s
Total time: 39.42s
ETA: 0.2s
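For what it’s worth, the log looks internally consistent: 1,638,400 total timesteps over 200 iterations is 8,192 steps per iteration, i.e. 256 parallel environments × 32 steps each (assuming the default 256 environments), and 8,192 steps / (0.125 s collection + 0.060 s learning) ≈ 44,280 steps/s, which matches the reported 44,257. So the run itself seems healthy - it’s only the wall-clock total I’m unsure about.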
You don’t need to run the full 200 iterations to get a fully trained policy - the reward gets close to 500 (and the pole stops falling over) around iteration 55-60, so you could even stop early once the reward plateaus (see the sketch after the log below).
Since Cartpole is so simple, GPU hardware doesn’t make much difference - the task doesn’t use enough parallel environments to saturate the GPU, so it’s really more CPU-bound.
Still, the docs may be a tad optimistic - I just hit > 450 reward in around 11 seconds on a GA100 with a 3.7 GHz i7-8700K CPU:
Learning iteration 57/200
Computation: 48190 steps/s (collection: 0.115s, learning 0.055s)
Value function loss: 99.1446
Surrogate loss: 0.0014
Mean action noise std: 0.29
Mean reward: 461.63
Mean episode length: 472.89
Mean reward/step: 0.94
Mean episode length/episode: 29.57
--------------------------------------------------------------------------------
Total timesteps: 475136
Iteration time: 0.17s
Total time: 10.98s
ETA: 27.1s
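If you only want a quick installation sanity check, you don’t even need a fixed iteration count - you can bail out once the reward plateaus. Here’s a minimal sketch of that idea in plain Python; the training step is a stand-in stub (so the snippet runs on its own), not the actual rl-pytorch API:

```python
import random

SOLVED_REWARD = 450.0  # Cartpole's return tops out near 500 (500-step episodes)
PATIENCE = 3           # consecutive iterations required above the threshold

def training_iteration(it):
    """Stand-in for one real training iteration; returns a fake mean
    reward that ramps up roughly the way Cartpole's does."""
    return min(500.0, 10.0 * it) + random.uniform(-20.0, 0.0)

streak = 0
for it in range(200):
    mean_reward = training_iteration(it)
    streak = streak + 1 if mean_reward >= SOLVED_REWARD else 0
    if streak >= PATIENCE:
        print(f"solved at iteration {it}: mean reward {mean_reward:.1f}")
        break
```

In a real run you’d read the mean reward from the trainer’s logging instead of the stub.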
For me, training Cartpole usually takes a few seconds even with rendering enabled. I ran it with the rl_games RL framework: python rlg_train.py --task Cartpole
I think less than 5 seconds is the expected training time on pretty much any GPU, since the Cartpole task is very far from utilizing all the GPU’s resources and uses only 256 environments. And to second Gavriel: the task is solved well before the iteration limit is hit - the limit is pretty conservative. Once the reward is above 400, the task is effectively solved and the cartpole is balancing; further training just makes the policy more optimal.
So I’d recommend training with rendering on to verify when it has converged, then switching back to headless mode if you need maximum speed.
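For reference, the two invocations being compared would be something like this (the --headless flag is what the examples use, as far as I remember - double-check against your docs):

```
python rlg_train.py --task Cartpole              # rendering on, to watch it converge
python rlg_train.py --task Cartpole --headless   # rendering off, for maximum speed
```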
Running: python rlg_train.py --task Cartpole --play terminates with
FileNotFoundError: [Errno 2] No such file or directory: ''
(From the docs, --test works with train.py and --play is for rlg_train.py, right?) The empty string in the error suggests --play expects a checkpoint path that was never supplied.
How do I specify a model to use in a test? For the standard train.py I ended up renaming the most recent checkpoint to model_0.pt, and I’m pretty sure there’s an easier way to do it.
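In the meantime, a workaround I’m using instead of renaming files: load the newest checkpoint by modification time with plain PyTorch. The logs/Cartpole directory pattern and the policy name are placeholders - point them at whatever your run actually produced.

```python
import glob
import os

import torch

# Pick the newest checkpoint by modification time instead of renaming
# it to model_0.pt. The directory pattern below is a guess -- adjust it
# to wherever your runs write checkpoints.
ckpt_paths = glob.glob("logs/Cartpole/**/*.pt", recursive=True)
latest = max(ckpt_paths, key=os.path.getmtime)
print(f"loading {latest}")

state = torch.load(latest, map_location="cpu")
# policy.load_state_dict(state)  # 'policy' is whatever network your run built
# policy.eval()
```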
I, too, am suspicious about the Cartpole results I’m getting. 200 iterations of training takes about 70 seconds on a computer with the configuration described below. Does that seem to be correct? I’m trying to determine whether I’ve installed Isaac Gym properly.
Did you train with rendering enabled? If so, that looks like a reasonable time to finish all the iterations, but the training target should be reached much earlier - in fewer than 50 iterations. In headless mode, training should be much faster.
Also, for comparison, you can try training with rl_games - it is usually much faster than rl-pytorch.