Error running rlg_train.py

james.paul.foster · September 27, 2021, 5:14pm

Hello,

I’ve been proceeding through the installation instructions for Isaac Gym. After successfully running all of the examples as well as some Cartpole training with train.py inside rlgpu, I wanted higher performance and so followed the instructions to install rl_games.

When I use rlg_train.py on a simple Cartpole example I get the following error:

(rlgpu) ~/isaacgym/python/rlgpu$ python rlg_train.py --task Cartpole --headless
Importing module 'gym_37' (/home/jfoster/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_37.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/jfoster/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 1.8.1
Device count 1
/home/jfoster/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/jfoster/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /home/jfoster/.cache/torch_extensions/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
Importing module 'rlgpu_37' (/home/jfoster/isaacgym/python/isaacgym/_bindings/linux-x86_64/rlgpu_37.so)
Setting seed: 1330
Started to train
Python
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
/home/jfoster/miniconda3/envs/rlgpu/lib/python3.7/site-packages/gym/logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
RL device:  cuda:0
512
1
4
0
Box([-1.], [1.], (1,), float32) Box([-inf -inf -inf -inf], [inf inf inf inf], (4,), float32)
Env info:
{'action_space': Box([-1.], [1.], (1,), float32), 'observation_space': Box([-inf -inf -inf -inf], [inf inf inf inf], (4,), float32)}
Traceback (most recent call last):
  File "rlg_train.py", line 167, in <module>
    runner.run(vargs)
  File "/home/jfoster/isaacgym/python/rlgpu/rl_games/rl_games/torch_runner.py", line 139, in run
    self.run_train()
  File "/home/jfoster/isaacgym/python/rlgpu/rl_games/rl_games/torch_runner.py", line 122, in run_train
    agent = self.algo_factory.create(self.algo_name, base_name='run', config=self.config)  
  File "/home/jfoster/isaacgym/python/rlgpu/rl_games/rl_games/common/object_factory.py", line 15, in create
    return builder(**kwargs)
  File "/home/jfoster/isaacgym/python/rlgpu/rl_games/rl_games/torch_runner.py", line 23, in <lambda>
    self.algo_factory.register_builder('a2c_continuous', lambda **kwargs : a2c_continuous.A2CAgent(**kwargs))
  File "/home/jfoster/isaacgym/python/rlgpu/rl_games/rl_games/algos_torch/a2c_continuous.py", line 18, in __init__
    a2c_common.ContinuousA2CBase.__init__(self, base_name, config)
  File "/home/jfoster/isaacgym/python/rlgpu/rl_games/rl_games/common/a2c_common.py", line 966, in __init__
    A2CBase.__init__(self, base_name, config)
  File "/home/jfoster/isaacgym/python/rlgpu/rl_games/rl_games/common/a2c_common.py", line 124, in __init__
    self.kl_threshold = config['kl_threshold']
KeyError: 'kl_threshold'

I’ve been trying to track this down, and I suspect something is wrong with my rl_games installation. Any ideas?

james.paul.foster · October 20, 2021, 4:18pm

Don’t be like me, folks! Read the READMEs fully before crying foul. According to the “Known Issues” section of the rl_games repo:

Starting from rl-games 1.1.0 old yaml configs won’t be compatible with the new version: steps_num should be changed to horizon_length and lr_threshold to kl_threshold

The fix was simple: in every rlg_*.yaml file in python/rlgpu/cfg/train/rlg, rename steps_num to horizon_length and lr_threshold to kl_threshold.
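For anyone who would rather not edit each file by hand, here is a minimal sketch of the same rename as a script. It is not part of Isaac Gym or rl_games: the helper names migrate_text and migrate_dir are made up for illustration, and it assumes each old field appears as a plain `key: value` line in the yaml files. It edits the text directly instead of parsing the YAML, so comments and formatting are left alone. Back up the cfg directory before trying it.

```python
import re
from pathlib import Path

# Old key -> new key, per the rl_games "Known Issues" note quoted above.
RENAMES = {"steps_num": "horizon_length", "lr_threshold": "kl_threshold"}


def migrate_text(text: str) -> str:
    """Rename old config keys where they appear as `key:` at the start of a line."""
    for old, new in RENAMES.items():
        # Match only the whole key, after optional indentation and before the colon,
        # so that longer names merely containing the old key are not touched.
        pattern = rf"^([ \t]*){old}([ \t]*:)"
        text = re.sub(pattern, rf"\g<1>{new}\g<2>", text, flags=re.MULTILINE)
    return text


def migrate_dir(cfg_dir: str) -> list:
    """Rewrite every rlg_*.yaml file in cfg_dir in place; return the names of changed files."""
    changed = []
    for path in sorted(Path(cfg_dir).glob("rlg_*.yaml")):
        original = path.read_text()
        updated = migrate_text(original)
        if updated != original:
            path.write_text(updated)
            changed.append(path.name)
    return changed
```

Calling migrate_dir("python/rlgpu/cfg/train/rlg") from the isaacgym directory should rewrite the affected files and return their names. Adjust the path if your install lives somewhere else.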

kellyg · October 25, 2021, 3:47pm

Yes, there has been an update in the rl_games repo. We will be providing updated yaml config files in our next release.
