Play a checkpoint file without using the GPU at all, to avoid memory errors?

i would like to play checkpoints in one terminal while running training in another. when i first tried this i got CUDA error: out of memory, so i tried “playing” the model without using the gpu:

python task=Ant test=True checkpoint=cp.pth num_envs=4 sim_device=cpu rl_device=cpu pipeline=cpu

sadly this still sometimes causes the CUDA memory error.

File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/", line 29, in <lambda>
    self.player_factory.register_builder('a2c_continuous', lambda **kwargs : players.PpoPlayerContinuous(**kwargs))
  File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/rl_games/algos_torch/", line 28, in __init__
    self.actions_low = torch.from_numpy(self.action_space.low.copy()).float().to(self.device)
  File "/home/stuart/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/cuda/", line 170, in _lazy_init
RuntimeError: CUDA error: out of memory
stuart@hp ~/r/I/isaacgymenvs (main) [1]> python task=Ant test=True checkpoint=cp.pth num_envs=4 sim_device=cpu rl_device=cpu pipeline=cpu

is there any way to play the model without touching the GPU at all?
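one general way to keep a process off the GPU is to hide every CUDA device from it with the CUDA_VISIBLE_DEVICES environment variable. below is a minimal sketch, not something from the Isaac Gym docs: cpu_only_env and run_cpu_only are hypothetical helper names, and whether the play script itself runs cleanly with no visible GPU is an assumption this sketch does not verify.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with every CUDA device hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value makes the CUDA runtime report zero devices to the process.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_cpu_only(argv):
    """Run argv as a child process that cannot see any GPU; return its exit code."""
    return subprocess.run(argv, env=cpu_only_env(), check=False).returncode


# Usage (hypothetical): pass the same play command you would type in the shell,
# split into a list, e.g. run_cpu_only([sys.executable, <script>, <args...>])
```

with the devices hidden, code that still asks for a cuda device should fail right away with a "no GPUs available" style error instead of an out of memory error, which also helps find the place that ignores the cpu settings.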

incidentally, is there any way to reset the CUDA memory without a full reboot? i tried sudo rmmod nvidia_uvm ; sudo modprobe nvidia_uvm, as suggested in "Reset GPU without restarting linux?" (post #6 by Harsha, Deep Learning Course Forums), but got errors saying nvidia_uvm was in use and couldn't figure out a way around it.

thank you for reading!

never mind. i was running out of memory because i was stopping training with ctrl-z instead of ctrl-c. ctrl-z only suspends the process, so python stayed alive in the background and kept its GPU memory. i had misread some docs somewhere and thought they suggested ctrl-z. solved by running nvidia-smi, seeing the lingering python processes there, and killing them.
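for anyone who wants to script the same check, here is a rough sketch that lists the processes currently holding GPU memory. it relies on nvidia-smi's --query-compute-apps option; parse_compute_apps and list_gpu_processes are hypothetical helper names made up for this example.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV rows into (pid, name, memory) tuples."""
    rows = []
    for line in csv_text.splitlines():
        if line.count(",") < 2:
            continue  # skip blank or malformed lines
        pid, rest = line.split(",", 1)
        name, mem = rest.rsplit(",", 1)
        if not pid.strip().isdigit():
            continue  # skip a header row or anything that is not a process entry
        # Memory is kept as a string because nvidia-smi may print a non-numeric value.
        rows.append((int(pid), name.strip(), mem.strip()))
    return rows


def list_gpu_processes():
    """Ask nvidia-smi which compute processes are using the GPU right now."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)
```

a suspended job shows up in this list just like a running one. it can be ended from the same shell with fg followed by ctrl-c, or with kill and the pid.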

i’m still a little curious why CUDA gets initialized even when everything is set to use the CPU, but it doesn’t really matter for me right now.

