How to train the model only when the environment resets, instead of at every step

I need some time to observe the results of the simulation before I can fill the reward buffer based on those results. While I am observing the simulation, my agent should not be trained, but I can't find a way to pause training. What should I do to solve this problem?

If you don't have your own RL library designed for this workflow, one thing you could try is to modify the step() function in vec_task.py. The RL algorithm calls step() to retrieve the buffers it needs for training, so you could run multiple simulation steps inside it in a loop, calling gym.simulate() to step the simulation and then post_physics_step() to compute the observations from the final state. That way the policy is only queried, and the agent only trained, after the simulation has run long enough for you to evaluate the result.
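
As a rough illustration, here is a minimal sketch of what that modified step() might look like. It assumes the VecTask conventions from Isaac Gym's Python examples (self.gym, self.sim, pre_physics_step(), post_physics_step(), and the obs/rew/reset buffers come from that framework); sim_steps_per_env_step is a hypothetical parameter you would tune to match your observation window:

```python
# Sketch of a modified step() in vec_task.py (Isaac Gym VecTask conventions).
# sim_steps_per_env_step is a hypothetical knob, not part of the framework.

def step(self, actions):
    # Apply the policy's actions once, as usual.
    self.pre_physics_step(actions)

    # Run several physics steps before returning control to the RL
    # algorithm, so training only happens after the simulation has had
    # time to produce an observable result.
    sim_steps_per_env_step = 10  # tune to your observation window
    for _ in range(sim_steps_per_env_step):
        self.gym.simulate(self.sim)
        self.gym.fetch_results(self.sim, True)

    # Compute observations and rewards once, from the final state, so the
    # reward buffer reflects the outcome of the whole simulated window.
    self.post_physics_step()

    return self.obs_buf, self.rew_buf, self.reset_buf, self.extras
```

The RL algorithm still sees a normal step() interface; it is simply unaware that each call now covers many physics steps, so no training update happens mid-window.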
