How to train the model only when the environment resets, instead of training at every step

I need some time to observe the results of the simulation. Only then can I fill the reward buffer based on those results. While I am observing the simulation, my agent should not be trained. However, I can't find a way to pause training. What should I do to solve this problem?

If you don’t have an RL library designed for this workflow, one thing you could try is to modify the step() function of your environment. The RL algorithm calls step() to retrieve the buffers it needs for training, so you could run multiple simulation steps there in a loop, using gym.simulate() to advance the simulation and post_physics_step() to compute the observations.
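A minimal sketch of that idea, with no dependency on Isaac Gym: the environment's step() internally runs several physics steps before computing observations and reward, so the trainer never sees intermediate states and no update happens during the observation window. The class `MultiStepEnv` and the methods `simulate_one_step` and `compute_reward` are hypothetical stand-ins for engine calls such as gym.simulate() and a task's post_physics_step().

```python
class MultiStepEnv:
    """Toy environment that hides several physics steps inside one step()."""

    def __init__(self, sim_steps_per_policy_step=10):
        # Number of physics steps to run per call to step().
        self.sim_steps_per_policy_step = sim_steps_per_policy_step
        self.sim_time = 0

    def simulate_one_step(self):
        # Placeholder for gym.simulate(sim); here it just advances a counter.
        self.sim_time += 1

    def compute_reward(self):
        # Placeholder: reward is computed only after the observation window,
        # once the simulation results are available.
        return float(self.sim_time)

    def step(self, action):
        # Run many physics steps without returning to the trainer, so the
        # RL algorithm receives no buffers (and does no updates) meanwhile.
        for _ in range(self.sim_steps_per_policy_step):
            self.simulate_one_step()
        # Only now build observations and reward (post_physics_step analogue).
        obs = [self.sim_time]
        reward = self.compute_reward()
        done = False
        return obs, reward, done, {}


env = MultiStepEnv(sim_steps_per_policy_step=5)
obs, reward, done, info = env.step(action=None)
print(obs, reward)  # the trainer sees only the post-window state
```

In a real Isaac Gym task you would replace the placeholder calls with the actual simulation and observation-refresh code, and you could also make the inner loop length depend on how long you need to observe before the reward is meaningful.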