From the Stable Baselines page on Vectorized Environments (Vectorized Environments — Stable Baselines 2.10.2 documentation):
When using vectorized environments, the environments are automatically reset at the end of each episode. Thus, the observation returned for the i-th environment when done[i] is true will in fact be the first observation of the next episode, not the last observation of the episode that has just terminated. You can access the “real” final observation of the terminated episode (that is, the one that accompanied the done event provided by the underlying environment) using the terminal_observation keys in the info dicts returned by the vecenv.
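To make the quoted behavior concrete, here is a minimal toy sketch of an auto-resetting vectorized environment. It does not use Stable Baselines itself: CountdownEnv and AutoResetVecEnv are hypothetical stand-ins written only for illustration, and only the terminal_observation key name comes from the quoted documentation.

```python
class CountdownEnv:
    """Toy environment: the observation is a step counter, done at `horizon`."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 0.0, done, {}


class AutoResetVecEnv:
    """Toy stand-in for a vectorized env that auto-resets finished sub-envs."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if d:
                # Stash the true final observation, then return the first
                # observation of the next episode in its place.
                info["terminal_observation"] = o
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(info)
        return obs, rewards, dones, infos


vec_env = AutoResetVecEnv([CountdownEnv(horizon=2), CountdownEnv(horizon=3)])
vec_env.reset()
vec_env.step([0, 0])
obs, rewards, dones, infos = vec_env.step([0, 0])
# Sub-env 0 finished on this step: obs[0] is already the post-reset
# observation, while the real final one sits in infos[0].
```

In this sketch the done flag and the post-reset observation arrive on the same step, which is the behavior the question below contrasts with Isaac Gym.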
Am I correct in assuming that resets in Isaac Gym do NOT work like this?
In other words, an environment returns its terminal observation and then resets itself on the following step, discarding the action that was computed from that terminal observation. (This seems to be the case, since in each task self.reset(...) is called before self.reset_buf is updated.)
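The ordering I am describing can be sketched with a toy task. This is only an illustration of my reading, not the actual Isaac Gym code: DeferredResetTask, its horizons parameter, and the counter-based "physics" are all hypothetical, and only the names reset, reset_buf, and env_ids mirror the task files.

```python
class DeferredResetTask:
    """Toy batched task: env i terminates when its counter reaches horizons[i]."""

    def __init__(self, horizons):
        self.horizons = list(horizons)
        self.progress = [0] * len(self.horizons)
        self.reset_buf = [0] * len(self.horizons)

    def reset(self, env_ids):
        for i in env_ids:
            self.progress[i] = 0

    def step(self, actions):
        # Stand-in for physics: every env advances, including flagged ones,
        # so the action sent to a flagged env is applied and then thrown away.
        self.progress = [p + 1 for p in self.progress]

        # Reset the envs that were flagged on the PREVIOUS step.
        env_ids = [i for i, flag in enumerate(self.reset_buf) if flag]
        self.reset(env_ids)

        # Only now compute observations and the new reset flags.
        obs = list(self.progress)
        self.reset_buf = [int(p >= h) for p, h in zip(self.progress, self.horizons)]
        return obs, list(self.reset_buf)


task = DeferredResetTask(horizons=[2])
trajectory = [task.step([0]) for _ in range(3)]
# Step 2 returns the terminal observation together with the reset flag;
# the post-reset observation only shows up on step 3.
```

Under this assumed ordering, the observation that accompanies the reset flag is the true terminal one, unlike the auto-reset convention quoted above.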
Additionally, is there any point in setting self.reset_buf[env_ids] = 0 at the end of self.reset(...)? It seems like this value just gets overwritten when compute_reward(...) is called later in post_physics_step(...). Also, at the end of anymal.py, there is the line self.reset_buf[env_ids] = 1 instead of self.reset_buf[env_ids] = 0. Why is this? Does it just not matter?
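Here is a toy sketch of why I suspect the value may not matter, and of the one case where it would. The run function and its carry_over switch are hypothetical; I am not claiming which pattern any particular task file uses. With carry_over=False the flag is recomputed from scratch after the reset, so whatever reset(...) wrote is overwritten. With carry_over=True the new flag keeps the old value when no termination condition fires, so the written value survives.

```python
def run(value_written_in_reset, carry_over, horizon=2, num_steps=6):
    """Simulate one toy env; return (progress, reset flag) after each step."""
    progress = 0
    reset_buf = 0
    history = []
    for _ in range(num_steps):
        progress += 1
        if reset_buf:
            # Stand-in for reset(...): restore state, then write to the flag.
            progress = 0
            reset_buf = value_written_in_reset
        # Stand-in for compute_reward(...).
        if carry_over:
            # Hypothetical variant: keep the old flag unless a condition fires.
            reset_buf = 1 if progress >= horizon else reset_buf
        else:
            # Flag recomputed from scratch: the value written above is lost.
            reset_buf = 1 if progress >= horizon else 0
        history.append((progress, reset_buf))
    return history


same = run(0, carry_over=False) == run(1, carry_over=False)
stuck = run(1, carry_over=True)
# With carry_over=True and a written value of 1, the env stays flagged
# and gets reset on every step after the first termination.
```

So, at least in this sketch, writing 0 or 1 is harmless only if the reward function never reads the previous flag.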