From the Stable Baselines page on Vectorized Environments (Vectorized Environments — Stable Baselines 2.10.3a0 documentation):
When using vectorized environments, the environments are automatically reset at the end of each episode. Thus, the observation returned for the i-th environment when `done[i]` is true will in fact be the first observation of the next episode, not the last observation of the episode that has just terminated. You can access the “real” final observation of the terminated episode (that is, the one that accompanied the `done` event provided by the underlying environment) using the `terminal_observation` keys in the info dicts returned by the vecenv.
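To make the quoted behaviour concrete, here is a minimal pure-Python sketch of an auto-resetting vectorized environment. `CountingEnv` and `ToyVecEnv` are hypothetical toy classes written for this post, not Stable Baselines code; the only thing taken from the docs is the `terminal_observation` key in the info dict.

```python
class CountingEnv:
    """Hypothetical toy env: the observation is the step count, and the
    episode ends once `horizon` steps have been taken."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 0.0, done, {}


class ToyVecEnv:
    """Toy stand-in for a vectorized env that auto-resets finished sub-envs."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            ob, reward, done, info = env.step(action)
            if done:
                # Stash the true final observation, then hand back the
                # first observation of the next episode instead.
                info["terminal_observation"] = ob
                ob = env.reset()
            obs.append(ob)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return obs, rewards, dones, infos


vec_env = ToyVecEnv([CountingEnv(horizon=2), CountingEnv(horizon=3)])
vec_env.reset()
vec_env.step([0, 0])
obs, rewards, dones, infos = vec_env.step([0, 0])
print(obs, dones, infos)
```

On the second step the first sub-env finishes, so its entry in `obs` is already the reset observation, and the observation that actually ended the episode is only available through `infos[0]["terminal_observation"]`.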
Am I correct in assuming that resets in Isaac Gym do NOT work like this?
In other words, an environment returns its terminal observation and then resets itself on the following step, discarding the action computed from that terminal observation. (This seems to be the case, since in each task's `post_physics_step(...)`, `self.reset(...)` is called before `self.reset_buf` is updated.)
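Here is a toy model of the ordering I am describing, to show what I mean by the action being discarded. This is not Isaac Gym code: `ToyTask`, its `step(...)` method, `progress_buf`, `max_episode_length`, and the "physics" that just adds the action to the state are all assumptions made for illustration. Only the names `post_physics_step`, `reset`, `reset_buf`, and `compute_reward` come from the tasks.

```python
class ToyTask:
    """Hypothetical model of the reset ordering described above."""

    def __init__(self, num_envs, max_episode_length):
        self.max_episode_length = max_episode_length
        self.progress_buf = [0] * num_envs  # steps taken in the current episode
        self.reset_buf = [0] * num_envs     # 1 means "reset me on the next step"
        self.state = [0] * num_envs         # toy state, doubles as the observation

    def reset(self, env_ids):
        for i in env_ids:
            self.state[i] = 0
            self.progress_buf[i] = 0
            self.reset_buf[i] = 0

    def compute_reward(self):
        # In this toy model the flags are rebuilt from scratch every step.
        self.reset_buf = [
            int(p >= self.max_episode_length) for p in self.progress_buf
        ]

    def step(self, actions):
        # Toy "physics": each action is added to its env's state.
        self.state = [s + a for s, a in zip(self.state, actions)]
        return self.post_physics_step()

    def post_physics_step(self):
        self.progress_buf = [p + 1 for p in self.progress_buf]
        # Flags set on the PREVIOUS step trigger the reset first ...
        env_ids = [i for i, flag in enumerate(self.reset_buf) if flag]
        if env_ids:
            self.reset(env_ids)
        # ... and only afterwards are the flags recomputed.
        self.compute_reward()
        return list(self.state), list(self.reset_buf)


task = ToyTask(num_envs=1, max_episode_length=2)
history = [task.step([a]) for a in (1, 1, 5)]
for observation, reset_flags in history:
    print(observation, reset_flags)
```

In this model the second step returns the terminal observation together with a set reset flag, and the third step resets the environment after the "physics" has run, so the effect of the action `5` never shows up in any returned observation.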
Additionally, is there any point in setting `self.reset_buf[env_ids] = 0` at the end of `self.reset(...)`? It seems like this value just gets overwritten when `compute_reward(...)` is called later in `post_physics_step(...)`. Also, at the end of `self.reset(...)` in `anymal.py`, there is the line `self.reset_buf[env_ids] = 1` instead of `self.reset_buf[env_ids] = 0`. Why is this? Does it just not matter?
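Whether the value written in `self.reset(...)` matters would seem to depend on how `compute_reward(...)` builds the new flags. The sketch below contrasts two hypothetical update styles; `overwrite_update` and `masked_update` are names I made up, and I am not claiming either one is what a given task actually does.

```python
def overwrite_update(reset_buf, terminated):
    """Variant A (assumed): the new flags ignore the old buffer entirely."""
    return [int(t) for t in terminated]


def masked_update(reset_buf, terminated):
    """Variant B (assumed): set the flag where a termination condition holds,
    otherwise keep whatever value was already in the buffer."""
    return [1 if t else old for old, t in zip(reset_buf, terminated)]


terminated = [False, False]  # no env hits a termination condition this step
after_reset_zero = [0, 0]    # reset() wrote 0
after_reset_one = [1, 1]     # reset() wrote 1

a_zero = overwrite_update(after_reset_zero, terminated)
a_one = overwrite_update(after_reset_one, terminated)
b_zero = masked_update(after_reset_zero, terminated)
b_one = masked_update(after_reset_one, terminated)
print(a_zero, a_one, b_zero, b_one)
```

Under variant A the value written by `reset` is irrelevant, which is what I assumed above. Under variant B a leftover `1` survives the update and would trigger another reset on the next step, so the two lines would not be interchangeable.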