In a typical RL environment, the MDP advances at every step: the observation, action, and reward are all updated on each env step.
But how do I train an agent for a one-step MDP? I only need the policy to produce an action once per episode.
Is there a way to suppress the per-step updates of observation, action, and reward? Or is there a framework that supports one-step MDP training?
Thank you!
Thank you for your interest in Isaac Lab. We don’t have a clear example for this, but you can review the omni.isaac.lab.envs.mdp library and the tutorials to get started.
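As a general pointer beyond Isaac Lab: a one-step MDP is just an environment whose episode terminates after a single step (effectively a contextual bandit), so standard RL algorithms can train on it unchanged since the rollout loop simply resets the env after every step. Below is a minimal sketch using plain gymnasium rather than Isaac Lab's API; the class name `OneStepEnv` and the quadratic reward are purely illustrative.

```python
import gymnasium as gym
import numpy as np


class OneStepEnv(gym.Env):
    """Episode = one observation, one action, one reward."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self._obs = None

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Sample the single "context" observation for this episode.
        self._obs = self.observation_space.sample()
        return self._obs, {}

    def step(self, action):
        # One-shot reward; a dummy quadratic objective for illustration.
        reward = -float(np.sum((action - self._obs[:2]) ** 2))
        # Terminate immediately: this is what makes the MDP one-step.
        return self._obs, reward, True, False, {}
```

Any off-the-shelf trainer (e.g. Stable-Baselines3 PPO) can consume such an env unchanged, with each rollout step corresponding to one full episode. In Isaac Lab's manager-based envs, the analogous idea, as far as we can tell, would be to configure the episode length to a single control step so that a time-out termination fires after every step; the tutorials and the omni.isaac.lab.envs.mdp library mentioned above show how termination terms are configured.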
For future reference, to ensure efficient support and collaboration, please post topics like this on the Isaac Lab GitHub repository, following the instructions in Isaac Lab's Contributing Guidelines for discussions, issue reports, feature requests, and contributions to the project.
We appreciate your understanding and look forward to assisting you.