Hi @toni.sm. Thanks to you, I was able to successfully train my agent using the Isaac Sim RL framework.
I have two questions about the trained policies.
- Deployment into the Isaac Sim Extension.
- I plan to apply the trained RL policy to a single task in a continuous robot automation process.
I want to control a ROS-based robot in Isaac Sim Extension/Standalone mode and perform consecutive tasks.
Is it possible, within a single Python file, to use a trained policy to control only one of these successive tasks?
If possible, I would appreciate it if you could point me to an example or site that I can refer to.
- Hierarchical Learning.
- For difficult tasks, I know there is a training method that starts with easy tasks and then increases the difficulty step by step.
To do this, the previously trained deep neural network has to be reloaded and trained further, and a higher-level task can be achieved by repeating this process.
My question is: is such a feature possible in the Isaac Sim RL framework?
Thank you!
Thank you for your kind reply.
To expand on the first question: I have 3 tasks in my robot automation process.
In this process, only the last task needs the policy trained with DRL, and I am trying to integrate the individual tasks into a single Python file.
My idea is to run the code in the while loop of a standalone Python file and, when a specific flag is set, load the trained policy and execute that task.
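Roughly, I am imagining something like the following pseudocode (all names below are placeholders for my own code, not actual API calls):

```python
# rough sketch of the intended structure (all names are placeholders)
current_task = "task_1"                      # flag indicating the active task

while simulation_app.is_running():           # standalone simulation loop
    if current_task == "task_3":
        # only this last task is driven by the trained DRL policy
        action = trained_policy(get_observation())
        apply_action(action)
    else:
        run_scripted_step(current_task)      # conventional (e.g. ROS-based) control
    world.step(render=True)
    current_task = update_task_flag()        # advance to the next task when it finishes
```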
I wonder if there are examples where DRL policies are used in this way for only some of the consecutive tasks.
Thank you!
Hi @swimpark
The idea of running the simulation loop and, given some condition, loading the policy learned with RL is fine.
Just keep in mind (this is system dependent) that the policy may have been trained to run at a different (possibly lower) frequency than the one at which the simulation is stepped. In that case you would have to run a series of world updates between consecutive policy executions.
No example comes to mind.
Nevertheless, for the implementation it is not mandatory to create an environment based on gym/gymnasium or other interfaces. Simply grouping the observations and passing them to the trained policy, while taking into account when the episode ends (in case it is an episodic task), would be enough.
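For instance, a minimal sketch of such a loop in a standalone script could look like the following (assuming the policy was saved as a loadable PyTorch module; `get_observation`, `apply_action` and `task_done` are placeholders for your own code, and the exact Isaac Sim imports and timings may differ between versions and systems):

```python
import torch
from omni.isaac.kit import SimulationApp

simulation_app = SimulationApp({"headless": False})

from omni.isaac.core import World

PHYSICS_DT = 1.0 / 120.0                    # simulation step (assumed)
POLICY_DT = 1.0 / 30.0                      # control rate the policy was trained at (assumed)
DECIMATION = round(POLICY_DT / PHYSICS_DT)  # world updates per policy execution

world = World(physics_dt=PHYSICS_DT)
# ... add the robot and the rest of the scene here ...
world.reset()

policy = torch.load("trained_policy.pt", map_location="cpu")  # assumed: saved as a full module
policy.eval()

use_trained_policy = True                   # placeholder: set when the DRL-controlled task is active
step_count = 0

while simulation_app.is_running():
    if use_trained_policy and step_count % DECIMATION == 0:
        # group the observations exactly as they were laid out during training
        obs = torch.as_tensor(get_observation(), dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            action = policy(obs).squeeze(0)  # adapt to your model's forward signature
        apply_action(action)                 # placeholder: map the action to robot commands
        if task_done():                      # placeholder: episode-end check (episodic tasks only)
            world.reset()
    world.step(render=True)
    step_count += 1

simulation_app.close()
```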