Isaac Sim Version
5.1.0
5.0.0
4.5.0 ✓
4.2.0
4.1.0
4.0.0
2023.1.1
2023.1.0-hotfix.1
Other (please specify):
Operating System
Ubuntu 24.04
Ubuntu 22.04 ✓
Ubuntu 20.04
Windows 11
Windows 10
Other (please specify):
GPU Information
Model: RTX 4090
Driver Version:
Topic Description
Hello! I’m exploring whether it’s possible to use an open‑source Vision‑Language‑Action (VLA) model—such as NVIDIA’s GR00T‑N1.6‑3B—to control a robot in Isaac Sim through natural‑language commands without any additional fine‑tuning. I want to have the robot perform manipulation tasks, like picking up a cube and placing it elsewhere, based solely on a verbal or text instruction, using the model as‑is. Has anyone tried this? Is it realistically feasible to achieve accurate control with an off‑the‑shelf VLA model? If there are official documents, tutorials, or example projects on integrating existing VLA models with Isaac Sim for natural‑language‑based tasks, I’d appreciate any guidance or references.
Yes! You can definitely take GR00T off the shelf and run inference for robot control without post-training. YMMV depending on the robot embodiment you use: the pre-trained embodiments are the best place to start, and you'll probably have better luck with manipulator arms before jumping to a humanoid embodiment.
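To make the loop concrete, here's a minimal sketch of what off-the-shelf VLA inference for sim control typically looks like: pack the camera frame, robot state, and language instruction into an observation dict, query the policy for an action chunk, and replay those actions in Isaac Sim. Note that `DummyVLAPolicy`, the observation keys, and the shapes below are illustrative assumptions, not the actual GR00T API; check the Isaac-GR00T repo for the real policy class and data config for your embodiment.

```python
# Hypothetical sketch of a VLA observation -> action-chunk inference loop.
# DummyVLAPolicy is a stand-in for a real pretrained checkpoint; the dict
# keys and array shapes are assumptions for illustration only.
import numpy as np

class DummyVLAPolicy:
    """Placeholder for a pretrained VLA policy (hypothetical interface)."""
    def __init__(self, action_dim: int = 7, horizon: int = 16):
        self.action_dim = action_dim  # e.g. joint targets for a 7-DoF arm
        self.horizon = horizon        # length of the predicted action chunk

    def get_action(self, observation: dict) -> np.ndarray:
        # A real policy fuses image + proprioception + language here.
        assert "video" in observation and "state" in observation
        assert isinstance(observation["language_instruction"], str)
        # Returns an action chunk: horizon x action_dim.
        return np.zeros((self.horizon, self.action_dim), dtype=np.float32)

def control_step(policy, rgb_image, joint_positions, instruction):
    """One inference step: pack observations, query the policy."""
    observation = {
        "video": rgb_image,                   # H x W x 3 uint8 camera frame
        "state": joint_positions,             # current joint positions
        "language_instruction": instruction,  # free-form text command
    }
    return policy.get_action(observation)

policy = DummyVLAPolicy()
actions = control_step(
    policy,
    rgb_image=np.zeros((224, 224, 3), dtype=np.uint8),
    joint_positions=np.zeros(7, dtype=np.float32),
    instruction="pick up the cube and place it on the tray",
)
print(actions.shape)  # (16, 7): replay these targets step-by-step in the sim
```

In practice you would run this loop at a fixed rate inside your Isaac Sim script: render a camera, read joint states via the articulation API, feed the chunk of actions back as joint position targets, then re-query the policy.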