Physical Modelling of sim2real SO101 Arm Project

Important: Isaac Sim support

5.1.0

Operating System

Ubuntu 24.04 (arm64)

GPU Information

  • Model: GB10 (Blackwell)
  • Driver Version: 580.142
  • CUDA Version: 13.0

Topic Description

Is it possible to roll out the example pretrained VLA policies in the real world?

Detailed Description

I’m currently following the Train an SO-101 Robot From Sim-to-Real With NVIDIA Isaac Learning Tutorial. I was able to set up the Docker container without problems and to roll out the example policies provided in the tutorial in simulation with high (70-90%) success rates. The issue lies with the Simulation to Real portion, particularly the Real Evaluation section. I built the lightbox and mounted the light, external camera, and SO101 arm as described. I would assume these conditions carry over between the provided example policies, the original real setup, and a new setup, yet I have not once been able to place a vial into the rack over roughly 150-200 episodes, despite varying the environment many times.

Within the “Real Evaluation” section there’s no mention of a success rate, no video footage of the physical arm completing the task, and no evaluation dataset visualized on Hugging Face. I took a look at one of the datasets in the Dataset Visualizer from LeRobot and noticed that the arm (in sim) wasn’t positioned at the base of the lightbox, but at the base of the mat.

I understand that VLAs like GR00T N1.6 require finetuning for a variety of tasks, especially when deployed in general scenarios, and I don’t expect high success rates (I’m aiming for around 30-40%). However, I’m just recreating the original environment, which was quite isolated from the rest of the world (hence the lightbox), so the policies should be able to run on my setup as well.

Steps to Reproduce

  1. Make sure the real docker container from the tutorial has been built

  2. Start the real docker container, select a model, run the policy server
    2.1) export MODEL=aravindhs-NV/grootn16-finetune_sreetz-so101_teleop_vials_rack_left/checkpoint-1000
    2.2) python Isaac-GR00T/gr00t/eval/run_gr00t_server.py --model-path /workspace/models/$MODEL

  3. Run the evaluation rollout: attach to the running container and launch the evaluation script
    3.1) docker exec -it real-robot /bin/bash
    3.2)
    ```
    python Isaac-GR00T/gr00t/eval/real_robot/SO100/so101_eval.py \
      --robot.type=so101_follower \
      --robot.port="$ROBOT_PORT" \
      --robot.id="$ROBOT_ID" \
      --robot.cameras="{
        wrist: {type: opencv, index_or_path: $CAMERA_GRIPPER, width: 640, height: 480, fps: 30},
        front: {type: opencv, index_or_path: $CAMERA_EXTERNAL, width: 640, height: 480, fps: 30}
      }" \
      --policy_host=localhost \
      --policy_port=5555 \
      --lang_instruction="Pick up the vial and place it in the yellow rack" \
      --rerun True
    ```

  4. Let it run and watch it not succeed.
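Since the command in step 3.2 depends on four shell variables, a small preflight check can rule out an unset variable before a rollout. This is only a sketch: the variable names come from the command above, while the helper `missing_env_vars` is a hypothetical name and not part of the tutorial or of Isaac-GR00T.

```python
import os

# Variable names taken from the command in step 3.2.
REQUIRED_VARS = ("ROBOT_PORT", "ROBOT_ID", "CAMERA_GRIPPER", "CAMERA_EXTERNAL")


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Unset or empty:", ", ".join(missing))
    else:
        print("All variables from step 3.2 are set.")
```

Running it inside the container before step 3.2 only confirms that the variables exist; it does not verify that the port or camera indices point at the right devices.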

Error Messages

No errors, just poor success rate (0%)
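To put a number on how unlikely a 0% result is under the 30-40% target, one option is a confidence interval on the success rate. The sketch below uses the standard Wilson score interval; the function name `wilson_interval` is my own, and 150 is simply the lower end of the episode count quoted above.

```python
import math


def wilson_interval(successes, episodes, z=1.96):
    """Wilson score interval for a binomial success rate (z=1.96 is ~95%)."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    p = successes / episodes
    denom = 1.0 + z * z / episodes
    center = (p + z * z / (2 * episodes)) / denom
    half = z * math.sqrt(p * (1 - p) / episodes + z * z / (4 * episodes ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(0, 150)
print(f"0/150 episodes: {low:.3f} to {high:.3f}")
```

With 0 successes in 150 episodes the upper bound comes out at about 2.5%, so the result is far from the 30-40% range and points to a systematic mismatch rather than bad luck.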

Screenshots or Videos

Additional photos are in the Google Drive link. Sorry for the lack of “film production,” but I hope this gives some useful insight into my environment. Unfortunately, as a new user I can only post 2 photos/videos.

Additional Information

What I’ve Tried

  • Double-checked the “setting up the workspace” section; all of my initial measurements match the original authors’.
  • Recalibrated both arms, verified they work in simulation, and checked the calibration: my standard deviations of joint position are of the same order of magnitude as those in the image.
  • Tried to recreate the environment based on a given dataset.
  • Tried multiple orientations of the external camera to limit noise.
  • Tried multiple poses (position + orientation) of the rack relative to the arm.
  • Tried different numbers of vials, along with different vial poses.
  • Tried multiple light intensities (25% - 100%) with the provided light bar.

Related Issues

N/A

Additional Context

The authors did a fantastic job on the tutorial, which provides more information than many other tutorials, and I’m hopeful to see the sim2real pipeline work with VLAs. It’s just a bit unfortunate that the real evaluation portion is lacking.

Also, I tried again recently and have been getting the real arm to consistently and autonomously pick up a vial by positioning the vial centrally relative to the arm, with random orientations, but below the top of the rack (imagine a transversal line drawn at the top of the rack, close to the back wall, across the width of the lightbox; the cap would be below that line).

Edit:

I began changing some things in the physical setup to match the simulation environment (without domain randomization), based on the relative positions of objects, and got it to function with 80% success in 10 episodes! It was very particular though, which is a bit concerning for a generalist policy like GR00T.
Conditions: 50% light; 1 vial in the top left of the container; 1 vial on the mat; top left of mat 4.5 cm from back; top right of mat 2.5 cm from back. Rack: 18 cm from left wall, 17 cm from back wall, at exactly 90°. Vial: 18.5 cm right of the rack, 2 cm from the transversal line at the top of the rack, also at exactly 90°.

Model: so101_telop_vials_rack_left/checkpoint-100005
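For anyone trying to reproduce this, the working layout can be kept as a small structured record next to the evaluation logs. This is just a sketch: the values are the measurements listed above, but the class and field names are hypothetical and do not correspond to any config format in the tutorial.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class WorkspaceLayout:
    """Measurements from the post above; field names are hypothetical."""
    light_percent: int = 50
    mat_top_left_from_back_cm: float = 4.5
    mat_top_right_from_back_cm: float = 2.5
    rack_from_left_wall_cm: float = 18.0
    rack_from_back_wall_cm: float = 17.0
    rack_angle_deg: float = 90.0
    vial_right_of_rack_cm: float = 18.5
    vial_from_rack_top_line_cm: float = 2.0
    vial_angle_deg: float = 90.0
    successes: int = 8   # 80% of 10 episodes
    episodes: int = 10


layout = WorkspaceLayout()
print(json.dumps(asdict(layout), indent=2))
```

Saving one such record per configuration makes it easier to see which single measurement changed between a failing and a working setup.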

Hi, thanks for posting. For sim2real discussions related to the tutorial you referred to, I would recommend using the LeRobot community channels, especially the Hugging Face community, since it’s an HF project.

Thanks for reaching out with such a detailed report of your learnings. This kind of info helps me understand what to improve in the experience and course instructions; I plan to add more videos of rollouts per your note. I’m glad to hear your update helped the policy succeed.

If I can offer a potential “quick fix” suggestion, try aiming your external webcam slightly higher so it can see more of the robot as well as the rack. For camera framing, see some of the real episodes I captured as a reference. This is based on my experience from GTC and my own lab experiments.

For some context, this workshop was originally given in person with 40 of these setups, with “identical” robots from WowRobo. Where we saw deviation was either in robot calibration or in a camera angle that was a little too low or too high, so usually one of those tweaks would get us to the ~60-70% success rate range. In my opinion, we’d ideally use a physical camera with a wider field of view so the result is less sensitive to the balance between what is in frame and what is not (robot vs. vial rack), but we also wanted to make the kit very affordable, and this webcam fit that description. That explains some of why the camera positioning needs careful adjustment. If you happen to have a RealSense D455, that will exactly match the camera in simulation.
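To illustrate why a narrower field of view makes framing more sensitive, here is a rough pinhole-camera sketch of how much scene width fits in frame at a given distance. The formula is the standard one (width = 2 · d · tan(FOV/2)); the function name, the 0.5 m distance, and the two FOV angles are placeholders of mine, not measurements of the kit webcam or the D455.

```python
import math


def visible_width_m(distance_m, horizontal_fov_deg):
    """Width of the scene in frame at a given distance (ideal pinhole camera)."""
    return 2.0 * distance_m * math.tan(math.radians(horizontal_fov_deg) / 2.0)


# Hypothetical numbers, not measured from the kit webcam or the D455.
for fov in (60.0, 90.0):
    print(f"{fov:.0f} deg at 0.5 m: {visible_width_m(0.5, fov):.2f} m wide")
```

At the same distance the wider lens covers noticeably more of the workspace, which leaves more margin for keeping both the robot and the rack in frame.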

Another consideration is the robot color being different from our datasets and blending with the mat (and this would be a time where more DR or Cosmos-augmentation could help). While we use black robots in the sim DR, I’m not sure if we tested black robots against physical rollouts. What might be interesting is to either change the mat color (for more contrast) or to try co-training a model with our sim dataset, with 5 or 10 real demonstrations added from your own setup.
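As a rough sketch of the co-training idea, one way to keep a handful of real demonstrations from being drowned out by a much larger sim dataset is to weight the sampling so that real episodes make up a fixed share of the draws. Everything here is hypothetical (the function name, the dataset sizes, the 20% share); it is plain Python and not the LeRobot or Isaac-GR00T data-loading API.

```python
import random


def mixing_weights(n_sim, n_real, real_fraction):
    """Per-episode sampling weights so real episodes get `real_fraction` of draws."""
    if n_sim <= 0 or n_real <= 0:
        raise ValueError("need at least one sim and one real episode")
    if not 0.0 < real_fraction < 1.0:
        raise ValueError("real_fraction must be strictly between 0 and 1")
    sim_w = (1.0 - real_fraction) / n_sim
    real_w = real_fraction / n_real
    # Sim episodes come first, real episodes last.
    return [sim_w] * n_sim + [real_w] * n_real


# Hypothetical sizes: 500 sim episodes, 10 real demonstrations, 20% real draws.
weights = mixing_weights(n_sim=500, n_real=10, real_fraction=0.2)
rng = random.Random(0)
batch = rng.choices(range(510), weights=weights, k=1000)
real_share = sum(1 for i in batch if i >= 500) / len(batch)
print(f"share of real episodes in sample: {real_share:.2f}")
```

The right share is an open tuning question; too high a value risks overfitting to the few real demonstrations.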

Lastly, notice that in sim we used a fairly constrained range of possible vial positions. More variation here would train a more robust policy.
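To make the point about variation concrete, the sketch below draws vial poses uniformly from a narrow and a wide range. The ranges, units, and function name are placeholders of mine for illustration; the actual ranges used in the sim dataset are not stated in this thread.

```python
import random


def sample_vial_pose(rng, x_range_cm, y_range_cm, yaw_range_deg):
    """Draw one vial pose (x, y, yaw) uniformly from the given ranges."""
    return (
        rng.uniform(*x_range_cm),
        rng.uniform(*y_range_cm),
        rng.uniform(*yaw_range_deg),
    )


# Hypothetical ranges; the tutorial's actual ranges are not given here.
narrow = dict(x_range_cm=(17.0, 20.0), y_range_cm=(1.0, 3.0), yaw_range_deg=(85.0, 95.0))
wide = dict(x_range_cm=(10.0, 25.0), y_range_cm=(0.0, 8.0), yaw_range_deg=(0.0, 180.0))

rng = random.Random(0)
for name, ranges in (("narrow", narrow), ("wide", wide)):
    x, y, yaw = sample_vial_pose(rng, **ranges)
    print(f"{name}: x={x:.1f} cm, y={y:.1f} cm, yaw={yaw:.0f} deg")
```

A policy trained only on the narrow range would be expected to work mainly when the real vial sits inside that range, which matches the very particular placement reported earlier in the thread.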

Hope this helps!