I want to verify OpenVLA in robosuite. Here is my system info:
robosuite 1.15.1
Ubuntu 22.04
Jetson AGX Orin (64GB)
conda environment (Python 3.10)
I wrote a simple script:
import robosuite as suite
from robosuite.controllers import load_part_controller_config
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

local_model_path = "/home/yljy/jetson-containers/data/models/huggingface/models--openvla--openvla-7b/snapshots/31f090d05236101ebfc381b61c674dd4746d4ce0"

# Load the OpenVLA processor and model from the local snapshot
processor = AutoProcessor.from_pretrained(local_model_path, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    local_model_path,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

# Note: the config must actually be passed to suite.make; otherwise the
# environment silently falls back to the default controller
controller_config = load_part_controller_config(default_controller="IK_POSE")
robosuite_env = suite.make(
    "Lift",
    robots="Panda",
    controller_configs=controller_config,
    has_renderer=True,
    has_offscreen_renderer=True,
    use_camera_obs=True,
    camera_names="frontview",
    camera_heights=640,
    camera_widths=480,
)

obs = robosuite_env.reset()
prompt = "In: What action should the robot take to pick up the cube?\nOut:"

while True:
    # robosuite camera observations follow the OpenGL convention; if the view
    # looks upside down, flip it with obs["frontview_image"][::-1]
    image = Image.fromarray(obs["frontview_image"])
    inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
    action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
    action[0:3] = action[0:3] * 100  # because the controller sensitivity is 0.01
    action[6] = action[6] * 2 - 1    # gripper: OpenVLA outputs [0, 1], robosuite expects [-1, 1]
    print(action)
    obs, reward, done, info = robosuite_env.step(action)
    robosuite_env.render()
I run this code to let OpenVLA control the manipulator, but something goes wrong. The arm should be moving down, yet the action[2] that OpenVLA outputs (i.e., the Z-axis offset) is always positive. Is it because the end-effector coordinates in robosuite don't match the end-effector coordinates OpenVLA expects?
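One way to check whether the frames disagree is to compare the commanded translation against the measured end-effector displacement; robosuite exposes the end-effector position in the observation dict. A minimal sketch (the obs["robot0_eef_pos"] key assumes the default Panda proprioceptive observables):

import numpy as np

# Compare the commanded translation with the measured end-effector motion;
# if the signs disagree consistently on an axis, the frames likely differ
eef_before = np.array(obs["robot0_eef_pos"])
obs, reward, done, info = robosuite_env.step(action)
eef_after = np.array(obs["robot0_eef_pos"])

commanded = action[0:3]
measured = eef_after - eef_before
print("commanded:", commanded, "measured:", measured)
print("sign match per axis:", np.sign(commanded) == np.sign(measured))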
Hi,
I am also working on the OpenVLA model. May I ask what camera calibration you use in your simulation, e.g. camera pose, camera matrix, focal length, etc.?
I’m currently working on a simulation project with the following setup:
Operating System: Ubuntu 22.04
ROS 2 Distribution: Humble
Motion Planning Framework: MoveIt 2
Simulation Environment: NVIDIA Isaac Sim 4.2.0
Programming Language: Python
I’m utilizing Isaac Sim’s built-in camera and require assistance with the following aspects of camera calibration:
Intrinsic Parameters:
Determining the camera matrix
Identifying focal length
Extrinsic Parameters:
Establishing the camera’s pose within the simulation environment
Calibration Process:
Best practices for calibrating the simulated camera to ensure accurate data representation
I aim to adjust the camera settings appropriately to enhance the fidelity of my simulation.
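For the intrinsics, my current understanding is that Isaac Sim's camera follows a pinhole model, so the camera matrix can be derived from the focal length and aperture attributes on the camera prim. A sketch with placeholder numbers (assuming the usual relation fx = width * focal_length / horizontal_aperture):

import numpy as np

# Placeholder values: read these from your Camera prim's attributes
width, height = 640, 480   # render resolution in pixels
focal_length = 24.0        # focalLength attribute of the camera prim
horiz_aperture = 20.955    # horizontalAperture attribute (default film back)
vert_aperture = horiz_aperture * height / width  # square pixels

fx = width * focal_length / horiz_aperture
fy = height * focal_length / vert_aperture  # equals fx for square pixels
cx, cy = width / 2.0, height / 2.0          # principal point at image center

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])
print(K)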
Does OpenVLA work in your simulation environment? In fact, I don't adjust the camera settings, because robosuite only has the frontview; I only set camera_heights and camera_widths.
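That said, since the camera lives in simulation, robosuite can report its calibration directly instead of estimating it. A sketch using robosuite.utils.camera_utils (assuming the Lift env from my script above):

from robosuite.utils import camera_utils

# Intrinsics (3x3) and camera-to-world pose (4x4) of the simulated camera;
# no calibration procedure is needed since the values come from the sim model
K = camera_utils.get_camera_intrinsic_matrix(
    robosuite_env.sim, "frontview", camera_height=640, camera_width=480
)
T = camera_utils.get_camera_extrinsic_matrix(robosuite_env.sim, "frontview")
print("intrinsic matrix:\n", K)
print("extrinsic matrix:\n", T)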
Yes, you need to trace it through and make sure all the coordinate-space transforms are in the reference frame the model expects, and you may need to fine-tune the model. There are a lot of hyperparameters to adjust and experiment with.
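For instance, if the policy's translation deltas turned out to be expressed in the camera frame rather than robosuite's world frame, they would need to be rotated before being sent to the controller. A hypothetical sketch (the frame assumption is illustrative and has to be verified for the actual setup; T is the 4x4 camera-to-world extrinsic from camera_utils above):

# Hypothetical: if the model's deltas were in the camera frame, rotate them
# into the world frame with the rotation block of the camera-to-world pose T
R_cam_to_world = T[:3, :3]
action[0:3] = R_cam_to_world @ action[0:3]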
I would also recommend not focusing solely on OpenVLA-7B, as several other VLAs are now out. OpenVLA is regarded as more difficult to train and is in fact larger/slower at 7B size, whereas smaller mini-VLAs have been coming out, for example EVLA ("Efficient Vision-Language-Action Models" by Paweł Budzianowski, K-Scale Labs) or OpenPi.