Dear NVIDIA Community,
I am working on deploying the OpenVLA model on a Jetson Orin device and require assistance with TensorRT model compilation, specifically regarding cross-compilation from a powerful host system.
Problem Statement: My Jetson Orin device experiences an overcurrent shutdown during the TensorRT model compilation phase, likely due to the intensive computational and memory demands of optimizing a large model like OpenVLA. This prevents me from successfully generating a TensorRT engine directly on the target device.
Proposed Solution & Goal: To overcome this limitation, I propose leveraging a more powerful host system for the TensorRT conversion. I have access to a GPU server equipped with 8 NVIDIA L20 GPUs (x86_64 architecture). My goal is to compile the OpenVLA model (which is currently a PyTorch model) on this server such that the resulting TensorRT engine (.engine file) is optimized and fully compatible for efficient inference on the Jetson Orin (ARM64 architecture).
Specific Questions & Concerns:
1. Cross-Compilation Feasibility & Recommended Workflow:
   - Is this approach (compiling a TensorRT engine on a discrete GPU server for deployment on a Jetson Orin) officially supported and recommended by NVIDIA?
   - What are the recommended tools and steps for performing this cross-compilation (e.g., `trtexec` with specific flags, the TensorRT Python API, or other utilities)?
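To make the workflow question concrete, this is the kind of host-side command I had in mind. It is only a sketch: the file names are placeholders (the ONNX file would come from a prior PyTorch-to-ONNX export of OpenVLA), and I do not know whether `--versionCompatible` / `--hardwareCompatibilityLevel` are the right mechanism for an x86_64-to-Jetson transfer, or whether they apply to the Orin's integrated GPU at all:

```shell
# Sketch only -- not verified to work for cross-platform deployment.
# Run on the L20 host; openvla.onnx is a placeholder for the exported model.
trtexec \
  --onnx=openvla.onnx \
  --saveEngine=openvla.engine \
  --fp16 \
  --versionCompatible \
  --hardwareCompatibilityLevel=ampere+
```

If this is the wrong direction entirely (e.g., if engines simply must be built on the target device class), I would appreciate being pointed to the supported alternative.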
2. Compatibility Considerations (Versions & Architectures):
   - TensorRT Versions: Must the TensorRT version installed on my L20 server (for compilation) exactly match the TensorRT version available on the Jetson Orin (for inference), or is there an acceptable range of compatibility?
   - Architectural Differences: Given the fundamental difference between the x86_64 host (L20) and the ARM64 target (Orin), are there specific TensorRT builder flags, target GPU architecture specifications (`--device=DLA_0` if applicable, or more fundamentally, the `sm_XX` version), or explicit platform declarations needed during compilation on the server to ensure the engine is built for the Orin's specific hardware?
   - OpenVLA Specifics: Are there common pitfalls or special considerations when converting large Vision-Language Models (like OpenVLA) in a cross-compilation scenario?
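On the TensorRT version question: my understanding (please correct me if this is wrong) is that a serialized engine is, by default, tied to the exact TensorRT version that built it. The stdlib-only Python sketch below shows the strict parity check I plan to run before copying an engine to the device; the helper function and the version strings are purely illustrative, not from any NVIDIA API:

```python
# Illustrative helper, not an NVIDIA API: checks strict host/target
# TensorRT version parity before an engine is copied to the device.
def versions_compatible(host: str, target: str) -> bool:
    """Return True only on an exact major.minor.patch match (strict default)."""
    return tuple(host.split(".")[:3]) == tuple(target.split(".")[:3])

print(versions_compatible("8.6.1", "8.6.1"))  # exact match  -> True
print(versions_compatible("8.6.1", "8.5.2"))  # mismatch     -> False
```

If a looser rule actually applies (e.g., matching major.minor is sufficient, or version-compatible engines relax this), I would adjust the check accordingly.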
Context: This strategy would enable us to leverage the significant computational resources of the GPU server for the heavy lifting of model optimization and engine generation, while still achieving efficient, low-power inference on the Jetson Orin for edge deployment, without hitting power limitations during development.
Any guidance, best practices, recommended workflows, or warnings about potential compatibility issues would be greatly appreciated. If any further information about my setup (e.g., specific Jetson Orin SKU, exact TensorRT versions I’m planning to use, an anonymized model graph) would be helpful, please let me know.
Thank you in advance for your time and assistance.
Best regards,