TensorRT Cross-Compilation: OpenVLA Model (PyTorch) for Jetson Orin, Converting on GPU Server (8x L20)

Dear NVIDIA Community,

I am working on deploying the OpenVLA model on a Jetson Orin device and require assistance with TensorRT model compilation, specifically regarding cross-compilation from a powerful host system.

Problem Statement: My Jetson Orin device experiences an overcurrent shutdown during the TensorRT model compilation phase, likely due to the intensive computational and memory demands of optimizing a large model like OpenVLA. This prevents me from successfully generating a TensorRT engine directly on the target device.

Proposed Solution & Goal: To overcome this limitation, I propose leveraging a more powerful host system for the TensorRT conversion. I have access to a GPU server equipped with 8 NVIDIA L20 GPUs (x86_64 architecture). My goal is to compile the OpenVLA model (currently a PyTorch model) on this server such that the resulting TensorRT engine (.engine file) is optimized for efficient inference on, and fully compatible with, the Jetson Orin (ARM64 architecture).

Specific Questions & Concerns:

  1. Cross-Compilation Feasibility & Recommended Workflow:

    • Is this approach (compiling a TensorRT engine on a discrete GPU server for deployment on a Jetson Orin) officially supported and a recommended workflow by NVIDIA?

    • What are the recommended tools and steps for performing this cross-compilation? (e.g., using trtexec with specific flags, TensorRT Python API, or other utilities).

  2. Compatibility Considerations (Versions & Architectures):

    • TensorRT Versions: Should the TensorRT version installed on my L20 server (for compilation) precisely match the TensorRT version available on the Jetson Orin (for inference)? Or is there an acceptable range of compatibility?

    • Architectural Differences: Given the fundamental difference between the x86_64 host (L20) and the ARM64 target (Orin), are there specific TensorRT builder flags, target GPU architecture specifications (e.g., a DLA core selection such as trtexec's --useDLACore if applicable, or more fundamentally the Orin's SM version, sm_87), or explicit platform declarations needed during compilation on the server to ensure the engine is built for the Orin's specific hardware?

    • OpenVLA Specifics: Are there common pitfalls or special considerations when converting large Vision-Language Models (like OpenVLA) in a cross-compilation scenario?
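For completeness on the version question above, this is how I plan to compare TensorRT versions on the two machines (standard commands, nothing specific to OpenVLA):

```shell
# On either machine: print the TensorRT version visible to Python.
python3 -c "import tensorrt; print(tensorrt.__version__)"

# On the Jetson: list the TensorRT packages installed via JetPack.
dpkg -l | grep -i tensorrt
```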

Context: This strategy would enable us to leverage the significant computational resources of the GPU server for the heavy lifting of model optimization and engine generation, while still achieving efficient, low-power inference on the Jetson Orin for edge deployment, without hitting power limitations during development.
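For concreteness, the rough workflow I have in mind is sketched below. The file names, the workspace size, and the assumption that OpenVLA can be exported to a single ONNX file are placeholders on my side, not a confirmed recipe:

```shell
# Step 1 (on either machine): export the PyTorch model to ONNX
# (via torch.onnx.export; details omitted here).

# Step 2 (on the L20 server): build a TensorRT engine from the ONNX file.
# --fp16 enables half-precision tactics; --memPoolSize bounds builder memory.
trtexec --onnx=openvla.onnx \
        --saveEngine=openvla.engine \
        --fp16 \
        --memPoolSize=workspace:8192

# Step 3: copy openvla.engine to the Jetson Orin and deserialize it
# there for inference.
```

My core question is whether Step 2 can legitimately run on the x86_64 server, or whether the engine build must happen on the Orin itself.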

Any guidance, best practices, recommended workflows, or warnings about potential compatibility issues would be greatly appreciated. If any further information about my setup (e.g., specific Jetson Orin SKU, exact TensorRT versions I’m planning to use, an anonymized model graph) would be helpful, please let me know.

Thank you in advance for your time and assistance.

Best regards,

Hi,

Since TensorRT optimizes the engine for the specific hardware architecture it is built on, you will need to convert the model on the target device directly.
To prevent a shutdown, you can try lowering the clocks (either via nvpmodel or custom clock settings) to reduce power consumption.
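For example (mode IDs and available modes vary by Orin module, so please check your board's documentation and /etc/nvpmodel.conf):

```shell
# Show the currently active power mode:
sudo nvpmodel -q

# Switch to a lower-power mode; valid mode IDs are listed in
# /etc/nvpmodel.conf and differ between Orin modules:
sudo nvpmodel -m 2

# Inspect the clock settings currently applied:
sudo jetson_clocks --show
```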

Thanks.

I tried lowering the clock frequency by switching the Orin to MODE_15W, but the device still shuts down automatically during the mode switch itself. What should I do?
