Deploy SigLip Model on Jetson Orin 32GB on DLA

Subject: Deployment Inquiry: SigLip Vision Model on Jetson Orin 32GB with DLA

Hi all,

Apologies for the interruption. I’m working on deploying the vision component of the SigLip model to the DLA on a Jetson Orin 32GB device.

Target

Deploy the vision segment of SigLip on Jetson Orin 32GB using DLA.

Issue

Most layers fall back to the GPU, with the build log reporting:

"Unsupported on DLA. Switching this layer's device type to GPU."

Layers affected:

SHUFFLE, CONSTANT, CAST, GATHER, MATRIX_MULTIPLY, CONVOLUTION, ELEMENTWISE, REDUCE, UNARY, SLICE.

(Note: I’ve converted Linear layers to 1x1 convolutions to avoid GPU fallback for those.)
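The Linear → 1x1 convolution rewrite mentioned in the note can be sanity-checked numerically. A minimal numpy sketch (shapes and values are illustrative, not taken from the actual SigLip checkpoint):

```python
import numpy as np

# A Linear layer y = x @ W.T + b applied token-wise is mathematically
# identical to a 1x1 convolution applied to a (C_in, H, W) feature map.
rng = np.random.default_rng(0)
C_in, C_out, H, W = 8, 4, 3, 3
x = rng.standard_normal((C_in, H, W))
Wt = rng.standard_normal((C_out, C_in))   # Linear weight
b = rng.standard_normal(C_out)            # Linear bias

# Linear applied token-wise: flatten spatial dims into tokens.
tokens = x.reshape(C_in, H * W).T          # (H*W, C_in)
y_linear = tokens @ Wt.T + b               # (H*W, C_out)

# Equivalent 1x1 convolution: contract the channel dim at each pixel.
y_conv = np.einsum('oc,chw->ohw', Wt, x) + b[:, None, None]

assert np.allclose(y_linear.T.reshape(C_out, H, W), y_conv)
```

Since the two forms compute the same result, the rewrite changes only which TensorRT layer type is emitted, not the model's output.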

Key question: How can I maximize compute on DLA and resolve these fallbacks?

Request for Guidance

I’d appreciate suggestions on deploying SigLip’s vision model to DLA, specifically:

  1. Should I rewrite the forward function using DLA-supported operators? Are there alternative approaches?
  2. Are there documents for deploying SigLip on DLA? Existing materials only cover ResNet, MobileNet, etc.

Supporting Materials

  1. Log: Details the deployment strategy and layer fallbacks.
  2. Code: For reproducibility.

Reproduction Command:
python comprehensive_dla_accuracy_validation.py
Results will be written to siglip_1layer_linear_only_dla_build.log

Thank you for your support!

Best regards
Material.zip (94.6 KB)

*** Please note that this reply was generated automatically by an LLM ***

To address the inquiry about deploying the SigLip vision model on a Jetson Orin 32GB device with the DLA (Deep Learning Accelerator), several aspects need to be considered: maximizing compute on the DLA, resolving GPU fallbacks, and understanding the deployment strategy for the model. The goal is to use the DLA efficiently for the vision segment of SigLip while minimizing reliance on the GPU.

Understanding DLA and GPU Fallbacks

DLA is a hardware accelerator designed for deep learning workloads, offering significant performance improvements for supported operations. However, not all layers or operations in a neural network model are compatible with DLA. When a layer cannot be executed on the DLA, it falls back to the GPU, which can lead to performance degradation. The conversion of layers to 1x1 convolutions to avoid GPU fallback is a step in the right direction but may not be sufficient for all layers or may introduce additional computational overhead.

Maximizing Compute on DLA

To maximize the use of DLA for the SigLip vision model:

  1. Layer Compatibility: Identify which layers in the SigLip model are compatible with DLA. NVIDIA provides documentation and tools to help determine layer compatibility.
  2. Model Optimization: Optimize the model to use DLA-supported operators. This might involve rewriting parts of the model or using model optimization tools provided by NVIDIA.
  3. TensorRT: Utilize TensorRT, NVIDIA’s inference optimizer, which can help in optimizing the model for DLA execution. TensorRT can automatically optimize the model for better performance on NVIDIA hardware, including DLAs.
  4. DLA-Supported Operators: Focus on using operators that are natively supported by DLA. This might require modifying the forward function of the model to better align with DLA capabilities.
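As an illustration of points 3 and 4, a DLA engine build is commonly driven through the `trtexec` CLI that ships with TensorRT. A minimal sketch, assuming the vision encoder has already been exported to ONNX (the filename is a placeholder; this must run on the Jetson itself):

```shell
# Build an engine targeting DLA core 0; layers the DLA cannot run
# fall back to the GPU instead of failing the build.
# "siglip_vision.onnx" is a placeholder for your exported vision encoder.
trtexec --onnx=siglip_vision.onnx \
        --useDLACore=0 \
        --allowGPUFallback \
        --fp16 \
        --verbose
```

The `--verbose` log lists, layer by layer, which device each layer was assigned to, which is the quickest way to see how much of the network actually landed on the DLA. Note that the DLA runs FP16/INT8, not FP32, so a precision flag such as `--fp16` is required.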

Alternative Approaches

  • Model Pruning: Prune the model to reduce complexity and potentially make more layers compatible with DLA.
  • Knowledge Distillation: Use knowledge distillation to create a smaller, DLA-friendly model that mimics the behavior of the original model.
  • Quantization: Apply quantization techniques to reduce the precision of model weights and activations, which can make the model more DLA-friendly and improve inference speed.
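To make the quantization point concrete, here is a minimal numpy sketch of symmetric per-tensor INT8 quantization, along the lines of what TensorRT applies for INT8 engines (the data is random; real deployments derive the scale from calibration, not from the raw max):

```python
import numpy as np

# Symmetric per-tensor INT8 quantization: map floats to [-127, 127]
# with a single scale, then dequantize to see the reconstruction error.
rng = np.random.default_rng(1)
w = rng.standard_normal(1000).astype(np.float32)

scale = np.abs(w).max() / 127.0                  # symmetric scale
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dq = q.astype(np.float32) * scale              # dequantized weights

# Round-to-nearest bounds the error by half a quantization step.
assert np.abs(w - w_dq).max() <= scale / 2 + 1e-6
```

The bounded per-weight error is why INT8 often preserves accuracy well, while halving (vs. FP16) the memory traffic the DLA has to sustain.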

Documentation and Deployment Strategy

While there might not be specific documentation for deploying SigLip on DLA, NVIDIA provides extensive resources for deploying various models (like ResNet, MobileNet) on Jetson devices with DLA. These resources can serve as a starting point:

  • NVIDIA Jetson Documentation: Offers guides on optimizing models for DLA and deploying them on Jetson devices.
  • TensorRT Documentation: Provides detailed information on optimizing and deploying models with TensorRT.
  • NVIDIA Deep Learning SDK: Includes tools and libraries for developing and deploying AI applications on NVIDIA hardware, including DLAs.

Reproducibility and Results

For reproducibility, it’s essential to document the exact steps taken to optimize and deploy the SigLip model on the Jetson Orin 32GB with DLA. This includes:

  • Model Modifications: Detail any changes made to the model to support DLA.
  • Optimization Techniques: List the optimization techniques used (e.g., TensorRT, model pruning, quantization).
  • Performance Metrics: Report on the performance metrics (e.g., inference speed, accuracy) before and after optimization to quantify the improvements.

By following these steps and leveraging NVIDIA’s resources and tools, you can effectively deploy the SigLip vision model on a Jetson Orin 32GB device with DLA, maximizing the use of the DLA for improved performance and efficiency.

*** Please give the thumbs up if you get answers you like. Or provide feedback to help us improve the answer. ***

Hi,

Which JetPack version do you use?
It’s recommended to upgrade to the latest since we have added some support in JetPack 6.

Unfortunately, we don’t have a tutorial for SigLip on DLA.
But in general, you can follow the repo below to implement your use case:

For a non-supported layer, you can either:

  1. Modify the layer:

or

  2. Reformat the layer (e.g., ReduceMean → AveragePool):
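The ReduceMean → AveragePool reformulation suggested here can be verified numerically; a minimal numpy sketch (shapes are illustrative):

```python
import numpy as np

# A ReduceMean over the spatial dims equals an AveragePool whose kernel
# covers the whole feature map -- an operation the DLA does support.
rng = np.random.default_rng(2)
x = rng.standard_normal((4, 7, 7))      # (C, H, W)

reduce_mean = x.mean(axis=(1, 2))       # ReduceMean over H, W

# Global average pooling: full-window sum divided by the kernel area.
C, H, W = x.shape
avg_pool = x.reshape(C, H * W).sum(axis=1) / (H * W)

assert np.allclose(reduce_mean, avg_pool)
```

Because the two operations are numerically identical, the swap can be done in the model's forward function (or via an ONNX graph rewrite) without affecting accuracy.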

Thanks.

Thank you for your reply! My JetPack version is 6.2. I followed the material you provided previously, but encountered compatibility issues with several layers.

So, based on this, it seems there is currently no scheduled release for deploying SigLip on the Jetson Orin DLA. If I wish to proceed with deployment, would I need to manually adapt those unsupported layers?

Additionally, my TensorRT version is 10.3. Would the guidance in the following link still be applicable for this version? https://docs.nvidia.com/deeplearning/tensorrt/latest/inference-library/work-with-dla.html#dla-supported-layers-and-restrictions

Best wishes,

Hi,

DLA deploys models in a layer-based manner.
Do you know which layer is not working on the DLA?

You can find the detailed support matrix for the DLA operator in the link below:

The document is largely the same, but a version for TensorRT 10.3 can be found in the link below:

Thanks.


Thank you. I will follow these documents.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.