UniFace ONNX Runtime GPU support on Jetson Orin Nano (JetPack 6.1 / L4T R36.4.7)

Hello,

I evaluated UniFace on a Jetson Orin Nano with the following setup:

Jetson Orin Nano, JetPack 6.1, L4T R36.4.7, Python 3.10, CUDA 12.6 (the default with JetPack 6.1)


Functional status (CPU)

UniFace runs successfully with onnxruntime (CPU).

I verified:

  • Face detection and recognition pipelines work correctly

  • Valid 512-D face embeddings are generated
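The post does not spell out what "valid" means for an embedding. A minimal sketch of the kind of check one might run follows, assuming an embedding is a flat sequence of floats: correct length (512, as stated above), all values finite, non-zero norm. The cosine-similarity helper is also useful later for comparing CPU and GPU outputs for the same face. Both function names are hypothetical, and this is not part of the UniFace API.

```python
import math
from typing import Sequence


def check_embedding(vec: Sequence[float], dim: int = 512) -> float:
    """Raise ValueError if `vec` is not a usable embedding; otherwise return its L2 norm."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} values, got {len(vec)}")
    if not all(math.isfinite(x) for x in vec):
        raise ValueError("embedding contains NaN or Inf")
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("embedding is all zeros")
    return norm


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

Any match threshold on the similarity score is a separate, model-specific choice and is not assumed here.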

Performance (CPU):

  • Total latency: ~500 ms per frame (~2 FPS)

  • Face detection ≈ 93% of total time

This is not sufficient for real-time use.
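As a sanity check on the numbers above: 1000 ms / 500 ms per frame gives about 2 FPS, and 93% of 500 ms is roughly 465 ms spent in detection, leaving about 35 ms for everything else. A per-stage breakdown like this can be collected with a small timing harness. The sketch below assumes the pipeline can be expressed as a list of `(name, callable)` stages, each consuming the previous stage's output. `detect_faces` and `embed_faces` are hypothetical placeholders, not UniFace functions.

```python
import time
from typing import Callable, Dict, List, Tuple


def profile_stages(
    stages: List[Tuple[str, Callable[[object], object]]],
    frame: object,
    runs: int = 20,
    warmup: int = 3,
) -> Dict[str, float]:
    """Return mean latency (ms) and share of total time per stage, plus total_ms and fps."""
    totals = {name: 0.0 for name, _ in stages}
    for i in range(warmup + runs):
        data = frame
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)  # each stage consumes the previous stage's output
            elapsed = time.perf_counter() - start
            if i >= warmup:  # skip warm-up runs (lazy init, allocator/cache effects)
                totals[name] += elapsed
    total_ms = sum(totals.values()) / runs * 1000.0
    report: Dict[str, float] = {}
    for name, seconds in totals.items():
        mean_ms = seconds / runs * 1000.0
        report[f"{name}_ms"] = mean_ms
        report[f"{name}_share"] = mean_ms / total_ms if total_ms > 0 else 0.0
    report["total_ms"] = total_ms
    report["fps"] = 1000.0 / total_ms if total_ms > 0 else float("inf")
    return report


if __name__ == "__main__":
    # Hypothetical stand-ins for the real detector and recognizer calls.
    def detect_faces(frame):
        time.sleep(0.046)
        return [frame]

    def embed_faces(faces):
        time.sleep(0.004)
        return [[0.0] * 512 for _ in faces]

    stats = profile_stages([("detect", detect_faces), ("embed", embed_faces)], frame=None, runs=5)
    for key, value in stats.items():
        print(f"{key}: {value:.3f}")
```

Warm-up iterations are excluded because the first ONNX Runtime calls often include one-time initialization that would otherwise inflate the averages.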


GPU acceleration issue

UniFace requires onnxruntime-gpu for hardware acceleration.

However:

  1. I could not find a compatible onnxruntime-gpu wheel for JetPack 6.1 (L4T R36.4.7).

    • Installing from the Jetson AI Lab package index failed: the wheel is missing (HTTP 404).
  2. I then installed this wheel:

    onnxruntime_gpu-1.23.x-cp310-linux_aarch64.whl

    However,

    ort.get_available_providers()

    returns only:

    ['AzureExecutionProvider', 'CPUExecutionProvider']

  3. CUDAExecutionProvider is missing even though:

    • CUDA itself works

    • torch.cuda.is_available() returns True

    • the GPU is visible in tegrastats

So the wheel appears to be a CPU-only build.
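The provider list above matches what a stock CPU build typically reports. Before concluding that the wheel itself is CPU-only, it may be worth ruling out a more mundane cause: the CPU `onnxruntime` and `onnxruntime-gpu` distributions install the same `onnxruntime` import package, so if both are present (for example, because something else in the environment depends on `onnxruntime`), whichever was installed last can overwrite the other. The sketch below reports which ONNX Runtime distributions are installed, where `onnxruntime` is imported from, and which GPU providers are missing. It uses only `importlib.metadata` and standard `onnxruntime` calls; the function names are hypothetical and nothing in it is UniFace-specific.

```python
import importlib.metadata as md

# Provider names as ONNX Runtime spells them (note "Tensorrt", not "TensorRT").
GPU_PROVIDERS = ("CUDAExecutionProvider", "TensorrtExecutionProvider")


def installed_ort_distributions() -> dict:
    """Return {distribution_name: version} for each ONNX Runtime distribution found."""
    found = {}
    for dist in ("onnxruntime", "onnxruntime-gpu"):
        try:
            found[dist] = md.version(dist)
        except md.PackageNotFoundError:
            pass  # this distribution is not installed
    return found


def runtime_report() -> dict:
    """Describe the onnxruntime package that actually gets imported."""
    report = {"distributions": installed_ort_distributions()}
    try:
        import onnxruntime as ort
    except ImportError:
        report["error"] = "onnxruntime is not importable"
        return report
    providers = ort.get_available_providers()
    report.update(
        version=ort.__version__,
        location=ort.__file__,       # which site-packages the module came from
        device=ort.get_device(),     # "CPU" for CPU-only builds
        providers=providers,
        missing_gpu_providers=[p for p in GPU_PROVIDERS if p not in providers],
    )
    return report


if __name__ == "__main__":
    for key, value in runtime_report().items():
        print(f"{key}: {value}")
```

If `distributions` lists both packages, the usual remedy is to uninstall both and reinstall only the GPU wheel.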

  1. Is there an official onnxruntime-gpu build for JetPack 6.1 (Python 3.10) that includes:

    • CUDAExecutionProvider

    • TensorrtExecutionProvider

  2. If not, is building ONNX Runtime from source the recommended approach for JetPack 6?

  3. Is there a recommended alternative inference path for ONNX models on JetPack 6 that reaches real-time performance?
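Once a wheel that exposes the GPU providers is installed, ONNX Runtime accepts an ordered `providers` list in `InferenceSession`. Nodes are assigned to the first provider in the list that supports them, and the remaining nodes go to later entries. The sketch below builds that list from whatever `get_available_providers()` reports, preferring TensorRT, then CUDA, then CPU. The TensorRT options (`trt_fp16_enable`, `trt_engine_cache_enable`, `trt_engine_cache_path`) are standard TensorRT execution provider settings, but turning on FP16 and using the cache directory `./trt_cache` are assumptions. FP16 in particular should be validated against the CPU embeddings. `select_providers`, `create_session`, and `model_path` are hypothetical, and this uses plain onnxruntime rather than any UniFace API.

```python
from typing import Dict, List, Sequence, Tuple, Union

ProviderSpec = Union[str, Tuple[str, Dict[str, object]]]

# Preferred order: TensorRT, then CUDA, then CPU as the guaranteed fallback.
PREFERRED = ("TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider")


def select_providers(available: Sequence[str], trt_cache_dir: str = "./trt_cache") -> List[ProviderSpec]:
    """Build an ordered providers list containing only providers this build actually has."""
    chosen: List[ProviderSpec] = []
    for name in PREFERRED:
        if name not in available:
            continue
        if name == "TensorrtExecutionProvider":
            # Assumed settings: FP16 kernels plus an on-disk engine cache.
            chosen.append((name, {
                "trt_fp16_enable": True,
                "trt_engine_cache_enable": True,
                "trt_engine_cache_path": trt_cache_dir,
            }))
        else:
            chosen.append(name)
    names = [p if isinstance(p, str) else p[0] for p in chosen]
    if "CPUExecutionProvider" not in names:
        chosen.append("CPUExecutionProvider")  # always keep a CPU fallback
    return chosen


def create_session(model_path: str):
    """Create a session with the best available providers and report what was used."""
    import onnxruntime as ort

    providers = select_providers(ort.get_available_providers())
    session = ort.InferenceSession(model_path, providers=providers)
    # get_providers() lists the providers the session actually ended up with.
    print("Session providers:", session.get_providers())
    return session
```

The engine cache matters on a small device like the Orin Nano: building a TensorRT engine for a model can take a long time, and caching it avoids repeating that work on every start.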

Hi,

You can find the package in the link below:

Thanks.