TensorRT model deployment on Triton Inference Server

Hello.
I want to load multiple TensorRT models on Triton to provide inference on my Jetson Xavier NX device. I found that Triton Inference Server for edge computing is the best option for this. My pre-processing and post-processing code is in Python.
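For context, the model repository layout I have in mind looks roughly like this (model names and structure are just placeholders):

```
model_repository/
├── detector_trt/
│   ├── config.pbtxt
│   └── 1/
│       └── model.plan        # TensorRT engine built on the Xavier NX
├── classifier_trt/
│   ├── config.pbtxt
│   └── 1/
│       └── model.plan
└── preprocess/
    ├── config.pbtxt          # backend: "python"
    └── 1/
        └── model.py          # Python pre/post-processing
```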

But in this link (server/jetson.md at main · triton-inference-server/server · GitHub) the following limitations are mentioned:

  1. JetPack 5.0 does not support TensorRT using ONNX Runtime.
  2. CUDA IPC shared memory is not supported.
  3. Python backend does not support GPU tensors.

My questions are:

  1. JetPack 5.0 does not support TensorRT using ONNX Runtime.
    a. Can I use saved TensorRT models (.plan)?
  2. CUDA IPC shared memory is not supported.
    a. What effect does this have on inference?
    b. Would a gRPC request be better in this case?
  3. Python backend does not support GPU tensors.
    a. What does this mean?

Hi,

1. You can use ONNX models as well as TensorRT engines (.plan).
It is TensorRT inference through ONNX Runtime (the TensorRT execution provider) that is not supported.
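For example, a minimal config.pbtxt for serving a .plan engine with the TensorRT backend could look like this (model name, tensor names, data types, and shapes are placeholders; the engine file goes in <model_name>/1/model.plan):

```
name: "detector_trt"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```

Keep in mind that a TensorRT engine must be built with the same TensorRT/JetPack version and on the same GPU architecture (the Xavier NX itself) that it will run on.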

2. CUDA IPC is used for sharing buffers between processes. Without it, the client cannot register CUDA (GPU) shared memory regions with the server, so input and output data is passed over the normal HTTP/gRPC path or system shared memory instead.
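For 2.a/2.b: inference still works with ordinary HTTP or gRPC requests where the input data is sent in the request body. A minimal gRPC client sketch, assuming a model named detector_trt with FP32 tensors called input/output (all placeholders):

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Connect to Triton's gRPC endpoint (default port 8001)
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Build the request; the input data travels over gRPC
# instead of a CUDA shared memory region
inp = grpcclient.InferInput("input", [1, 3, 224, 224], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
out = grpcclient.InferRequestedOutput("output")

result = client.infer(model_name="detector_trt", inputs=[inp], outputs=[out])
print(result.as_numpy("output").shape)
```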

3. This means the Python backend cannot accept or return tensors that live in GPU memory; its input and output tensors are passed through CPU memory.
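In practice, a Python backend model.py on Jetson works with host-side NumPy arrays. A minimal pre-processing sketch, assuming tensor names RAW_IMAGE and PREPROCESSED (placeholders):

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Hypothetical pre-processing model; tensor names are placeholders."""

    def execute(self, requests):
        responses = []
        for request in requests:
            # On Jetson the Python backend receives tensors in CPU memory,
            # so as_numpy() returns a host-side array.
            img = pb_utils.get_input_tensor_by_name(request, "RAW_IMAGE").as_numpy()
            norm = img.astype(np.float32) / 255.0
            out_tensor = pb_utils.Tensor("PREPROCESSED", norm)
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses
```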

Thanks.
