Hello.
I want to load multiple TensorRT models on Triton to provide inference on my Jetson Xavier NX device. I found that Triton Inference Server for edge computing is the best option for this. My pre-processing and post-processing code is in Python.
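For context, my Python pre-processing step would follow the usual model.py structure of the Python backend, roughly like this (a minimal sketch only; the tensor names INPUT0/OUTPUT0 and the normalization step are placeholders for my actual code):

```python
import numpy as np
import triton_python_backend_utils as pb_utils  # provided by Triton's Python backend at runtime


class TritonPythonModel:
    def initialize(self, args):
        # Called once when Triton loads the model.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            # Read the input tensor as a numpy array.
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            data = in_tensor.as_numpy()

            # Placeholder pre-processing: scale pixel values to [0, 1].
            processed = (data / 255.0).astype(np.float32)

            out_tensor = pb_utils.Tensor("OUTPUT0", processed)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        # Called once when the model is unloaded.
        pass
```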
But in this link (server/jetson.md at main · triton-inference-server/server · GitHub) they mention the following limitations:
- JetPack 5.0 does not support TensorRT using ONNX Runtime.
- CUDA IPC shared memory is not supported.
- Python backend does not support GPU tensors.
My questions are:
- JetPack 5.0 does not support TensorRT using ONNX Runtime.
  a. Can I use TensorRT saved models (.plan)?
- CUDA IPC shared memory is not supported.
  a. What effect does this have on inference?
  b. Would a gRPC request be better in this case? (I have sketched the client call I have in mind below this list.)
- Python backend does not support GPU tensors.
  a. What does this mean?
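Regarding question b above, the gRPC request I have in mind would look roughly like this (a sketch using the tritonclient.grpc Python package; the model name, tensor names, shape, and datatype are placeholders):

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Triton's default gRPC port is 8001.
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Placeholder input: a single 3x224x224 FP32 image batch.
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)

inputs = [grpcclient.InferInput("INPUT0", list(input_data.shape), "FP32")]
inputs[0].set_data_from_numpy(input_data)
outputs = [grpcclient.InferRequestedOutput("OUTPUT0")]

# Send the request over gRPC instead of CUDA IPC shared memory.
response = client.infer(model_name="my_trt_model", inputs=inputs, outputs=outputs)
result = response.as_numpy("OUTPUT0")
print(result.shape)
```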