Triton Inference Server Inference Request Error on GPU

Hello,

I am using a Jetson Nano 4GB with JetPack 4.6.

I need to deploy my DeepLab semantic segmentation model to Triton Inference Server and then send an inference request to the deployed model.

Below you can see my Triton Inference Server configuration file for the semantic segmentation model and information about its deployment process.

name: "segmentation"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "ImageTensor:0"
    data_type: TYPE_UINT8
    dims: [ 1, 1000, 1000, 3 ]
  }
]
output [
  {
    name: "SemanticPredictions:0"
    data_type: TYPE_INT32
    dims: [ 1, -1, -1 ]
  }
]
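This config.pbtxt sits in a standard Triton model repository, laid out roughly as follows (the paths are just examples from my setup):

model_repository/
  segmentation/
    config.pbtxt
    1/
      model.onnx

and the server is started with something like:

tritonserver --model-repository=/path/to/model_repository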


When I try to send an inference request (using an HTTP client script) to the deployed segmentation model, I get some errors and then my inference script stops. Below I am sharing the errors and the Nano's resource usage during this process.
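For context, the HTTP client script is essentially the following (a simplified sketch assuming the tritonclient Python package; a random uint8 image stands in for my real preprocessing, and the server address is just an example):

import numpy as np
import tritonclient.http as httpclient

# Connect to the Triton HTTP endpoint (address/port are examples)
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the input tensor matching the config: [1, 1000, 1000, 3], UINT8
image = np.random.randint(0, 255, size=(1, 1000, 1000, 3), dtype=np.uint8)
infer_input = httpclient.InferInput("ImageTensor:0", list(image.shape), "UINT8")
infer_input.set_data_from_numpy(image, binary_data=True)

# Request the segmentation output declared in the config
output = httpclient.InferRequestedOutput("SemanticPredictions:0", binary_data=True)

# Send the inference request and read the result back as a numpy array
result = client.infer("segmentation", inputs=[infer_input], outputs=[output])
predictions = result.as_numpy("SemanticPredictions:0")
print(predictions.shape)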


If I change the model's config file as below and deploy it on the CPU, the same inference request code works without errors.

name: "segmentation"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "ImageTensor:0"
    data_type: TYPE_UINT8
    dims: [ 1, 1000, 1000, 3 ]
  }
]
output [
  {
    name: "SemanticPredictions:0"
    data_type: TYPE_INT32
    dims: [ 1, -1, -1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
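For reference, my understanding is that the first config (with no instance_group) makes Triton place the model on the GPU by default, which should be equivalent to adding something like:

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]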

What is the reason for this problem?

Why can I run the inference code on the CPU but not on the GPU?

What do you suggest to solve this problem?

Is this problem due to the Jetson Nano's hardware limitations? For example, if I use a Xavier, will I encounter the same problem?

Thanks

Please re-post your question on Triton Inference Server · GitHub; the NVIDIA and other teams will be able to help you there.
Sorry for the inconvenience, and thanks for your patience.