Triton Inference Server Inference Request Error on GPU

Hello,

I am using a Jetson Nano 4GB with JetPack 4.6.

I need to deploy my DeepLab semantic segmentation model to Triton Inference Server and then send an inference request to the deployed model.

Below is my Triton Inference Server configuration file for the semantic segmentation model, along with information about its deployment process.

name: "segmentation"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "ImageTensor:0"
    data_type: TYPE_UINT8
    dims: [ 1, 1000, 1000, 3 ]
  }
]
output [
  {
    name: "SemanticPredictions:0"
    data_type: TYPE_INT32
    dims: [ 1, -1, -1 ]
  }
]

When I try to send an inference request (using an HTTP client script) to the deployed segmentation model, I get some errors and then the inference request script stops. Below I am sharing the errors and the resource usage of the Nano during this process.
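
The request script is essentially the standard tritonclient HTTP flow. A minimal sketch of it (the localhost:8000 URL, the image path, and the preprocessing are simplified assumptions, not my exact script):

import numpy as np
import tritonclient.http as httpclient
from PIL import Image

# Connect to Triton's HTTP endpoint (default port 8000 assumed)
client = httpclient.InferenceServerClient(url="localhost:8000")

# Load an image and resize it to the model's fixed 1000x1000 input
img = np.array(Image.open("test.jpg").resize((1000, 1000)), dtype=np.uint8)
img = np.expand_dims(img, axis=0)  # shape: [1, 1000, 1000, 3]

# Build the request against the tensor names from config.pbtxt
inputs = [httpclient.InferInput("ImageTensor:0", list(img.shape), "UINT8")]
inputs[0].set_data_from_numpy(img)
outputs = [httpclient.InferRequestedOutput("SemanticPredictions:0")]

result = client.infer(model_name="segmentation", inputs=inputs, outputs=outputs)
mask = result.as_numpy("SemanticPredictions:0")
print(mask.shape, mask.dtype)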

If I change the model's config file (below) and deploy it on the CPU, the same inference request code works without errors.

name: "segmentation"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "ImageTensor:0"
    data_type: TYPE_UINT8
    dims: [ 1, 1000, 1000, 3 ]
  }
]
output [
  {
    name: "SemanticPredictions:0"
    data_type: TYPE_INT32
    dims: [ 1, -1, -1 ]
  }
]

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
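
For the GPU deployment the config is the same except for this section: either instance_group is omitted (Triton then places the model on the GPU by default) or it is set explicitly, roughly like this (a sketch assuming GPU device 0):

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]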

What is the reason for this problem?

Why can't I run the inference code on the GPU when it runs without errors on the CPU?

What do you suggest to solve this problem?

Is this problem due to the hardware limitations of the Jetson Nano? For example, if I use a Xavier, will I encounter the same problem?

Thanks

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

It seems that you are using an INT32 model.

Nano doesn't support INT8/INT32 CUDA operations.
Could you try a float type model to see if the same issue occurs?
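
For reference, you can check which element types the exported ONNX model actually uses with the onnx Python API (a minimal sketch; the model path is a placeholder):

import onnx

# Print the element type of every model input and output tensor
model = onnx.load("model.onnx")  # placeholder path
for t in list(model.graph.input) + list(model.graph.output):
    elem_type = t.type.tensor_type.elem_type
    print(t.name, onnx.TensorProto.DataType.Name(elem_type))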

Thanks.