Jetson-inference - running a custom semantic segmentation model

Hello!

I’m trying to run my custom ResNet-based semantic segmentation model with the help of jetson-inference. The model was trained in PyTorch 1.7.0 and then exported to ONNX with opset version 11. I’m able to benchmark it with trtexec, but when using ./segnet-console or ./segnet-console.py with the appropriate arguments for model, input_blob, output_blob, labels, and colors, I get the following error:

[TRT]    binding to input 0 image.1  binding index:  0
[TRT]    binding to input 0 image.1  dims (b=1 c=3 h=1024 w=2048) size=25165824
[TRT]    binding to output 0 391  binding index:  8
[TRT]    binding to output 0 391  dims (b=1 c=12 h=1024 w=2048) size=100663296
[TRT]
[TRT]    device GPU, /home/user/models/file_opset11_2048x1024.onnx initialized.
[TRT]    segNet outputs -- s_w 2048  s_h 1024  s_c 12
[image] loaded 'images/warehouse.jpg'  (2048x1024, 3 channels)
[TRT]    ../rtSafe/cuda/cudaConvolutionRunner.cpp (457) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)
[TRT]    FAILED_EXECUTION: std::exception
[TRT]    failed to execute TensorRT context on device GPU
segnet:  failed to process segmentation
[image] imageLoader -- End of Stream (EOS) has been reached, stream has been closed
segnet:  shutting down...
[cuda]      an illegal memory access was encountered (error 700) (hex 0x2BC)
[cuda]      /home/user/dev/jetson-inference/utils/image/imageLoader.cpp:105
[TRT]    ../rtSafe/safeRuntime.cpp (32) - Cuda Error in free: 700 (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::CudaError'
  what():  std::exception
[1]    18973 abort (core dumped)  ./segnet-console --model=/home/user/models/file_opset11_2048x1024.onnx

Could you please advise me on how to successfully solve this issue? Thanks!

Hi @f.kosec, did you run trtexec with the --fp16 flag to make sure your custom model can run with FP16 enabled?

You should also check that the pre-processing I do (for my FCN-ResNet18 models) is the same as what your PyTorch model expects.
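Roughly, the equivalent on the PyTorch side looks like this (a sketch only; the mean/std values here are the standard ImageNet statistics, so double-check them against the actual pre-processing in segNet.cpp in your jetson-inference checkout):

import torch
from torchvision import transforms

# Assumed pre-processing: HWC uint8 [0,255] -> CHW float [0,1],
# then ImageNet mean/std normalization (verify against segNet.cpp).
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# usage: tensor = preprocess(pil_image).unsqueeze(0)   # adds the batch dim -> NCHW

If your model was trained with different normalization, the engine can still run, but the segmentation output won’t make sense.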

Also, it appears your model is internally performing the deconvolution (because the output dims == input dims), so when you process the overlay you want to use point filtering (i.e. run the segnet program with the --filter-mode=point flag) so that it doesn’t needlessly perform bilinear interpolation on the output. I remove the deconv layer from my FCN-ResNet models because it’s linear, and doing that upsampling in my bilinear interpolation kernel is faster.
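For example, with the blob names from your log (the labels/colors paths below are placeholders):

./segnet-console.py --model=/home/user/models/file_opset11_2048x1024.onnx --input_blob=image.1 --output_blob=391 --labels=labels.txt --colors=colors.txt --filter-mode=point images/warehouse.jpg output.jpg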

Hi again, and sorry for my late response.

The problem was an incorrect conversion from PyTorch to ONNX.
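For reference, this is roughly what the export looks like when done correctly (a sketch only; the model class is hypothetical, and the input shape matches the 2048x1024 resolution from the logs above):

import torch

model = MyResNetSegModel().eval()        # hypothetical model class
dummy = torch.randn(1, 3, 1024, 2048)    # NCHW dummy input (h=1024, w=2048)

torch.onnx.export(model, dummy, 'file_opset11_2048x1024.onnx',
                  opset_version=11,
                  input_names=['input_0'],     # explicit names, instead of the
                  output_names=['output_0'])   # autogenerated 'image.1' / '391'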

Thanks for your additional suggestion about the deconvolution, I’ll check that out as well.

@dusty_nv, could you please have a look at what might be happening here and what I seem to be missing or doing wrong?

When running segnet-console.py like this, everything works:

./segnet-console.py --network=FCN-ResNet18-Cityscapes-2048x1024 /home/user/data/input.png /home/user/data/out.png

However, when I try to run the built engine manually like this:

./segnet-console.py --model=/home/user/development/jetson-inference/data/networks/FCN-ResNet18-Cityscapes-2048x1024/fcn_resnet18.onnx.1.1.7103.GPU.FP16.engine --labels=/home/user/development/jetson-inference/data/networks/FCN-ResNet18-Cityscapes-2048x1024/classes.txt --colors=/home/user/development/jetson-inference/data/networks/FCN-ResNet18-Cityscapes-2048x1024/colors.txt --input_blob=input_0 --output_blob=output_0 /home/user/data/input.png /home/user/data/out.png

… the output image is wrong, and the console output shows that the binding dimensions are parsed incorrectly; c, h, and w are wrong:

[TRT]    binding to input 0 input_0  dims (b=1 c=1 h=3 w=1024) size=12288
[TRT]    binding to output 0 output_0  dims (b=1 c=1 h=21 w=32) size=2688

Attached are the full console logs.

Also, when I try trtexec with --fp16, it passes, but one of the lines says “[I] Precision: FP32+FP16”, along with “[I] Inputs format: fp32:CHW” and “[I] Outputs format: fp32:CHW”.

model.txt (8.2 KB) network.txt (8.0 KB)

Have you tried just specifying the ONNX model to the --model argument? It will select the .engine automatically if it exists; otherwise it will build it. You can put your custom model in a new directory.
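For example, with the paths from your post (assuming the ONNX file in that directory is fcn_resnet18.onnx, per the engine filename):

./segnet-console.py --model=/home/user/development/jetson-inference/data/networks/FCN-ResNet18-Cityscapes-2048x1024/fcn_resnet18.onnx --labels=/home/user/development/jetson-inference/data/networks/FCN-ResNet18-Cityscapes-2048x1024/classes.txt --colors=/home/user/development/jetson-inference/data/networks/FCN-ResNet18-Cityscapes-2048x1024/colors.txt --input_blob=input_0 --output_blob=output_0 /home/user/data/input.png /home/user/data/out.png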