PyTorch -> ONNX -> TRT: Output Layout

I have a PyTorch model that I successfully exported to an ONNX model file.

I’m now trying to write a C++ program to read that model using TensorRT. It is an image segmentation problem.

Provide details on the platforms you are using:
Linux distro and version: Ubuntu 18.04
GPU type: Lenovo P52S (P500)
nvidia driver version: 418.56
CUDA version: 10.0
CUDNN version: 7.3.1
Python version [if using python]: N/A
Tensorflow version: N/A
TensorRT version: 5.0.2
If Jetson, OS, hw versions

Describe the problem
The input to my network is an RGB image, passed to the network in [NxCxHxW] format, where N=1, C=3, H=360, W=640. The output of the network is a 4-D tensor in [NxCxHxW] format, where N=1, C=6, H=360, W=640. The final segmented class label for each pixel is the index of the channel with the maximum value at that pixel.
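To make the "maximum channel per pixel" step concrete, here is a minimal sketch of it in C++. It assumes the scores for one image sit in a flat buffer in CHW order (all of channel 0, then all of channel 1, and so on), which is how a contiguous PyTorch tensor is laid out; whether the TensorRT output buffer follows the same order is an assumption here. The function name argmaxCHW is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel argmax over channels for ONE image stored as a flat CHW buffer.
// Assumed layout: element (c, h, w) lives at index c*H*W + h*W + w.
std::vector<std::uint8_t> argmaxCHW(const std::vector<float>& scores,
                                    std::size_t C, std::size_t H, std::size_t W) {
    assert(C > 0 && C <= 256);            // labels are stored as uint8_t
    assert(scores.size() == C * H * W);   // buffer must hold exactly one image
    const std::size_t plane = H * W;      // number of pixels per channel
    std::vector<std::uint8_t> labels(plane, 0);
    for (std::size_t p = 0; p < plane; ++p) {
        float best = scores[p];           // channel 0 score for pixel p
        for (std::size_t c = 1; c < C; ++c) {
            const float v = scores[c * plane + p];  // same pixel, channel c
            if (v > best) {
                best = v;
                labels[p] = static_cast<std::uint8_t>(c);
            }
        }
    }
    return labels;                        // H*W labels in row-major order
}
```

For the output described above, the call would be argmaxCHW(output, 6, 360, 640), and labels[h * 640 + w] would be the class of pixel (h, w).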

Using TensorRT, I copy the output into a std::vector of size [1*3*360*640]. When I try to “reshape” it into a sensible 360x640x6 volume and find the max over the last axis, I don’t get reasonable-looking class labels. I believe this might be due to my incorrect understanding of how the network’s output is organized in memory. I’m happy to share my code (privately) with anyone at NVIDIA willing to take a look at it.
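One thing that may matter here: if the buffer keeps the channel-first order of the PyTorch model (an assumption, not something confirmed in this post), then viewing the same memory as a 360x640x6 volume does not put the 6 channel values of a pixel next to each other; the data would have to be transposed first. A rough sketch of such a CHW to HWC copy is below. The helper name chwToHwc is hypothetical, and the index formulas are the only part that matters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a flat CHW buffer into a flat HWC buffer.
// Assumed source layout:  (c, h, w) at index (c*H + h)*W + w
// Destination layout:     (h, w, c) at index (h*W + w)*C + c
std::vector<float> chwToHwc(const std::vector<float>& chw,
                            std::size_t C, std::size_t H, std::size_t W) {
    assert(chw.size() == C * H * W);      // exactly one image
    std::vector<float> hwc(chw.size());
    for (std::size_t c = 0; c < C; ++c) {
        for (std::size_t h = 0; h < H; ++h) {
            for (std::size_t w = 0; w < W; ++w) {
                hwc[(h * W + w) * C + c] = chw[(c * H + h) * W + w];
            }
        }
    }
    return hwc;
}
```

With C=6, H=360, W=640 the buffer would need 6*360*640 floats. After the copy, the 6 scores of pixel (h, w) are contiguous, starting at index (h*640 + w)*6.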

Thanks in advance,
Shreyas