TensorRT output format


I’m using the TensorRT 4 Python API to run a Caffe FCN for semantic segmentation. The FCN takes an image and produces N (in my case 3) segmentation maps, each the same size as the input.

In my case I fix the input image to 512x512. So in my prototxt I have:

layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape { dim: 1 dim: 3 dim: 512 dim: 512 }
  }
}
In my Python TensorRT code I preprocess the image to have dimensions CxHxW and BGR-ordered color channels.

import numpy as np
from PIL import Image

im = im.resize((512, 512), Image.ANTIALIAS)
im = np.array(im, dtype=np.float32)
im = im[:, :, ::-1]            # RGB -> BGR
im = im.transpose((2, 0, 1))   # HWC -> CHW
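A numpy-only sanity check of that preprocessing (no TensorRT needed; the arrays here are synthetic stand-ins for a real image):

```python
import numpy as np

# Fake 512x512 RGB image in HWC layout; each plane is filled with a marker value.
hwc = np.zeros((512, 512, 3), dtype=np.float32)
hwc[..., 0] = 1.0  # R plane
hwc[..., 1] = 2.0  # G plane
hwc[..., 2] = 3.0  # B plane

bgr = hwc[:, :, ::-1]           # reverse the channel axis: RGB -> BGR
chw = bgr.transpose((2, 0, 1))  # HWC -> CHW

assert chw.shape == (3, 512, 512)
assert chw[0, 0, 0] == 3.0      # channel 0 is now B
assert chw[2, 0, 0] == 1.0      # channel 2 is now R
```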

I create the input to the net as:

d_input = cuda.mem_alloc(1 * im.size * im.dtype.itemsize)
cuda.memcpy_htod_async(d_input, np.ascontiguousarray(im), stream)

And get the output as:

d_output = cuda.mem_alloc(1 * output.size * output.dtype.itemsize)
cuda.memcpy_dtoh_async(output, d_output, stream)
stream.synchronize()  # wait for the async copy to finish before reading `output`

To create the segmentation maps from the resulting 1d array I have tried:

output_3d = output.reshape((3,512,512)).transpose((1,2,0))
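For a Caffe model, TensorRT’s linear output buffer is the NCHW tensor flattened in C order (last index, W, varying fastest), so the reshape above should round-trip when the orders match. A small numpy check, including the usual argmax over the channel axis to get a per-pixel class map (shapes here are illustrative):

```python
import numpy as np

C, H, W = 3, 4, 5
scores = np.random.rand(C, H, W).astype(np.float32)

flat = scores.ravel()                # C-order flatten, like a 1-D output buffer
maps = flat.reshape((C, H, W))       # numpy's default reshape is also C order
assert np.array_equal(maps, scores)  # round-trip works when orders match

hwc = maps.transpose((1, 2, 0))      # CHW -> HWC: one score map per channel
cls = maps.argmax(axis=0)            # per-pixel class index
assert cls.shape == (H, W)
```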

But I cannot get the correct segmentation maps; I only get noise with a pattern I cannot understand.

How is the output of TensorRT flattened when the outputs of the Caffe model are 3D?


Really late to this one, but I’m experiencing a similar problem. I have no idea how to reshape the output of the TensorRT engine for an autoencoder converted from Keras --> ONNX --> TensorRT. Using “C” ordering (last dimension changing fastest) and “F” ordering (first dimension changing fastest) both produce garbage. Of course, the network could be garbage in the first place, but I don’t believe that’s the cause.

I would assume that the TensorRT engine outputs the vector in NCHW ordering (as this is generally referenced as standard/required throughout the documentation), but the ordering of the flattening operation at output is entirely unclear to me. If the OP was able to find a solution, an update would be much appreciated. If a mod can speak to this problem, that would also be appreciated.
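To make the C-vs-F distinction concrete: if the buffer really is C-order NCHW (as the docs suggest for linear formats), only the matching reshape recovers the tensor, and the mismatched one scrambles it. A numpy sketch of that check:

```python
import numpy as np

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)  # stand-in CHW tensor

flat_c = x.ravel(order="C")  # last axis fastest, i.e. row-major layout
# Reshaping with the matching order recovers the tensor...
assert np.array_equal(flat_c.reshape((2, 3, 4)), x)
# ...but interpreting the same buffer in "F" order scrambles it.
assert not np.array_equal(flat_c.reshape((2, 3, 4), order="F"), x)
```

If both orderings produce garbage, the layout interpretation is probably not the culprit; missing stream synchronization or a wrong binding size would be worth ruling out first.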