TAO unet input and output tensor shapes and order

david9xqqb · May 4, 2022, 6:05pm

I created a custom UNet model using toolkit_version: 3.22.02 and exported it to TensorRT using tao-converter on the host computer (as opposed to inside the TAO docker).

The model is a 5-class semantic segmentation model trained on 3-channel 512x512 PNG images.

The TensorRT engine was converted from TAO with:

tao-converter -k nvidia_tlt -p input_1,1x3x512x512,4x3x512x512,16x3x512x512 -t fp32 -e ./AI01.fp32_6s01.engine ./trtfp32.6s01.etlt

When I run m_engine->getBindingDimensions(0) for the input tensor dimensions I get {-1, 3, 512, 512}
and
m_engine->getBindingDimensions(1) for the output tensor dimensions I get {-1, 512, 512, 5}

And, when I run mEngine->getBindingFormat on input and output, I get kLINEAR

What is the correct order for the input tensor? channel, column, row? is it rgb or bgr?
What is in the output tensor: a) a single-channel HxW matrix with the classID of each pixel, or b) 5 matrices (one per class) with the probability that each pixel belongs to that matrix's classID?
What is the correct order for the output tensor? channel, column, row?

Many thanks!

Morganh · May 5, 2022, 3:52pm

There is an easy way to inspect the TensorRT engine:
$ tao-converter -k tlt_encode -p input_1,1x3x544x960,1x3x544x960,1x3x544x960 -t fp32 peoplesemsegnet.engine peoplesemsegnet.etlt
$ python -m pip install polygraphy --index-url https://pypi.ngc.nvidia.com
$ polygraphy inspect model peoplesemsegnet.engine

[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine

---- 1 Engine Input(s) ----
{input_1 [dtype=float32, shape=(1, 3, 544, 960)]}

---- 1 Engine Output(s) ----
{softmax_1 [dtype=float32, shape=(1, 544, 960, 2)]}

---- Memory ----
Device Memory: 398991360 bytes

---- 1 Profile(s) (2 Binding(s) Each) ----
- Profile: 0
    Binding Index: 0 (Input)  [Name: input_1]   | Shapes: min=(1, 3, 544, 960), opt=(1, 3, 544, 960), max=(1, 3, 544, 960)
    Binding Index: 1 (Output) [Name: softmax_1] | Shape: (1, 544, 960, 2)

---- 45 Layer(s) ----

According to https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/configs/peopleSemSegNet_tao/pgie_peopleSemSegNet_tao_config.txt#L29, the input tensor is RGB, and it is in CHW order.
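To illustrate what CHW order means for a kLINEAR input buffer, here is a minimal sketch that reorders an interleaved HWC image (the layout most image decoders produce) into planar CHW. The function name hwc_to_chw is hypothetical, not part of TAO or TensorRT, and the sketch covers layout only: any scaling or mean subtraction the model expects is not stated in this thread and is left out.

```python
def hwc_to_chw(pixels, height, width, channels=3):
    """Reorder a flat interleaved HWC buffer into a flat planar CHW buffer.

    Hypothetical helper for illustration; it only changes the element
    order and converts values to float, with no normalization applied.
    """
    if len(pixels) != height * width * channels:
        raise ValueError("buffer length does not match height * width * channels")
    out = [0.0] * len(pixels)
    for row in range(height):
        for col in range(width):
            for ch in range(channels):
                # HWC: channel is the fastest-varying index.
                src = (row * width + col) * channels + ch
                # CHW: column is the fastest-varying index, one plane per channel.
                dst = (ch * height + row) * width + col
                out[dst] = float(pixels[src])
    return out


# A 2x2 RGB image, stored pixel by pixel as R, G, B.
image = [
    10, 20, 30,   11, 21, 31,
    12, 22, 32,   13, 23, 33,
]
planar = hwc_to_chw(image, height=2, width=2)
print(planar)  # all R values first, then all G values, then all B values
```

For a batch, each image's CHW block would follow the previous one in the buffer, since the batch dimension comes first in the binding shape.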
As for the output tensor, taking peoplesemsegnet as an example: its output gives the category label (person or background) for every pixel in the input image, i.e. a semantic segmentation mask of people. The output order is HWC.
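As a rough sketch of how an HWC output buffer could be turned into per-pixel class IDs: the output name softmax_1 and a last dimension equal to the number of classes suggest the buffer holds one score per class for each pixel, though that reading is an assumption and is not confirmed in this thread. Under that assumption, the label of a pixel is the index of its largest score. The function name hwc_scores_to_labels is hypothetical.

```python
def hwc_scores_to_labels(scores, height, width, num_classes):
    """Convert a flat HWC score buffer into a height x width grid of class IDs.

    Assumes (not confirmed by the thread) that each pixel stores
    num_classes consecutive scores and the label is their argmax.
    """
    if len(scores) != height * width * num_classes:
        raise ValueError("buffer length does not match height * width * num_classes")
    labels = []
    for row in range(height):
        row_labels = []
        for col in range(width):
            # In HWC order the class scores of one pixel are contiguous.
            base = (row * width + col) * num_classes
            pixel_scores = scores[base:base + num_classes]
            # Index of the highest score; ties resolve to the lowest class ID.
            best = max(range(num_classes), key=pixel_scores.__getitem__)
            row_labels.append(best)
        labels.append(row_labels)
    return labels


# A 1x2 image with 3 classes: the first pixel favors class 1, the second class 0.
scores = [0.1, 0.7, 0.2,   0.6, 0.3, 0.1]
mask = hwc_scores_to_labels(scores, height=1, width=2, num_classes=3)
print(mask)
```

For the 5-class 512x512 engine in the question, the same indexing would apply with height=512, width=512, num_classes=5, once per image in the batch.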

david9xqqb · May 5, 2022, 10:07pm

Thanks! This was very useful