How does TAO prepare input?

Hi,
I’m trying to run the engine produced by TAO training on an edge device. That means I am no longer working in the TAO framework, but I need to ensure I’m feeding inputs into the network the same way they were fed during training. How do I figure this out?

For example, in the Mask-RCNN training, there is a setting for data_config.image_size which the documentation simply describes as “indicates the dimension of the resized and padded input”. This size is actually set as the hardcoded input size to the network when running the .engine file, so whatever I feed it has to conform.

Does the scaling preserve aspect ratio? How does a 1920x1080 image get scaled and padded? What are the padded values? (Black? or Zeros after image normalization?)

Similarly, how do I know what image normalization values to use? I’m using some typical values, which seem to work, but it would be good to actually verify.

I’m asking both
a) what are the answers to these questions, and
b) where can I find the source code that actually does these manipulations so I can reverse-engineer them?

Is the code even available? It’s a complex mess with some things handled at the tao host layer and most of the code inside the (undocumented) container.

All of this is aimed at checking the ACTUAL latency of the network on a Jetson device: the time from when the image is taken to when the inference result is available on the GPU. Preprocessing is not an option in this case.

As mentioned previously, for running inference against the Mask_rcnn TensorRT engine, please refer to the peoplesegnet sample in the NVIDIA-AI-IOT/tao-toolkit-triton-apps repository on GitHub (sample app code for deploying TAO Toolkit trained models to Triton). The postprocessing code can be found via configuring_the_client.md in that repository.
For preprocessing, please refer to frame.py in the same repository.
Peoplesegnet is a purpose-built model trained on the Mask_rcnn network, so its pre- and postprocessing code can be reused here.
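
While reading frame.py, it may help to have the geometry of an aspect-preserving resize followed by padding written out. The sketch below is an assumption, not confirmed TAO behaviour: it assumes the image is scaled by the smaller of the two ratios and padded on the right and bottom. The function name resize_and_pad_geometry and the 1024x1024 target are made up for illustration. Check the scaling rule, the padded side, and the padding value against frame.py.

```python
def resize_and_pad_geometry(src_w, src_h, dst_w, dst_h):
    """Return (scale, new_w, new_h, pad_right, pad_bottom).

    ASSUMPTION: aspect ratio is preserved and padding goes on the
    right and bottom edges. This is not verified against TAO.
    """
    # Use the smaller ratio so the scaled image fits inside the target.
    scale = min(dst_w / src_w, dst_h / src_h)
    # Clamp to the target size to guard against rounding overshoot.
    new_w = min(dst_w, round(src_w * scale))
    new_h = min(dst_h, round(src_h * scale))
    return scale, new_w, new_h, dst_w - new_w, dst_h - new_h


if __name__ == "__main__":
    # Hypothetical target size; substitute your data_config.image_size.
    print(resize_and_pad_geometry(1920, 1080, 1024, 1024))
```

Under these assumptions a 1920x1080 frame fills the full width of a 1024x1024 input and leaves a band of padding along the bottom. Whether that band holds black pixels or post-normalization zeros is exactly what needs to be confirmed in the preprocessing source.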

Regarding “All of this is aimed at checking the ACTUAL latency of the network on a jetson device”: the fps is usually checked with /usr/src/tensorrt/bin/trtexec.
On a Jetson device, for example:
/usr/src/tensorrt/bin/trtexec --loadEngine=your_maskrcnn_tensorrt.engine --fp16 --batch=1 --useSpinWait --avgRuns=1000

Then check the “GPU Compute Time” in the log.
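
Note that trtexec times the engine alone. If the number of interest is the time from captured image to finished inference, a small timing harness around your own pipeline is one way to get it. The following is a minimal sketch: measure_latency_ms, preprocess and infer are hypothetical names, and infer is assumed to block until the GPU result is ready (otherwise the numbers will be too optimistic).

```python
import statistics
import time


def measure_latency_ms(preprocess, infer, frame, warmup=10, runs=100):
    """Time preprocess + infer for one frame, in milliseconds.

    ASSUMPTION: infer() returns only after the result is available,
    i.e. it synchronizes with the GPU before returning.
    """
    # Warm-up iterations are excluded so one-time setup cost is not counted.
    for _ in range(warmup):
        infer(preprocess(frame))

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(preprocess(frame))
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    # Stand-in callables; replace with the real preprocessing and engine call.
    print(measure_latency_ms(lambda f: f, lambda x: x, frame=None))
```

Comparing the median from this harness with the “GPU Compute Time” from trtexec gives a rough idea of how much of the end-to-end latency is spent outside the engine.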

Thanks, I’ll look at those!

The benchmark you mention seems to give similar results to the one I was using - thanks for that.