Hi,
I’m trying to run the engine exported from TAO training on an edge device. That means I’m no longer working inside the TAO framework, but I still need to feed inputs into the network the same way they were fed during training. How do I figure this out?
For example, the Mask-RCNN training spec has a setting data_config.image_size,
which the documentation simply describes as “indicates the dimension of the resized and padded input”. This size ends up as the hardcoded input size of the network when running the .engine file, so whatever I feed it has to conform to it.
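For reference, the relevant part of my training spec looks roughly like this (the image_size value below is just an example, not necessarily what anyone else trained with):

```
data_config {
    image_size: "(832, 1344)"   # example value; the exported .engine expects exactly this input size
    # ... other data_config fields omitted ...
}
```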
Does the scaling preserve aspect ratio? How does a 1920x1080 image get scaled and padded? What values are used for the padding - black pixels, or zeros after image normalization?
Similarly, how do I know which image normalization values to use? I’m currently using some typical values, which seem to work, but it would be good to actually verify them.
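To make the question concrete, this is roughly what I’m doing on the Jetson right now before copying the buffer to the engine. Everything marked as an assumption below (aspect-preserving resize, zero padding on the bottom/right, RGB order, ImageNet mean/std, NCHW layout) is exactly what I’d like to have confirmed or corrected:

```python
import cv2
import numpy as np

# Example network input size taken from data_config.image_size (height, width)
NET_H, NET_W = 832, 1344
# Assumption: "typical" ImageNet normalization values
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(bgr_frame):
    """My current guess at the TAO Mask-RCNN preprocessing, e.g. for a 1920x1080 frame."""
    img = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    # Assumption: aspect-ratio-preserving resize so the image fits inside (NET_H, NET_W)
    h, w = img.shape[:2]
    scale = min(NET_H / h, NET_W / w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    # Assumption: normalize first, then pad the bottom/right with zeros
    img = (img - MEAN) / STD
    padded = np.zeros((NET_H, NET_W, 3), dtype=np.float32)
    padded[:new_h, :new_w, :] = img
    # Assumption: the engine expects NCHW with a leading batch dimension
    return np.transpose(padded, (2, 0, 1))[None, ...]
```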
I’m asking both
a) what are the answers to these questions, and
b) where can I find the source code that actually does these manipulations so I can reverse-engineer them?
Is the code even available? The pipeline is a complex mess, with some things handled at the TAO host layer and most of the code living inside the (undocumented) container.
All of this is aimed at measuring the ACTUAL latency of the network on a Jetson device: the time from when an image is captured to when the inference result is available on the GPU. Pre-processing the images offline ahead of time is not an option in this case.