Preparing grayscale images to feed a TAO UNet model exported to TensorRT




TensorRT Version: 8.0.1-1+cuda11.3
GPU Type: RTX 3090
Nvidia Driver Version: NVIDIA-SMI 510.85.02 Driver Version: 510.85.02 CUDA Version: 11.6
Operating System + Version: Ubuntu 20.04

Steps To Reproduce

I exported a TAO UNet binary semantic segmentation model to TensorRT, with input images of 1280x704 grayscale.

Before loading the image frames to the CUDA buffer to feed the model inference engine, I am normalizing the frame as follows:

cv::Mat image(cv::Size(w, h), CV_8UC3, (void*)video.get_data(), cv::Mat::AUTO_STEP);
cv::Mat imageGray;
cv::cvtColor(image, imageGray, cv::COLOR_BGR2GRAY);
cv::resize(imageGray, imageGray, cv::Size(1280, 704));

cv::subtract(image, cv::Scalar(127.5f, 127.5f, 127.5f), imageGray, cv::noArray(), -1);
cv::divide(image, cv::Scalar(127.5f, 127.5f, 127.5f), imageGray, 1, -1);

And that is loaded into the host data buffer. But the model is not producing any meaningful segmentation classes…

Is it the normalization of the image? Or is the cv::Mat data being loaded into the host data buffer in the wrong order?

I have another multiclass semantic segmentation model, fed color images, that works well with a similar architecture.

Many thanks!!


We are moving this post to the TAO toolkit forum to get better help.

Thank you.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.

Please refer to
UNET — TAO Toolkit 3.22.05 documentation and
Data Annotation Format — TAO Toolkit 3.22.05 documentation

Color Input Image Type
For color input images, each mask image is a single-channel or three-channel image with the same size as the input image. Every pixel in the mask should have an integer value representing the segmentation class label_id, as per the mapping provided in the dataset_config. Ensure that the pixel values in the mask image are within the range of label_id values provided in the dataset_config.

For a reference example, refer to the _labelIds.png images format in the Cityscapes Dataset.

Grayscale Input Image Type

For grayscale input images, the mask is a single-channel image with the same size as the input image. Every pixel has a value of 255 or 0, corresponding to a label_id of 1 or 0, respectively, in the dataset_config. For reference, see the ISBI dataset Jupyter notebook example provided in NGC resources.