How to use a TAO-created TensorRT UNet model in C++

What is the composition of the input memory buffer for C++ inference?

I've trained, pruned, and retrained a VGG16 UNet model and tested the inferences in the UNet notebook; all working fine.

Now I need to use it in C++.

Using the TensorRT quickstart C++ example, I am able to load the model and query dims, input and output layer names, etc.

All the samples seem to use the PPM format, with all the channels for one pixel stored together.

I am reading the PNG images with OpenCV into a 3-channel Mat structure, but have no idea in what order the model expects to receive the data, by channel, row, and column. For example, with P as pixel and C as channel:

P(1,1)C1 P(1,1)C2 P(1,1)C3 P(2,1)C1 P(2,1)C2 P(2,1)C3 … P(512,512)C1 P(512,512)C2 P(512,512)C3

or

P(1,1)C1 P(2,1)C1 P(3,1)C1 P(4,1)C1 P(5,1)C1… P(512,512)C1 P(1,2)C1 P(2,2)C1 P(3,2)C1 …

Or something else?
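(Editor's note, for later readers: in index arithmetic the two candidate layouts above look like this; a minimal sketch, and the helper names are mine.)

// Interleaved (HWC): all channels of one pixel are adjacent, as in the first example.
float hwc_at(const float* buf, int W, int C, int y, int x, int c) {
    return buf[(y * W + x) * C + c];
}

// Planar (CHW): each channel is a full H x W plane, as in the second example.
float chw_at(const float* buf, int H, int W, int y, int x, int c) {
    return buf[c * (H * W) + y * W + x];
}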

If the model is INT8, are input values uint8_t or float?

Thanks!!

Refer to GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream
The unet output is

  • softmax_1 : A [batchSize, H, W, C] tensor containing the scores for each class

More info can be found in deepstream_tao_apps/pgie_unet_tao_config.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
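Given that [batchSize, H, W, C] layout, the flat offset of the class-c score at pixel (h, w) would be (a sketch, assuming a packed float buffer):

// NHWC indexing into softmax_1, with n the batch index:
// score = out[((n * H + h) * W + w) * C + c];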

Thanks, but my question is something else.

Deepstream reads a file or a video stream, while in C++ I need to provide an input buffer and get an output buffer.

H, W, and C are the dimensions. I have those.

The question is about the contents of the buffer to be built for the model:

Is it organized by row first, then by column, and then by channel?
Is it by channel, and then by row and column?
Is it by a single row and column with all the channels together, and then the next row or the next column?

I have spent a whole week playing with all these options with no success…

Thanks!

For unet, please use CHW order.
And RGB. See deepstream_tao_apps/pgie_unet_tao_config.txt at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub
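If OpenCV is already in the pipeline, one way to build a planar RGB CHW float buffer from a BGR cv::Mat is cv::dnn::blobFromImage; a minimal sketch (the scale factor and mean below are placeholders, use whatever the model's preprocessing config actually specifies):

#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>

cv::Mat frame = cv::imread("input.png");   // BGR, interleaved HWC, uint8
cv::Mat blob;                              // becomes NCHW float32
cv::dnn::blobFromImage(frame, blob,
                       1.0,                // placeholder scale factor
                       cv::Size(512, 512), // model input W x H
                       cv::Scalar(),       // placeholder mean
                       /*swapRB=*/true,    // BGR -> RGB
                       /*crop=*/false);
// blob.ptr<float>(0) now points at a contiguous 1 x 3 x H x W buffer.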

Yes, I used

    mEngine->getBindingFormat(input_idx)

and got

kLINEAR Row major linear format. For a tensor with dimensions {N, C, H, W} or {numbers, channels, columns, rows}, the dimensional index corresponds to {3, 2, 1, 0} and thus the order is W minor. For DLA usage, the tensor sizes are limited to C,H,W in the range [1,8192].
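For what it's worth, kLINEAR just means a fully packed row-major buffer, W minor. A sketch of querying the binding and computing the flat offset (TensorRT 8.x API; the binding name here is an assumption):

int input_idx = mEngine->getBindingIndex("input_1");              // name depends on the model
nvinfer1::Dims dims   = mEngine->getBindingDimensions(input_idx); // {N, C, H, W}
nvinfer1::DataType dt = mEngine->getBindingDataType(input_idx);   // kFLOAT, kINT8, ...

// kLINEAR offset of element (n, c, h, w):
// size_t offset = ((n * C + c) * H + h) * W + w;

getBindingDataType should also settle the earlier INT8 question: the buffer holds whatever type the binding reports, and as far as I know INT8-calibrated engines still expose float bindings unless INT8 I/O was explicitly requested.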

But still no inference in C++. Very frustrating that all the NVIDIA stuff takes so much effort to use!!! From version incompatibilities down to the decimal release, to outdated documentation (a lot of the write-up in the TensorRT quick start documentation is deprecated!), to the fact that ALL TensorRT examples are for PPM image files, while TAO UNet works ONLY with PNG files.

Using a modified version of the TensorRT quick start, this is the code that creates the tensor to be fed to the engine, reading the image file using OpenCV:

std::unique_ptr<float[]> RGBImageReader::processCV() const
{
    const int C = mDims.d[1];
    const int H = mDims.d[2];
    const int W = mDims.d[3];
    // unique_ptr<float[]> (array form) so the array delete[] is used on destruction.
    auto buffer = std::unique_ptr<float[]>{new float[volume()]};

    std::string image_path = "../input.png";
    cv::Mat frame = cv::imread(image_path);
    if (frame.empty()) {
        std::cerr << "Failed to load input image " << image_path << std::endl;
        assert(!"load failed");
    }

    cv::Mat channels[3];
    cv::split(frame, channels); // channels[0] Blue, channels[1] Green, channels[2] Red

    if (frame.rows == H && frame.cols == W && frame.channels() == C) {
        const int HW = H * W;
        for (int ch = 0; ch < C; ch++) {
            int K = 0;
            switch (ch) {
                case 0: K = 2; break;  // model channel 0 = Red   = OpenCV plane 2
                case 1: K = 1; break;  // model channel 1 = Green = OpenCV plane 1
                case 2: K = 0; break;  // model channel 2 = Blue  = OpenCV plane 0
            }
            // Original PPM version kept for reference (normalized with max/mean/std):
            // for (int j = 0; j < HW; ++j) {
            //     buffer[ch * HW + j] = (static_cast<float>(mPPM.buffer[j * C + ch]) / mPPM.max - mMean[ch]) / mStd[ch];
            // }
            for (int row = 0; row < H; row++)
                for (int col = 0; col < W; col++) {
                    // Planar CHW, row-major within each channel. No normalization here;
                    // the model may expect scaled/offset values instead of raw 0-255.
                    buffer[(ch * HW) + (row * W) + col] =
                        static_cast<float>(channels[K].at<uint8_t>(row, col));
                }
        }
    }
    else {
        assert(!"Image dimensions don't match the model input");
    }

    return buffer;
}

To narrow down the “no inference”, can you try to generate a TensorRT engine with an official UNet model (PeopleSemSegnet | NVIDIA NGC) and run inference with your code?

You can download the test file from PeopleSemSegnet | NVIDIA NGC:
wget https://developer.nvidia.com/sites/default/files/akamai/NGC_Images/models/peoplenet/input_11ft45deg_000070.jpg

This is a test image and a colorized inference mask from the UNet Python notebook (viewed from my C++ mask colorization app, which reports at the bottom the number of pixels per class):

This is the same image, with the .etlt model converted again on the local computer and the model run from an adaptation of the C++ TensorRT quickstart application:

The inference is classifying 209715 pixels as class 0 (background) and 52429 pixels as class 1.

Also, please note that the color image of the plant on the left is reconstructed from the input buffer, to verify the integrity of the buffer creation.

As mentioned above, can you try to run the experiment with the peoplesemsegnet model?

Downloaded the model and converted it with tao-converter:

tao-converter -k tlt_encode -p input_1,1x3x544x960,1x3x544x960,1x3x544x960 -t fp16 -e ./bs1_fp16.engine ./peoplesemsegnet.etlt
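For completeness, a minimal sketch of deserializing an engine like this and running one inference with the TensorRT 8.x API (error handling elided; it assumes a two-binding engine with the input at index 0 and the output at index 1):

#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <iostream>
#include <vector>

struct Logger : nvinfer1::ILogger {
    void log(Severity sev, const char* msg) noexcept override {
        if (sev <= Severity::kWARNING) std::cout << msg << std::endl;
    }
};

// Load a serialized engine from disk and run one synchronous inference.
void infer(const char* enginePath, const float* hostIn, size_t inCount,
           float* hostOut, size_t outCount) {
    std::ifstream f(enginePath, std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());

    Logger logger;
    auto* runtime = nvinfer1::createInferRuntime(logger);
    auto* engine  = runtime->deserializeCudaEngine(plan.data(), plan.size());
    auto* context = engine->createExecutionContext();

    void* bindings[2] = {};   // [0] input, [1] output (assumed binding order)
    cudaMalloc(&bindings[0], inCount * sizeof(float));
    cudaMalloc(&bindings[1], outCount * sizeof(float));
    cudaMemcpy(bindings[0], hostIn, inCount * sizeof(float), cudaMemcpyHostToDevice);

    context->executeV2(bindings);   // synchronous execution

    cudaMemcpy(hostOut, bindings[1], outCount * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(bindings[0]);
    cudaFree(bindings[1]);
}

For the bs1_fp16.engine above, inCount would be 1*3*544*960, and outCount the volume reported by the output binding's dims.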

In the C++ program, it loads like this:

SampleSegmentation sample("/home/david/Downloads/PeopleSemSegNet/bs1_fp16.engine");

The result is:

Can you provide your code, so that I can reproduce your result?

Update: For Unet, the preprocessing uses CHW order. The postprocessing uses HWC order.
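In index terms (both bindings are kLINEAR; the difference is only where C sits in the dims):

// input  (N, C, H, W): in [((n * C + c) * H + h) * W + w]
// output (N, H, W, C): out[((n * H + h) * W + w) * C + c]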

That’s odd…

mEngine->getBindingFormat(output_idx);

returns

kLINEAR Row major linear format. For a tensor with dimensions {N, C, H, W} or {numbers, channels, columns, rows}, the dimensional index corresponds to {3, 2, 1, 0} and thus the order is W minor. For DLA usage, the tensor sizes are limited to C,H,W in the range [1,8192].

Changing tack and restarting with the original TensorRT quickstart, and changing that to work with OpenCV and the quickstart model…

Here is the source code. It's basically the TensorRT quick start example, modified to also use OpenCV.

For a reader in the future, I DO NOT RECOMMEND YOU TRY TO LEARN ANYTHING FROM THIS.

trttest.zip (16.6 KB)

This is for peoplesemsegnet. You will get three compilation errors in places where you need to plug either the model filename or the image filename.

Thanks Morganh

UPDATE:


Looking at the output dims of my TAO custom model, I see a first dimension of 1, then H×W of 512×512, and then dims.d[3] as 5…

Does TAO UNet add something like an argmax layer at the output to produce per-pixel class labels of highest probability? My model has 5 classes; does this mean the output buffer has five 512×512 matrices with a 1 at each pixel for the winning class?
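If the output really is [1, 512, 512, 5] class scores rather than hard labels, a per-pixel argmax decode would look something like this (a sketch, assuming a packed float NHWC buffer):

#include <cstdint>
#include <vector>

// Decode a [1, H, W, C] score tensor into an H x W class-label mask.
std::vector<uint8_t> decodeMask(const float* out, int H, int W, int C) {
    std::vector<uint8_t> mask(static_cast<size_t>(H) * W);
    for (int h = 0; h < H; ++h)
        for (int w = 0; w < W; ++w) {
            const float* px = out + (static_cast<size_t>(h) * W + w) * C;
            int best = 0;
            for (int c = 1; c < C; ++c)
                if (px[c] > px[best]) best = c;
            mask[static_cast<size_t>(h) * W + w] = static_cast<uint8_t>(best);
        }
    return mask;
}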

I updated my code (still not producing inference) and include it here:
trttest2.zip (15.9 KB)

This is for peoplesemsegnet. You will get three compilation errors in places where you need to plug either the model filename or the image filename.

You also need Opencv.

There is a lot of code commented out for all the different things we've tried. My apologies for that.

Refer to TAO unet input and output tensor shapes and order - #3 by Morganh

Solved partially.

My problem was lack of clarity on the tensor shapes for both the input and output layers. In fact, it took me some time to reply because all my models were rendered useless as a result of an NVIDIA moderator recommending installing TensorRT 8.4 EA to solve a problem getting the TensorRT quick start running (Error (Could not find any implementation for node ArgMax_260.) - #6 by david9xqqb), which is impossible to use because of compilation errors and other unsolved issues.

I was able to get a result from the C++ model after ignoring all the NVIDIA documentation. On the input layer, I was able to get a piece of C++ code running from here (thank you, Cyrus), which gave me clarity on building the input buffer from OpenCV, since the order of the color layers IS important; and I created an algorithm that tested multiple tensor shapes for the output.

The output shape is a vector of probabilities for each class, arranged in a matrix corresponding to each pixel in the frame.

However, the result is quite imperfect, and identical to the Deepstream result in my other post: Custom TAO unet model classifying only two classes on Deepstream!

This is the output from the C++ inference:


This is the output of the same model in Deepstream:

Both are not useful and far from the result obtained with tao inference:


As you can see in the top right corner of the TAO inference, there is an erroneous classification, which shows up as the third class in both the C++ and Deepstream results.

To narrow down whether it is an issue in tao-converter or Deepstream, please try to run the experiment I mentioned in Custom TAO unet model classifying only two classes on Deepstream! - #10 by Morganh. Thanks.


There has been no update from you for a while, so we are assuming this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.