Custom TAO UNet model classifying only two classes on DeepStream!

I solved the directory concatenation issue.

Please read my previous post which is more relevant.

If possible, could you share the .etlt model and the key for reproducing? We will check further internally.

KEY=nvidia_tlt

etlt after pruning and retraining:

etlt exported from within the notebook with:


!tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.tlt \
               -k $KEY \
               -e $SPECS_DIR/unet_retrain_vgg_6S.txt \
               -o $USER_EXPERIMENT_DIR/export/tao.fp32_6s01.etlt \
               --data_type fp32 \
               --engine_file $USER_EXPERIMENT_DIR/export/tao.fp32_6s01.engine \
               --max_batch_size 3

tao.fp32_6s01.etlt (61.4 MB)

test image in jpeg and png formats:

Thanks for the help.

After checking, I find that during training, RGB is converted to BGR as pre-processing. So the color format should be set to BGR for inference in DeepStream or standalone inference.

So, please set the following in the config file:
model-color-format=1

We will update the DS config file as well.
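
For standalone inference this just means the engine has to be fed BGR data. For example, with OpenCV (a minimal sketch; the filename is a placeholder, and cv::imread already returns BGR, so a conversion is only needed if the frames arrive as RGB):

#include <opencv2/opencv.hpp>

// cv::imread returns pixels in BGR order, which matches what the UNet saw during training.
cv::Mat frame = cv::imread("test_image.png", cv::IMREAD_COLOR);  // placeholder filename

// Only if the frame comes from a source that delivers RGB does it need to be flipped:
// cv::cvtColor(frame, frame, cv::COLOR_RGB2BGR);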

The result is as below.

Thanks!! That’s a substantial improvement.

I reran the experiments and still have a performance issue.

The exact same engine file gives worse results in my C++ code than in DeepStream.

Here is the comparison (DeepStream on the left):

A question I asked before: is it possible to get the raw inference output from DeepStream, without the overlay?

The per-class pixel count in C++ is 0: 254390, 1: 5766, 2: 169, 3: 673, 4: 0. So in C++, class 4 gets 0 pixels, while in the DeepStream image I can see some pixels that should be class 4 (the base of the stem), although the overlay colors are hard to tell apart. Having the raw inference mask would help in understanding this.
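
For reference, this is roughly how the per-class count is produced (a simplified sketch, not the exact code; labelMask stands for the CV_8UC1 argmax mask coming out of my C++ postprocessing, one class ID per pixel):

#include <iostream>
#include <vector>
#include <opencv2/core.hpp>

// Count how many pixels were assigned to each class in the argmax mask.
void printPixelCountPerClass(const cv::Mat& labelMask, int numClasses)
{
    std::vector<int> counts(numClasses, 0);
    for (int y = 0; y < labelMask.rows; ++y) {
        const uchar* row = labelMask.ptr<uchar>(y);
        for (int x = 0; x < labelMask.cols; ++x)
            ++counts[row[x]];
    }
    for (int c = 0; c < numClasses; ++c)
        std::cout << c << ": " << counts[c] << std::endl;
}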

Independently of that, I’d like to know the normalization details TAO uses in preprocessing, which I suspect is the source of the discrepancy.

Almost there… Very excited…

Many thanks!

Could you please share the C++ inference code for reproducing? We will check further internally.

Will check further.

Will share it with you later.

It will take some time since I need to remove some parts and I am afraid of breaking it. In any case, I think the normalization details used by TAO UNet are what I need to close this issue. Please send them.

Thanks!

For normalization, please follow https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/configs/peopleSemSegNet_tao/pgie_peopleSemSegNet_tao_config.txt#L25
and
https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/configs/peopleSemSegNet_tao/pgie_peopleSemSegNet_tao_config.txt#L30

It means (x - 127.5) / 127.5.

I don’t understand.

Are you saying that 127.5 is subtracted from each pixel value (R, G, or B) and the result is then divided by the same number? That seems like a very rough approximation of normalization.

I tried that in my code like this:

cv::subtract(image, cv::Scalar(127.5f, 127.5f, 127.5f), image, cv::noArray(), -1);
cv::divide(image, cv::Scalar(127.5f, 127.5f, 127.5f), image, 1, -1);

And the result is much worse.

I played with some numbers and got a much better result:

While that last result suggests that getting the normalization right will close this issue, it came out of a guessing game, and I’d like to have the correct code.

It’s been a very long time since I took statistics, but if I am not mistaken, z-score normalization is

Normalized_Pixel = (Pixel_Value - Mean_of_Pixels_in_Channel) / STD_Pixels_in_Channel

To illustrate, from TensorRT’s own quick start example:

buffer.get()[c * HW + j] = (static_cast<float>(mPPM.buffer[j * C + c])/max - mMean[c]) / mStd[c];

Or

Normalization for scaling to a range:

Normalized_Pixel = (Pixel_Value - Mean_of_Pixels_in_Channel) / (Max_of_Pixels_in_Channel - Min_of_Pixels_in_Channel)
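
To make that concrete, per-channel z-score normalization could look like this in OpenCV (purely illustrative; here the statistics come from the image itself, whereas a training pipeline would use dataset-wide or fixed values, which is exactly what I am asking about):

#include <opencv2/opencv.hpp>

// Illustrative per-channel z-score: subtract the channel mean, divide by the
// channel standard deviation.
cv::Mat zscoreNormalize(const cv::Mat& image)
{
    cv::Mat floatImage;
    image.convertTo(floatImage, CV_32FC3);

    cv::Scalar mean, stddev;
    cv::meanStdDev(floatImage, mean, stddev);

    cv::subtract(floatImage, mean, floatImage);
    cv::divide(floatImage, stddev, floatImage);
    return floatImage;
}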

So I still think that, to get to the solution, I need to know which values of mean, min, max, and standard deviation TAO UNet uses per channel when normalizing images during training.

Does it preprocess all the training images to compute those values? Or does it use the values of the base model, in my case VGG16?

From the DS config file:

net-scale-factor=0.007843
offsets=127.5;127.5;127.5

For UNet, if the input is RGB, it is first converted to BGR. Then each pixel value is divided by 127.5 and 1 is subtracted. After that, the layout is converted from HWC to CHW.
In the above normalization step,
x / 127.5 - 1
equals (x - 127.5) / 127.5, and also
equals (x - 127.5) * 0.007843, i.e. (x - offsets) * net-scale-factor as mentioned in the DS config file.
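
As an illustration only (not the exact TAO or DeepStream implementation), the arithmetic above could be reproduced in standalone C++ roughly like this, assuming an 8-bit BGR cv::Mat already resized to the network input resolution:

#include <vector>
#include <opencv2/opencv.hpp>

// image: CV_8UC3, BGR, already resized to the network input width/height.
// x / 127.5 - 1  ==  (x - 127.5) / 127.5  ==  (x - offsets) * net-scale-factor
std::vector<float> preprocess(const cv::Mat& image)
{
    cv::Mat floatImage;
    image.convertTo(floatImage, CV_32FC3, 1.0 / 127.5, -1.0);  // float, so negatives survive

    // HWC -> CHW into the flat buffer fed to the engine's input binding.
    const int H = floatImage.rows, W = floatImage.cols, C = 3;
    std::vector<float> input(static_cast<size_t>(C) * H * W);
    for (int y = 0; y < H; ++y) {
        const cv::Vec3f* row = floatImage.ptr<cv::Vec3f>(y);
        for (int x = 0; x < W; ++x)
            for (int c = 0; c < C; ++c)
                input[(c * H + y) * W + x] = row[x][c];
    }
    return input;
}

The convertTo call does the divide-by-127.5 and the -1 offset in one step, and it does so in floating point, so negative values are not clipped the way they would be on an 8-bit Mat.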

Using your suggested normalization, we still have a difference between DeepStream and C++:

cv::subtract(image, cv::Scalar(127.5f, 127.5f, 127.5f), image, cv::noArray(), -1);
cv::divide(image, cv::Scalar(127.5f, 127.5f, 127.5f), image, 1, -1);

In this picture, the left is the output from DeepStream and the right is from C++.

As you can see, the C++ version does not perform as well with the exact same engine file.

But if I only do

cv::subtract(image, cv::Scalar(127.5f, 127.5f, 127.5f), image, cv::noArray(), -1);

I get this, which also underperforms because of the false positives.

So the question remains: how to achieve identical results in C++ and DeepStream.

Sent you the code by direct message.
Many thanks,

David

Please modify the threshold to

float threshold = 0.3;
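
For reference, such a per-pixel confidence threshold is typically used like this in the postprocessing (an illustrative sketch, not the exact code from the shared file; probs and numClasses are placeholders for the per-pixel class scores):

// For one pixel: pick the highest-scoring class, but keep it only if that
// score clears the confidence threshold; otherwise fall back to background (0).
int decidePixelClass(const float* probs, int numClasses, float threshold = 0.3f)
{
    int bestClass = 0;
    float bestScore = probs[0];
    for (int c = 1; c < numClasses; ++c) {
        if (probs[c] > bestScore) {
            bestScore = probs[c];
            bestClass = c;
        }
    }
    return (bestScore >= threshold) ? bestClass : 0;
}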

The result is as below.

Bingo!

@Morganh thank you so much for your patience and support!

With this solved, we move on with the project!

With my thanks and appreciation,

David
