Normalizing objects (yolo output) to be processed for the secondary classifier

ayanasser · August 4, 2021, 3:27pm

I am trying to pre-process the objects that come out of YOLO before they go to the secondary classifier.
In my original model I used this transform for normalization:

transforms = T.Compose([
    T.Resize(size=(288, 144)),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

So the means are: [0.485, 0.456, 0.406]
the stds are: [0.229, 0.224, 0.225]
and the Normalize step applies this equation: (x.sub(mean).div(std))

When I tried to do this in DeepStream, I found that the offsets correspond to the mean:
offsets=0.485;0.456;0.406

DeepStream then applies this equation: (x - offset) * net-scale-factor
so the net-scale-factor should be equal to (1/std)
so I set it like this: 1/0.225 = 4.44, net-scale-factor=4.44

The model’s output is totally wrong and different from the original model, so I am wondering whether I am doing any step wrong?

• Hardware Platform (GPU)
• DeepStream Version: 5.0
• TensorRT Version: 7.0.0.11

ayanasser · August 5, 2021, 7:24am

@Fiona.Chen @mchi Could you check this please, it’s urgent ^^

mchi · August 5, 2021, 7:33am

================================
In nvinfer

y = net_scale_factor*(x-mean)

Where:

x is the input pixel value. It is an int8 with range [0,255].
mean is the corresponding mean value, read either from the mean file or as offsets[c], where c is the channel to which the input pixel belongs, and offsets is the array specified in the configuration file. It is a float.
net-scale-factor is the pixel scaling factor specified in the configuration file. It is a float.
y is the corresponding output pixel value. It is a float.

==============================

Normally, net_scale_factor = 1/255, so y is a float within (-1, 1).
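
To make the formula above concrete, here is a minimal NumPy sketch of it. The function name `nvinfer_preprocess` is hypothetical and only mirrors the equation quoted from the documentation; it is not DeepStream code. It shows what the settings from the first post (offsets in [0, 1] units, net-scale-factor=4.44) would produce for a pixel in [0, 255].

```python
import numpy as np

def nvinfer_preprocess(x, offsets, net_scale_factor):
    """Apply y = net_scale_factor * (x - mean) to an H x W x C pixel array.

    Hypothetical helper that only mirrors the quoted formula.
    """
    x = np.asarray(x, dtype=np.float32)
    mean = np.asarray(offsets, dtype=np.float32)  # one offset per channel
    return net_scale_factor * (x - mean)

# A single pixel with values in [0, 255].
pixel = np.array([[[111, 99, 91]]], dtype=np.uint8)

# Settings from the first post: offsets in [0, 1] units, scale = 4.44.
y_bad = nvinfer_preprocess(pixel, [0.485, 0.456, 0.406], 4.44)
print(y_bad)
```

Because x is still in [0, 255] when the offsets are subtracted, the results land in the hundreds rather than in the small range that T.Normalize produces, which would be consistent with the wrong model output described above.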

mchi · August 5, 2021, 7:58am

In that case, what is the data range of x: is it [0, 255], or [0, 255] divided by 255?

ayanasser · August 5, 2021, 8:12am

I’ll give you a sample of the image before and after the transformation with the original code:
The transform function:
transforms = T.Compose([
    T.Resize(size=(288, 144)),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

image before transform (before resize or normalize) :
[[[111 99 91]
[112 100 92]
[111 99 91]
…
[104 92 78]
[105 90 81]
[105 90 83]]

[[111 98 92]
[111 99 92]
[110 98 91]
…
[104 92 78]
[104 91 82]
[104 90 83]]

[[109 97 93]
[110 97 93]
[109 97 92]
…
[104 90 82]
[103 89 84]
[102 89 85]]

…

[[174 153 139]
[ 41 21 3]
[168 149 130]
…
[ 70 48 36]
[ 72 50 38]
[ 74 52 40]]

[[135 117 101]
[ 50 30 15]
[165 146 127]
…
[ 69 47 35]
[ 71 49 37]
[ 73 52 39]]

[[108 89 73]
[ 61 41 23]
[163 144 125]
…
[ 70 48 36]
[ 72 50 38]
[ 73 51 39]]]

image After transform (with resize and normalize):
tensor([[[-0.5596, -0.5596, -0.5424, …, -0.7137, -0.6965, -0.6965],
[-0.5424, -0.5424, -0.5424, …, -0.7137, -0.6965, -0.6965],
[-0.5253, -0.5253, -0.5253, …, -0.6965, -0.6794, -0.6794],
…,
[ 0.0056, -0.2856, -1.1760, …, -1.4500, -1.4329, -1.4329],
[-0.4911, -0.6965, -1.2959, …, -1.4672, -1.4500, -1.4500],
[-0.8678, -1.0048, -1.3815, …, -1.4672, -1.4500, -1.4500]],

    [[-0.3025, -0.3025, -0.2850,  ..., -0.4601, -0.4601, -0.4601],
     [-0.3200, -0.3200, -0.3025,  ..., -0.4426, -0.4601, -0.4601],
     [-0.3375, -0.3375, -0.3200,  ..., -0.4601, -0.4776, -0.4776],
     ...,
     [ 0.3978,  0.0826, -0.7927,  ..., -1.1429, -1.1253, -1.1253],
     [-0.0924, -0.3200, -0.9153,  ..., -1.1604, -1.1253, -1.1253],
     [-0.4776, -0.6001, -0.9678,  ..., -1.1604, -1.1429, -1.1429]],

    [[ 0.1302,  0.1302,  0.1476,  ...,  0.0256,  0.0256,  0.0256],
     [ 0.1302,  0.1302,  0.1302,  ...,  0.0082,  0.0082,  0.0082],
     [ 0.1128,  0.1128,  0.1128,  ..., -0.0092, -0.0092, -0.0092],
     ...,
     [ 0.9668,  0.6531, -0.2184,  ..., -0.5321, -0.5147, -0.5147],
     [ 0.4439,  0.2348, -0.3404,  ..., -0.5495, -0.5321, -0.5321],
     [ 0.0779, -0.0441, -0.4101,  ..., -0.5495, -0.5321, -0.5321]]])

ayanasser · August 8, 2021, 6:21am

@mchi Any updates?
Or is there any difference between DeepStream frames and OpenCV frames?

mchi · August 10, 2021, 3:35pm

Hi @ayanasser
Sorry for the delay!

Looking at the first numbers, 111 and -0.5596: how can one be calculated from the other with the equation (x.sub(mean).div(std))?

ayanasser · August 11, 2021, 9:22am

I think the reason is that the original code converts the image to a tensor, and here’s the documentation for ToTensor:
Converts a PIL Image or numpy.ndarray (H x W x C) in the range
[0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]

So how can I mimic this process inside DeepStream?
That is, make the image channel-first and then convert the pixel values to the range [0, 1].
I think all I need is to divide the pixel value by 255 before subtracting the mean (offset) from it, and make it channel-first.
Do you have any clue?
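
As a rough illustration of the idea in this post, the sketch below reproduces ToTensor followed by Normalize in plain NumPy, then checks that dividing by 255 first is algebraically the same as scaling the constants: (x/255 - mean)/std = (x - 255*mean)/(255*std). The helper names (`to_tensor_like`, `normalize_like`, `folded`) are made up for this example, and the random image is only test data.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float64)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float64)

def to_tensor_like(img_hwc_uint8):
    """H x W x C uint8 in [0, 255] -> C x H x W float in [0.0, 1.0]."""
    return np.transpose(img_hwc_uint8.astype(np.float64) / 255.0, (2, 0, 1))

def normalize_like(chw, mean=MEAN, std=STD):
    """Per-channel (x - mean) / std on a C x H x W array."""
    return (chw - mean[:, None, None]) / std[:, None, None]

def folded(img_hwc_uint8, mean=MEAN, std=STD):
    """Same result with the division by 255 folded into the constants."""
    chw = np.transpose(img_hwc_uint8.astype(np.float64), (2, 0, 1))
    return (chw - 255.0 * mean[:, None, None]) / (255.0 * std[:, None, None])

# Random test image; any uint8 H x W x C array would do.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

reference = normalize_like(to_tensor_like(img))
same = np.allclose(reference, folded(img))
print(same)
```

The folded form has the same shape as y = net_scale_factor*(x-mean), with mean scaled to [0, 255] units, so no separate division step is needed before the offsets are subtracted.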

ayanasser · August 11, 2021, 9:24am

Also, the previous example was not quite right because it includes a resize, so we were not comparing the same pixel values.
Here’s a better example:

crop_img_orig [[[ 89 73 74]
[ 92 80 80]
[ 77 69 69]
…
[ 15 11 8]
[ 31 24 21]
[ 48 41 38]]

[[113 97 98]
[ 85 73 72]
[ 88 80 80]
…
[ 47 43 39]
[ 37 30 27]
[ 39 32 28]]

[[124 108 106]
[119 103 101]
[ 79 68 65]
…
[102 98 94]
[ 30 23 20]
[ 14 6 4]]

…

[[ 43 34 36]
[ 36 27 29]
[ 50 42 44]
…
[202 197 206]
[189 184 194]
[201 196 206]]

[[ 47 41 42]
[ 50 43 45]
[ 45 38 40]
…
[201 198 209]
[204 201 211]
[197 194 204]]

[[ 43 38 39]
[ 47 42 43]
[ 40 35 36]
…
[199 198 209]
[200 199 209]
[198 197 207]]]
crop_img_orig shape: (82, 43, 3)
Crop image after transform: tensor([[[-0.8507, -0.7479, -0.9363, …, -1.9809, -1.7583, -1.4672],
[-0.4397, -0.8849, -0.7479, …, -1.4500, -1.6555, -1.6384],
[-0.3027, -0.3883, -1.0048, …, -0.5082, -1.7754, -2.0494],
…,
[-1.5014, -1.6213, -1.3644, …, 1.4098, 1.2043, 1.4098],
[-1.3987, -1.3473, -1.4329, …, 1.4612, 1.4954, 1.3755],
[-1.4500, -1.3815, -1.5014, …, 1.4612, 1.4612, 1.4269]],

    [[-0.7577, -0.6352, -0.8277,  ..., -1.8431, -1.6155, -1.3179],
     [-0.3375, -0.7577, -0.6352,  ..., -1.2829, -1.5105, -1.4755],
     [-0.1450, -0.2325, -0.8452,  ..., -0.3200, -1.6331, -1.9307],
     ...,
     [-1.4405, -1.5630, -1.3004,  ...,  1.4132,  1.1856,  1.3957],
     [-1.3179, -1.2829, -1.3704,  ...,  1.4307,  1.4832,  1.3606],
     [-1.3704, -1.3004, -1.4230,  ...,  1.4307,  1.4482,  1.4132]],

    [[-0.2532, -0.2010, -0.4624,  ..., -1.5430, -1.2641, -0.9678],
     [ 0.1651, -0.3230, -0.2707,  ..., -0.9853, -1.1596, -1.1247],
     [ 0.3568,  0.2696, -0.4275,  ..., -0.0267, -1.2816, -1.5604],
     ...,
     [-1.0550, -1.1770, -0.9330,  ...,  1.7163,  1.4897,  1.6988],
     [-0.9853, -0.9330, -1.0201,  ...,  1.6988,  1.7511,  1.6291],
     [-1.0550, -0.9853, -1.1073,  ...,  1.6640,  1.6814,  1.6465]]])

Crop image after transform shape: torch.Size([3, 82, 43])
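
A quick numerical check of the sample above, as a sketch: it normalizes the first printed pixel, [89 73 74], once in the printed channel order and once reversed, and compares both against the first value of each channel in the tensor (-0.8507, -0.7577, -0.2532). The helper `normalize_pixel` is hypothetical. That the printed crop is in BGR order (for example, an OpenCV frame converted to RGB before the transform) is an assumption inferred from the numbers, not something stated in the thread.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def normalize_pixel(channels):
    """(x / 255 - mean) / std for one 3-channel pixel."""
    return (np.asarray(channels, dtype=np.float64) / 255.0 - MEAN) / STD

first_pixel = [89, 73, 74]  # first pixel printed in crop_img_orig

as_is = normalize_pixel(first_pixel)                # channel order as printed
reversed_order = normalize_pixel(first_pixel[::-1])  # channel order reversed

# First value of each channel in the transformed tensor above.
expected = np.array([-0.8507, -0.7577, -0.2532])

print(np.round(as_is, 4))
print(np.round(reversed_order, 4))
```

Only the reversed order agrees with the tensor to about three decimals, which suggests the channel order has to match as well as the scaling. In a Gst-nvinfer config the channel order is selected with model-color-format (0 for RGB, 1 for BGR).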

mchi · August 11, 2021, 2:25pm

As I mentioned previously, below is the formula used to calculate the output.
If you want to divide the pixel value by 255 before subtracting the mean, you can multiply the mean by 255 offline before filling it into the DS config file.

y = net_scale_factor*(x-mean)
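
A minimal sketch of that offline calculation, under one stated assumption: net-scale-factor is a single float in the nvinfer config while offsets is per-channel, so the three stds cannot be represented exactly and are averaged here. That averaging is an approximation chosen for this example, not something stated in the thread.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

# Multiply the means by 255 so they are in the same [0, 255] units as x.
offsets = [255.0 * m for m in MEAN]

# Assumption: one scalar scale for all channels, so average the stds.
avg_std = sum(STD) / len(STD)
net_scale_factor = 1.0 / (255.0 * avg_std)

config_lines = [
    "offsets=" + ";".join(f"{o:.3f}" for o in offsets),
    f"net-scale-factor={net_scale_factor:.7f}",
]
print("\n".join(config_lines))
```

With these values, net_scale_factor*(x-mean) approximates (x/255 - mean)/std for each channel; the remaining per-channel difference comes only from replacing the three stds with their average.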
