Normalizing objects (YOLO output) before passing them to the secondary classifier

I am trying to pre-process the objects that come out of YOLO.
In my original model I used this transform for normalization:

transforms = T.Compose([
    T.Resize(size=(288, 144)),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

So the means are: [0.485, 0.456, 0.406]
the stds are: [0.229, 0.224, 0.225]
and the Compose pipeline applies this equation: x.sub(mean).div(std)

When I tried to do this in DeepStream, I found that "offsets" means "the mean":
offsets=0.485;0.456;0.406

DeepStream then uses the equation (x - offset) * net-scale-factor,
so net-scale-factor should be equal to 1/std.
I set it like this: 1/0.225 = 4.44, net-scale-factor=4.44

The model's output is totally wrong and differs from the original model's, so I am wondering whether I am doing any step wrong?

• Hardware Platform (GPU)
• DeepStream Version: 5.0
• TensorRT Version: 7.0.0.11

@Fiona.Chen @mchi Could you check this please, it’s urgent ^^

================================
In nvinfer

y = net_scale_factor*(x-mean)

Where:

  • x is the input pixel value. It is an int8 with range [0,255].
  • mean is the corresponding mean value, read either from the mean file or as offsets[c], where c is the channel to which the input pixel belongs, and offsets is the array specified in the configuration file. It is a float.
  • net-scale-factor is the pixel scaling factor specified in the configuration file. It is a float.
  • y is the corresponding output pixel value. It is a float.

==============================

Normally, net_scale_factor = 1/255, so y is a float in (-1, 1).
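
To see how far the configuration from the question drifts from the PyTorch pipeline, here is a minimal sketch that applies both formulas to a single pixel. It assumes, as the torchvision documentation states, that ToTensor scales pixel values to [0.0, 1.0] before Normalize runs. The function names and the sample pixel value 74 are illustrative only.

```python
def nvinfer_preprocess(x, offset, net_scale_factor):
    """Per-pixel formula quoted above: y = net_scale_factor * (x - mean)."""
    return net_scale_factor * (x - offset)


def torch_preprocess(x, mean, std):
    """ToTensor (x / 255) followed by Normalize ((v - mean) / std)."""
    return (x / 255.0 - mean) / std


x = 74  # illustrative raw pixel value in [0, 255]

# Settings from the question: offset=0.485, net-scale-factor=4.44
y_config = nvinfer_preprocess(x, offset=0.485, net_scale_factor=4.44)

# Same pixel through the original transform (first channel statistics)
y_torch = torch_preprocess(x, mean=0.485, std=0.229)

print(round(y_config, 2), round(y_torch, 4))
```

The first value is in the hundreds while the second is close to -0.85, so the classifier would receive inputs on a completely different scale from the one it was trained on.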


In that case, what is the data range of x: is it [0,255], or [0,255] divided by 255?

I’ll give you a sample of the image before and after the transformation with the original code:
The transform function:
transforms = T.Compose([
    T.Resize(size=(288, 144)),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

image before transform (before resize or normalize) :
[[[111 99 91]
[112 100 92]
[111 99 91]

[104 92 78]
[105 90 81]
[105 90 83]]

[[111 98 92]
[111 99 92]
[110 98 91]

[104 92 78]
[104 91 82]
[104 90 83]]

[[109 97 93]
[110 97 93]
[109 97 92]

[104 90 82]
[103 89 84]
[102 89 85]]

[[174 153 139]
[ 41 21 3]
[168 149 130]

[ 70 48 36]
[ 72 50 38]
[ 74 52 40]]

[[135 117 101]
[ 50 30 15]
[165 146 127]

[ 69 47 35]
[ 71 49 37]
[ 73 52 39]]

[[108 89 73]
[ 61 41 23]
[163 144 125]

[ 70 48 36]
[ 72 50 38]
[ 73 51 39]]]

image After transform (with resize and normalize):
tensor([[[-0.5596, -0.5596, -0.5424,  ..., -0.7137, -0.6965, -0.6965],
     [-0.5424, -0.5424, -0.5424,  ..., -0.7137, -0.6965, -0.6965],
     [-0.5253, -0.5253, -0.5253,  ..., -0.6965, -0.6794, -0.6794],
     ...,
     [ 0.0056, -0.2856, -1.1760,  ..., -1.4500, -1.4329, -1.4329],
     [-0.4911, -0.6965, -1.2959,  ..., -1.4672, -1.4500, -1.4500],
     [-0.8678, -1.0048, -1.3815,  ..., -1.4672, -1.4500, -1.4500]],

    [[-0.3025, -0.3025, -0.2850,  ..., -0.4601, -0.4601, -0.4601],
     [-0.3200, -0.3200, -0.3025,  ..., -0.4426, -0.4601, -0.4601],
     [-0.3375, -0.3375, -0.3200,  ..., -0.4601, -0.4776, -0.4776],
     ...,
     [ 0.3978,  0.0826, -0.7927,  ..., -1.1429, -1.1253, -1.1253],
     [-0.0924, -0.3200, -0.9153,  ..., -1.1604, -1.1253, -1.1253],
     [-0.4776, -0.6001, -0.9678,  ..., -1.1604, -1.1429, -1.1429]],

    [[ 0.1302,  0.1302,  0.1476,  ...,  0.0256,  0.0256,  0.0256],
     [ 0.1302,  0.1302,  0.1302,  ...,  0.0082,  0.0082,  0.0082],
     [ 0.1128,  0.1128,  0.1128,  ..., -0.0092, -0.0092, -0.0092],
     ...,
     [ 0.9668,  0.6531, -0.2184,  ..., -0.5321, -0.5147, -0.5147],
     [ 0.4439,  0.2348, -0.3404,  ..., -0.5495, -0.5321, -0.5321],
     [ 0.0779, -0.0441, -0.4101,  ..., -0.5495, -0.5321, -0.5321]]])

@mchi Any updates?
Or is there any difference between DeepStream frames and OpenCV frames?

Hi @aya95
Sorry for the delay!

Looking at the first numbers, 111 and -0.5596: how can -0.5596 be calculated from 111 with the equation x.sub(mean).div(std)?

I think the reason is that the original code converts the image to a tensor first. Here is the documentation for ToTensor:
Converts a PIL Image or numpy.ndarray (H x W x C) in the range
[0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]

So how can I mimic this process inside DeepStream?
I need to make the image channel-first and then convert the pixel values into the range [0, 1].
I think all I need is to divide the pixel value by 255 before subtracting the mean (offset) from it, and make it channel-first.
Do you have any clue?

Also, the previous example was not quite right because it includes a resize, so we are not comparing the same pixel values.
Here is a better example:

crop_img_orig [[[ 89 73 74]
[ 92 80 80]
[ 77 69 69]

[ 15 11 8]
[ 31 24 21]
[ 48 41 38]]

[[113 97 98]
[ 85 73 72]
[ 88 80 80]

[ 47 43 39]
[ 37 30 27]
[ 39 32 28]]

[[124 108 106]
[119 103 101]
[ 79 68 65]

[102 98 94]
[ 30 23 20]
[ 14 6 4]]

[[ 43 34 36]
[ 36 27 29]
[ 50 42 44]

[202 197 206]
[189 184 194]
[201 196 206]]

[[ 47 41 42]
[ 50 43 45]
[ 45 38 40]

[201 198 209]
[204 201 211]
[197 194 204]]

[[ 43 38 39]
[ 47 42 43]
[ 40 35 36]

[199 198 209]
[200 199 209]
[198 197 207]]]
crop_img_orig shape: (82, 43, 3)
Crop image after transform: tensor([[[-0.8507, -0.7479, -0.9363,  ..., -1.9809, -1.7583, -1.4672],
     [-0.4397, -0.8849, -0.7479,  ..., -1.4500, -1.6555, -1.6384],
     [-0.3027, -0.3883, -1.0048,  ..., -0.5082, -1.7754, -2.0494],
     ...,
     [-1.5014, -1.6213, -1.3644,  ...,  1.4098,  1.2043,  1.4098],
     [-1.3987, -1.3473, -1.4329,  ...,  1.4612,  1.4954,  1.3755],
     [-1.4500, -1.3815, -1.5014,  ...,  1.4612,  1.4612,  1.4269]],

    [[-0.7577, -0.6352, -0.8277,  ..., -1.8431, -1.6155, -1.3179],
     [-0.3375, -0.7577, -0.6352,  ..., -1.2829, -1.5105, -1.4755],
     [-0.1450, -0.2325, -0.8452,  ..., -0.3200, -1.6331, -1.9307],
     ...,
     [-1.4405, -1.5630, -1.3004,  ...,  1.4132,  1.1856,  1.3957],
     [-1.3179, -1.2829, -1.3704,  ...,  1.4307,  1.4832,  1.3606],
     [-1.3704, -1.3004, -1.4230,  ...,  1.4307,  1.4482,  1.4132]],

    [[-0.2532, -0.2010, -0.4624,  ..., -1.5430, -1.2641, -0.9678],
     [ 0.1651, -0.3230, -0.2707,  ..., -0.9853, -1.1596, -1.1247],
     [ 0.3568,  0.2696, -0.4275,  ..., -0.0267, -1.2816, -1.5604],
     ...,
     [-1.0550, -1.1770, -0.9330,  ...,  1.7163,  1.4897,  1.6988],
     [-0.9853, -0.9330, -1.0201,  ...,  1.6988,  1.7511,  1.6291],
     [-1.0550, -0.9853, -1.1073,  ...,  1.6640,  1.6814,  1.6465]]])

Crop image after transform shape: torch.Size([3, 82, 43])
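
The numbers in this example can be checked by hand. A minimal sketch, assuming that ToTensor divides by 255 and that the printed crop_img_orig array is in BGR order (as OpenCV loads images) and was converted to RGB before the transform. The BGR assumption is an inference from the values, not something stated in the thread. The helper name is illustrative.

```python
MEAN = (0.485, 0.456, 0.406)  # RGB order, from T.Normalize
STD = (0.229, 0.224, 0.225)


def to_tensor_and_normalize(rgb_pixel):
    """Apply x / 255, then (v - mean) / std, channel by channel."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb_pixel, MEAN, STD))


printed_pixel = (89, 73, 74)     # first pixel of crop_img_orig as printed
rgb_pixel = printed_pixel[::-1]  # assumption: printed order is BGR

normalized = to_tensor_and_normalize(rgb_pixel)
print([round(v, 4) for v in normalized])
```

Under these assumptions the result is [-0.8507, -0.7577, -0.2532], which matches the first element of each of the three channels in the transformed tensor. If the channels are not reversed, the first value comes out near -0.59 instead, so channel order matters as much as the division by 255.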

As I mentioned previously, below is the formula used to calculate the output.
If you want to divide the pixel value by 255 before subtracting the mean, you can multiply the mean by 255 offline before filling it into the DS config file.

y = net_scale_factor*(x-mean)
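
Putting this together: (x/255 - mean)/std can be rewritten as (x - 255*mean) / (255*std), so the offsets become 255*mean and net-scale-factor becomes 1/(255*std). Since net-scale-factor is a single float rather than one value per channel, the sketch below uses the average of the three stds, which is an approximation and my own assumption rather than something confirmed in this thread. It also measures how far that approximation deviates from the exact PyTorch result.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

# Fold the ToTensor division by 255 into the nvinfer parameters.
offsets = [255.0 * m for m in MEAN]
avg_std = sum(STD) / len(STD)  # assumption: one shared std for all channels
net_scale_factor = 1.0 / (255.0 * avg_std)

print("offsets=" + ";".join(f"{o:.3f}" for o in offsets))
print(f"net-scale-factor={net_scale_factor:.10f}")


def nvinfer(x, c):
    """nvinfer formula with the derived parameters."""
    return net_scale_factor * (x - offsets[c])


def torch_ref(x, c):
    """Exact ToTensor + Normalize result for channel c."""
    return (x / 255.0 - MEAN[c]) / STD[c]


# Worst-case deviation over every pixel value and channel.
max_err = max(
    abs(nvinfer(x, c) - torch_ref(x, c)) for c in range(3) for x in range(256)
)
print(f"max abs error: {max_err:.4f}")
```

The remaining error is a few hundredths at most, which may or may not matter for a given classifier. The offsets above are in RGB order, so the channel order that nvinfer feeds to the model (the model-color-format setting) would need to match the order the model was trained with.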