Image preprocess question

383109759 · May 11, 2020, 12:32pm

DeepStream provides one image preprocessing formula: y = net-scale-factor * (x - offsets[c]).
But I need a per-channel std, such as std = [0.229, 0.224, 0.225], so the formula becomes: y = (x - mean[c]) / std[c].
I tried using the middle value 0.225, i.e. y = (1 / 255 / 0.225) * (x - mean[c]), but the results are quite different.
What should I do?

DaneLLL · May 12, 2020, 1:27am

Hi,

Please share where you saw the formulas. Are they specific to certain models?

383109759 · May 12, 2020, 1:50am

AastaLLL · May 12, 2020, 7:45am

Hi,

1. Assuming a uniform distribution over the pixel values of an image (every value is equally likely), the std is roughly 74.05 with mean = 127.5.

2. If the RGB values are converted to [-1, 1], the std is near 0.578 with mean = 0.0:

v' = (v - 127) / 128

3. So for std = 0.229:

| v' - m' |^2 ~ (0.578)^2
| v'' - m'' |^2 ~ (0.229)^2

v'' ≈ v' / 0.578 * 0.229
v'' ≈ (v - 127) / 128 / 0.578 * 0.229

Mean = 127
net-scale-factor = 1 / 128 / 0.578 * 0.229 = 0.003095

Thanks.

383109759 · May 12, 2020, 10:08am

Hello, you didn't fully understand what I mean. My std has three different values: [0.229, 0.224, 0.225].
You only converted one of them. DeepStream accepts only a single "net-scale-factor" value, so the other two values (0.224, 0.225) cannot be supplied to DeepStream at the same time.
Can DeepStream solve this problem at present?

AastaLLL · May 13, 2020, 4:22am

Hi,

The std difference between channels is small, so setting an average value may be fine.
The other two values can be calculated in a similar way.

To support a different scale value per channel, you can modify the source here directly:

/opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_conversion.cu

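For example, NvDsInferConvert_C3ToP3Float:

__global__ void
NvDsInferConvert_CxToP3FloatKernel(
  ...)
{
    // One thread per pixel.
    unsigned int row = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned int col = blockIdx.x * blockDim.x + threadIdx.x;

    // Skip threads that fall outside the image.
    if (col < width && row < height)
    {
        // Interleaved input -> planar output, one plane per channel k.
        for (unsigned int k = 0; k < 3; k++)
        {
            // The same scaleFactor is applied to every channel here.
            outBuffer[width * height * k + row * width + col] =
                scaleFactor * inBuffer[row * pitch + col * inputPixelSize + k];
        }
    }
}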

Thanks.
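The per-channel change suggested above can be sketched as a CPU reference that mirrors the kernel's indexing. This is not DeepStream code: the function name and the scales and offsets arrays are hypothetical, and the offset subtraction is an added assumption, since the kernel shown above only multiplies by scaleFactor. It can be used to check the output of a modified kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference: interleaved 8-bit input -> planar float output,
// with a separate (hypothetical) scale and offset for each channel.
std::vector<float> convertC3ToP3PerChannel(
    const std::uint8_t* inBuffer,
    unsigned int width, unsigned int height,
    unsigned int pitch,           // bytes per input row, including padding
    unsigned int inputPixelSize,  // bytes per input pixel (3 for RGB, 4 for RGBA)
    const float scales[3], const float offsets[3])
{
    const std::size_t planeSize = static_cast<std::size_t>(width) * height;
    std::vector<float> outBuffer(planeSize * 3);
    for (unsigned int row = 0; row < height; ++row) {
        for (unsigned int col = 0; col < width; ++col) {
            for (unsigned int k = 0; k < 3; ++k) {
                const std::size_t in =
                    static_cast<std::size_t>(row) * pitch + col * inputPixelSize + k;
                const std::size_t out =
                    planeSize * k + static_cast<std::size_t>(row) * width + col;
                // scales[k] replaces the single scaleFactor of the original kernel.
                outBuffer[out] = scales[k] * (static_cast<float>(inBuffer[in]) - offsets[k]);
            }
        }
    }
    return outBuffer;
}
```

In a CUDA version the same idea would mean passing the three scale values to the kernel and indexing them with k inside the loop.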