Preprocessing of image in InferPreprocessor::transform

I am not getting correct results from the secondary GIE’s output, and I suspect the preprocessing part.

I have inference working correctly in standalone TensorRT. The preprocessing for the input image in TensorRT is as follows.

Dims NumPlateRecognition::loadJPEGFile(std::vector<std::string> fileName, int num)
{
    Dims4 inputDims{num, 24, 94, 3};
    Dims4 inputDims_1img{1, 24, 94, 3};
    const size_t vol = samplesCommon::volume(inputDims);
    const size_t vol_1img = samplesCommon::volume(inputDims_1img);
    unsigned char *data = new unsigned char[vol];
    for (int f = 0; f < num; f++) {
        // Load as BGR, convert to RGB, then resize to the 94(w)x24(h) network input.
        cv::Mat image = cv::imread(fileName[f], cv::IMREAD_COLOR);
        cv::Mat im_rgb;
        cv::cvtColor(image, im_rgb, cv::COLOR_BGR2RGB);
        cv::resize(im_rgb, im_rgb, cv::Size(94, 24));
        memcpy(data + (f * vol_1img), im_rgb.ptr<unsigned char>(), vol_1img);
    }
    // Normalize the whole batch once: uint8 [0, 255] -> float [0, 1].
    mInput.hostBuffer.resize(inputDims);
    float* hostDataBuffer = static_cast<float*>(mInput.hostBuffer.data());
    std::transform(data, data + vol, hostDataBuffer,
        [](uint8_t x) { return static_cast<float>(x) / 255.0f; });
    delete[] data;
    return inputDims;
}

The steps are: (1) resize the input image to 24 (h) x 94 (w); (2) normalize with [](uint8_t x) { return (static_cast<float>(x) / 255.0); }.

I observed that the corresponding preprocessing in DeepStream happens inside this function:

NvDsInferStatus InferPreprocessor::transform(NvDsInferContextBatchInput& batchInput, void* devBuf, CudaStream& mainStream, CudaEvent* waitingEvent)
{

}

My configuration file (dstest2_sgie1_config.txt) has:

infer-dims=24;94;3
net-scale-factor=0.0039215697906911373
model-color-format=0
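
As a sanity check on that value: net-scale-factor is numerically 1/255, so with no mean file DeepStream’s y = net-scale-factor * x should match the x / 255.0 normalization on the TensorRT side. A standalone check (plain C++, not DeepStream API):

#include <cassert>
#include <cmath>

int main()
{
    // net-scale-factor from the config file; numerically 1/255.
    const double netScaleFactor = 0.0039215697906911373;
    // With no mean file, DeepStream computes y = netScaleFactor * x,
    // which matches the x / 255.0 normalization on the TensorRT side.
    assert(std::fabs(netScaleFactor - 1.0 / 255.0) < 1e-8);
    return 0;
}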

I am trying to make sure DeepStream does the same preprocessing as I implemented for TensorRT. My queries are as follows.

(1) Since this processing happens in the sgie, the detection outputs from the pgie need to be resized to 24x94x3.
The conversion function used in my case is convertFcn = NvDsInferConvert_C3ToP3Float;. In the header I only found the API declaration; the actual implementation is in nvdsinfer_conversion.cu.
So, is resizing done for the sgie input?

(2) I added a print inside the following block in nvdsinfer_context_impl.cpp (lines 405-412):

if (convertFcn) {
    std::cout << "convertFcn is " << 0 << " " << m_NetworkInfo.width << " "
              << m_NetworkInfo.height << " " << m_Scale << " "
              << batchInput.inputPitch << std::endl;
    /* Input needs to be pre-processed. */
    convertFcn(outPtr, (unsigned char*)batchInput.inputFrames[i],
        m_NetworkInfo.width, m_NetworkInfo.height, batchInput.inputPitch,
        m_Scale, m_MeanDataBuffer.get() ? m_MeanDataBuffer->ptr<float>() : nullptr,
        *m_PreProcessStream);
}

This prints the correct network input size and scale.
But the pitch is 512 for input size 94x24 (the sgie input size) and 7680 for 1920x1080 (the pgie input size).
How is the pitch calculated?
I understood pitch to be width-based, so I expected 94*3 = 282.

(3) Since the sgie was trained in TensorFlow, its data format is NHWC. Does that matter?
I checked nvinfer1::PluginFormat; it doesn’t have a kNHWC format.
So the plugin layer (the last layer of the sgie) is set to nvinfer1::PluginFormat::kLINEAR for the data format.
Is that OK?
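
For context on what this layout question means, here is a generic illustration of the index math (helper names are mine, not TensorRT API): NHWC interleaves the channels of each pixel, while a linear/planar layout stores one full plane per channel.

#include <cstddef>

// Offset of element (y, x, c) in an H x W x C image, for each layout.
size_t idxNHWC(size_t y, size_t x, size_t c, size_t W, size_t C)
{
    return (y * W + x) * C + c;   // channels interleaved per pixel
}

size_t idxPlanar(size_t y, size_t x, size_t c, size_t H, size_t W)
{
    return c * H * W + y * W + x; // one contiguous plane per channel
}

int main()
{
    // Green channel of the first pixel (y=0, x=0, c=1) in a 24x94x3 image:
    return (idxNHWC(0, 0, 1, 94, 3) == 1 && idxPlanar(0, 0, 1, 24, 94) == 24 * 94) ? 0 : 1;
}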

(4) Inside TensorRT, preprocessing is done as follows:
[](uint8_t x) { return (static_cast<float>(x) / 255.0); }
Each input pixel (uint8_t) is converted to float and normalized by 255.0.
That corresponds to net-scale-factor=0.0039215697906911373 in my config file.

Where can I check that the same thing is implemented in DeepStream?

I can see only this line in nvdsinfer_conversion.cu (line 208):

NvDsInferConvert_CxToP3FloatKernel <<<blocks, threadsPerBlock, 0, stream>>>
            (outBuffer, inBuffer, width, height, pitch, 3, scaleFactor);
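
From the launch parameters, my reading is that this kernel is where the net-scale-factor scaling is applied. A rough sketch of what a CxToP3Float-style kernel presumably does (my own reconstruction, not the actual NVIDIA source):

__global__ void CxToP3FloatKernelSketch(float* outBuffer, unsigned char* inBuffer,
    unsigned int width, unsigned int height, unsigned int pitch,
    unsigned int inputChannels, float scaleFactor)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
    {
        for (unsigned int c = 0; c < 3; c++)
        {
            // Read interleaved uint8 input (rows are 'pitch' bytes apart),
            // scale by scaleFactor (the net-scale-factor), and write planar
            // float output: one W*H plane per channel.
            outBuffer[c * width * height + y * width + x] =
                scaleFactor * inBuffer[y * pitch + x * inputChannels + c];
        }
    }
}

Note that P3 presumably means planar output (one channel plane after another), which ties back to my layout question in (3).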

Hi,

Just want to clarify first.

In general, a DeepStream pipeline includes a detector and a classifier.
Do you also feed the TensorRT pipeline with the same detector output?
Or do you run the DeepStream pipeline with only the sgie?

If the workflows are not aligned, it is hard to compare the output differences.

Thanks.

TensorRT was tested with cropped images: loaded with OpenCV in BGR, converted to RGB, normalized, and then run through inference.

DeepStream runs pgie detection first, crops the bounding boxes, and feeds them into the sgie.

Another query: can I implement customized preprocessing code?

Hi,

Yes, you can. The preprocessing is open-sourced.
Please find it in this file:

/opt/nvidia/deepstream/deepstream-5.0/sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp

Thanks.

I found two things to discuss.

DeepStream’s color format is NvDsInferFormat_RGBA, i.e., four channels.
So the pgie’s network input size is 1920x1080 and its inputPitch is 7680 (1920x4).

But the sgie’s network input size is 94x24, so its inputPitch should be 376 (94x4).
From my print, I saw that the inputPitch is 512 for the sgie. When I change it to 376, the output results make more sense.
The inputPitch for the pgie is correct: when I print, I can see 7680.
Is that a bug in DeepStream?
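
One explanation I can think of, rather than a bug: the pitch may be the RGBA row size in bytes rounded up to a hardware surface alignment. Under that assumption (the 512-byte alignment is my guess, inferred from the printed values, not taken from NVIDIA documentation), both observed pitches fall out:

#include <cassert>
#include <cstddef>

// Round a row size in bytes up to a hardware alignment (assumed 512 here).
static size_t alignedPitch(size_t rowBytes, size_t alignment = 512)
{
    return ((rowBytes + alignment - 1) / alignment) * alignment;
}

int main()
{
    assert(alignedPitch(94 * 4) == 512);     // sgie RGBA row: 376 -> 512
    assert(alignedPitch(1920 * 4) == 7680);  // pgie RGBA row: already aligned
    return 0;
}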

Also, can I confirm that image resizing is done only in the streammux?
I have only one streammux, at the beginning of the DeepStream pipeline, so does that mean no image resizing is done for the sgie to match its network input size? If that is true, I need to write my own image resizing code for the sgie; a minimal sketch of what I have in mind is below.
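
A host-side OpenCV placeholder for that resizing (the function name is mine; inside DeepStream a GPU path would presumably be needed):

#include <opencv2/imgproc.hpp>

// Resize a cropped detection to the sgie network input, matching
// infer-dims=24;94;3 (height 24, width 94).
cv::Mat resizeForSgie(const cv::Mat& crop)
{
    cv::Mat resized;
    cv::resize(crop, resized, cv::Size(94, 24));  // Size(width, height)
    return resized;
}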

I added the print inside this loop:

for (unsigned int i = 0; i < batchSize; i++)
{
    float* outPtr = (float*)devBuf + i * m_NetworkInputLayer.inferDims.numElements;

    if (convertFcn) {
        std::cout << "convertFcn is " << 0 << " " << m_NetworkInfo.width << " "
                  << m_NetworkInfo.height << " " << m_Scale << " "
                  << batchInput.inputPitch << std::endl;
        /* Input needs to be pre-processed. */
        convertFcn(outPtr, (unsigned char*)batchInput.inputFrames[i],
            m_NetworkInfo.width, m_NetworkInfo.height, batchInput.inputPitch,
            m_Scale, m_MeanDataBuffer.get() ? m_MeanDataBuffer->ptr<float>() : nullptr,
            *m_PreProcessStream);
    } else if (convertFcnFloat) {
        /* Input needs to be pre-processed. */
        convertFcnFloat(outPtr, (float*)batchInput.inputFrames[i],
            m_NetworkInfo.width, m_NetworkInfo.height, batchInput.inputPitch,
            m_Scale, m_MeanDataBuffer.get() ? m_MeanDataBuffer->ptr<float>() : nullptr,
            *m_PreProcessStream);
    }
}

Hi, can I have a reply on this?

Hi, any reply on this?