Extract preprocessed tensors from nvpreprocess

Hi,

I’ve been trying to debug pgie because it gave a different result from the original PyTorch model. I verified the results between the PyTorch model and the TensorRT engine (FP32 mode), and they are almost identical, as expected. However, when I checked the entire DeepStream pipeline, the result is different.
Here’s my pipeline:

nvstreammux -> nvdspreprocess -> nvinfer -> ...

I verified the output of nvstreammux. Both padding and size are correct. The only exception is the color: I don’t think it’s RGB, but I assume the downstream pgie module will take care of that.

For the output extraction from nvstreammux, I followed this example.

Because my pipeline uses nvdspreprocess for preprocessing, I rebuilt both libnvdsgst_preprocess.so and libcustom2d_preprocess.so with the DEBUG_LIB and DEBUG_TENSOR flags enabled. These flags make the plugin save an impl_out_batch_xxx.bin file. However, when I checked the size, it wasn’t right.

tensor = np.fromfile('xxx/tensorout_batch_0.bin')
tensor.shape
<<< (992256,)

My network takes width = 1088 and height = 608, so the tensor should have 1088 x 608 x 3 = 1984512 elements. How can we investigate the output from nvdspreprocess when these two flags are enabled?
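One possible explanation (an assumption on my part): np.fromfile defaults to float64, so FP32 data read without an explicit dtype appears with half the element count, and 992256 x 2 is exactly 1088 x 608 x 3 = 1984512. A minimal sketch to re-read the dump:

import numpy as np

# tensor-data-type=0 in the config means FP32, so read the dump explicitly as float32
tensor = np.fromfile('xxx/tensorout_batch_0.bin', dtype=np.float32)
print(tensor.shape)  # expected (1984512,) = 1088 * 608 * 3 if the dump holds one full frame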

After this failed attempt, I tried to verify the input of pgie instead, since it should be the same as the output of nvdspreprocess. I followed this suggestion. I applied the patch file, but for the image saving I applied the change in queueInputBatchPreprocessed instead of queueInputBatch, since preprocessing in pgie is disabled in favor of nvdspreprocess. The snippet in this function looks as follows:

#ifdef DUMP_INPUT_TO_FILE
#define DUMP_FRAME_CNT_START (0)
#define DUMP_FRAME_CNT_STOP (10)
    if ((m_FrameCnt++ >= DUMP_FRAME_CNT_START) &&
        (m_FrameCnt <= DUMP_FRAME_CNT_STOP))
    {
        void *hBuffer;

        printf("batchDims.batchSize = %d\n", batchSize);
        assert(m_AllLayerInfo.size());
        for (size_t i = 0; i < m_AllLayerInfo.size(); i++)
        {
            NvDsInferLayerInfo &info = m_AllLayerInfo[i];
            assert(info.inferDims.numElements > 0);

            if (info.isInput)
            {
                int sizePerBatch =
                    getElementSize(info.dataType) * info.inferDims.numElements;

                cudaDeviceSynchronize();

                for (int b = 0; b < batchSize; b++)
                {
                    float *indBuf =
                        (float *)bindings[i] + b * info.inferDims.numElements;
                    int w = info.inferDims.d[2];
                    int h = info.inferDims.d[1];

                    printf("width = %d, height = %d\n", w, h);
                    printf("sizePerBatch: %d, inferDims.numElements = %u\n", sizePerBatch, info.inferDims.numElements);


                    // if (scale < 1.0)
                    // {
                    //     // R or B
                    //     NvDsInferConvert_FtFTensor(
                    //         (float *)m_inputDumpDeviceBuf,
                    //         indBuf, w, h, w, 1 / scale, NULL, NULL);
                    //     // G
                    //     NvDsInferConvert_FtFTensor(
                    //         ((float *)m_inputDumpDeviceBuf + w * h),
                    //         ((float *)indBuf + w * h),
                    //         w, h, w, 1 / scale, NULL, NULL);
                    //     // B or R
                    //     NvDsInferConvert_FtFTensor(
                    //         ((float *)m_inputDumpDeviceBuf + 2 * w * h),
                    //         ((float *)indBuf + 2 * w * h),
                    //         w, h, w, 1 / scale, NULL, NULL);
                    //     indBuf = (float *)m_inputDumpDeviceBuf;
                    // }

                    RETURN_CUDA_ERR(
                        cudaMemcpy(m_inputDumpHostBuf, (void *)indBuf, sizePerBatch,
                                   cudaMemcpyDeviceToHost),
                        "cudaMemcpy of input dump buffer failed");

                    bool dumpToRaw = false;

                    std::string filename =
                        "gie-" + std::to_string(m_UniqueID) +
                        "_input-" + std::to_string(i) +
                        "_batch-" + std::to_string(b) +
                        "_frame-" + std::to_string(m_FrameCnt);

                    if (dumpToRaw)
                        filename += ".raw";
                    else
                        filename += ".png";

                    NvDsInferFormat format = NvDsInferFormat_RGB;
                    dump_to_file(filename.c_str(),
                                 (unsigned char *)m_inputDumpHostBuf, sizePerBatch,
                                 w, h, dumpToRaw, format);
                }
            }
        }
    }
#endif

The files are saved and the shape is correct, but everything is black. That doesn’t seem right to me.

So I investigated the shape:

img = cv2.imread('/home/coco/workspace/sertis/object-tracking-app/gie-1_input-0_batch-0_frame-1.png')
img.shape
<<< (608, 1088, 3)
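As an additional check (just a sketch, reusing the img loaded above), the pixel range can be inspected to see whether the image is genuinely near-zero or merely looks dark:

# inspect the actual pixel values of the dumped PNG
print(img.min(), img.max(), img.mean())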

In this case, what could go wrong? Is this the right way to debug the preprocessed tensors?

Please find my config below.

[property]
enable=1
target-unique-ids=1

# network-input-shape: batch, channel, height, width
network-input-shape=1;3;608;1088

# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=0
# 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
tensor-data-type=0
tensor-name=images

processing-width=1088
processing-height=608

# 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE
# 3=NVBUF_MEM_CUDA_UNIFIED  4=NVBUF_MEM_SURFACE_ARRAY(Jetson)
scaling-pool-memory-type=0

# 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU
# 2=NvBufSurfTransformCompute_VIC(Jetson)
scaling-pool-compute-hw=0

# Scaling Interpolation method
# 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 2=NvBufSurfTransformInter_Algo1
# 3=NvBufSurfTransformInter_Algo2 4=NvBufSurfTransformInter_Algo3 5=NvBufSurfTransformInter_Algo4
# 6=NvBufSurfTransformInter_Default
scaling-filter=0

# model input tensor pool size
tensor-buf-pool-size=8

# custom-lib-path=/opt/nvidia/deepstream/deepstream-6.0/lib/gst-plugins/libcustom2d_preprocess.so
custom-lib-path=/opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdspreprocess/nvdspreprocess_lib/libcustom2d_preprocess.so
custom-tensor-preparation-function=CustomTensorPreparation

[user-configs]
pixel-normalization-factor=0.003921568

# Currently, nvdspreprocess is in alpha stage, thus std scaling is not yet supported.
# preprocessing logic is as follows
#
# out = pixel-normalization-factor * (x - mean[c])
#
# more detail, see https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvdspreprocess.html

# ByteTrack's mean: 0.485, 0.456, 0.406
# ByteTrack's std: 0.229, 0.224, 0.225
# rescale back to [0-255]:
# 0.485 * 255 = 123.675
# 0.456 * 255 = 116.28
# 0.406 * 255 = 103.53
offsets=123.675;116.28;103.53

[group-0]
# set src-ids=-1 to use batch_size in network-input-shape
src-ids=-1
custom-input-transformation-function=CustomAsyncTransformation
process-on-roi=0
# roi-params-src-0=0;0;1088;608

model: here

Note: I already disabled pgie’s default preprocessing by setting input-tensor-meta=1 and also set model-color-format=0 for RGB input.

Environment
Architecture: x86_64
GPU: NVIDIA GeForce GTX 1650 Ti with Max-Q Design
NVIDIA GPU Driver: Driver Version: 495.29.05
DeepStream Version: 6.0 (running on docker image nvcr.io/nvidia/deepstream:6.0-devel)
TensorRT Version: v8001
Issue Type: Question

Hi @peeranat85
Compared to the original patch, you removed at least the two lines below.
scale is important for getting a correct image. Since you set scale to 0.003921568 according to “pixel-normalization-factor=0.003921568”, if you don’t de-normalize the data with this scale, the pixel values in the output file are very small (roughly -1 to 1), which leads to the dark image you are seeing now.

+        float scale = m_Preprocessor->getScale();
+        NvDsInferFormat format = m_Preprocessor->getNetworkFormat();
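For reference, the de-normalization described above can also be expressed in numpy (a sketch; ‘normalized’ stands in for the dumped FP32 data and is assumed here to be in H x W x C layout):

import numpy as np

scale = 0.003921568                            # pixel-normalization-factor
offsets = np.array([123.675, 116.28, 103.53])  # per-channel mean (R, G, B)

# nvdspreprocess computes out = scale * (x - mean[c]); inverting that recovers 0-255 pixels
normalized = np.zeros((608, 1088, 3), dtype=np.float32)  # placeholder for the dumped data
pixels = np.clip(normalized / scale + offsets, 0, 255).astype(np.uint8)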

Hi @mchi
Thanks for your response. I removed those lines because they gave a segfault: m_Preprocessor is nullptr, since the preprocessing is executed in the nvdspreprocess module. By the way, I didn’t intend to de-normalize the image in the first place. I’m trying to extract the raw preprocessed tensors and compare them with the ones from Python to make sure they are the same.

So, things are explained, right? Any other questions?

No, not yet. After further investigation, I found that saving the preprocessed tensors to a file with cv2.imwrite results in precision loss, as seen in the dark image I sent earlier. To fix this, I saved the raw binary instead, like so:

std::ofstream dump_file(filename, std::ios::binary);
dump_file.write((const char *)buffer, size);

This resulted in a huge file
gie-1_input-0_batch-0_frame-1.raw (7.6 MB)
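For what it’s worth, 7.6 MB is consistent with a single FP32 tensor of shape 1 x 3 x 608 x 1088:

# expected size of one FP32 NCHW tensor with network-input-shape=1;3;608;1088
expected_bytes = 1 * 3 * 608 * 1088 * 4   # 4 bytes per float32
print(expected_bytes)                     # 7938048 bytes, i.e. about 7.6 MB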

I checked the file

with open("xxx/gie-1_input-0_batch-0_frame-1.raw", 'rb') as f:
    data = np.fromfile(f, dtype=np.float32)

output = data.reshape((608, 1088, 3))
plt.imshow(output)

This is not what I expected. Why are there multiple instances of a single frame, and why does the color appear gray? See the original frame below.

This is what I got from the Python code, and it is what I expected (or something similar).

import cv2
import numpy as np
import matplotlib.pyplot as plt

def letterbox(img, height=608, width=1088, color=(0, 0, 0)):  # resize a rectangular image to a padded rectangle
    shape = img.shape[:2]  # shape = [height, width]
    ratio = min(float(height)/shape[0], float(width)/shape[1])
    new_shape = (round(shape[1] * ratio), round(shape[0] * ratio)) # new_shape = [width, height]
    dw = (width - new_shape[0]) / 2  # width padding
    dh = (height - new_shape[1]) / 2  # height padding
    top, bottom = round(dh - 0.1), round(dh + 0.1)
    left, right = round(dw - 0.1), round(dw + 0.1)
    img = cv2.resize(img, new_shape, interpolation=cv2.INTER_AREA)  # resized, no border
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # padded rectangular
    return img, ratio, dw, dh

img = cv2.imread(img_path)
img, r, dw, dh = letterbox(img, color=(0, 0, 0))
img = img[:, :, ::-1].astype(np.float32)
img /= 255.0
img -= (0.485, 0.456, 0.406) # rgb mean
plt.imshow(img)


Note that the preprocessing logic here is the same as what is configured in DeepStream’s nvdspreprocess module.
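As a quick arithmetic check that the two formulations agree (the Python code subtracts the mean after dividing by 255, while nvdspreprocess computes pixel-normalization-factor * (x - mean) with the mean in the 0-255 range):

x = 200.0                            # an arbitrary pixel value (R channel)
py = x / 255.0 - 0.485               # Python-side formula
ds = 0.003921568 * (x - 123.675)     # nvdspreprocess formula: factor * (x - mean)
print(abs(py - ds))                  # ~1e-7, i.e. the two are equivalent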

To summarize the questions:

  1. With this preprocessing logic, why didn’t I get the same output (or something similar) as in Python?
  2. In gie-1_input-0_batch-0_frame-1.raw, why are there multiple instances of a single frame, and why does the color appear gray despite setting model-color-format=0?

What’s the batch size of pgie?

The batch size is 1.

FYI, just to give you an update: when I switched to using pgie’s preprocessing (batch-size=1) and removed the nvdspreprocess plugin, the output PNG file looks correct.

However, the raw binary still gives a pack of images as before.

nvinver-gie-1_input-0_batch-0_frame-1.raw (7.6 MB)

I’m not sure this is the right way to check the raw binary file?

Finally, I managed to find out why I got a pack of images when visualizing the raw binary file.

The data.reshape((608, 1088, 3)) call was the problem: since the tensor is dumped in NCHW order (network-input-order=0), the channel must come first in the reshape, and the result then transposed to HWC for display; see the sketch below. With that change, the preprocessed image looks similar to the one from the Python code.
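A minimal sketch of that fix, assuming the dump is the FP32 NCHW tensor described above:

import numpy as np
import matplotlib.pyplot as plt

data = np.fromfile('gie-1_input-0_batch-0_frame-1.raw', dtype=np.float32)

# the tensor is planar (NCHW), so reshape channel-first, then move channels last for display
output = data.reshape((3, 608, 1088)).transpose(1, 2, 0)
plt.imshow(output)
plt.show()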

