Crowd Estimation Model giving density map as output seems to have persistent buffers

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU
• DeepStream Version: 6.2.0
• TensorRT Version: 8.2.0
• NVIDIA GPU Driver Version (valid for GPU only): 525.85.12
• Issue Type (bug): I am running a crowd estimation model (SAS-Net) as the pgie. It produces a 360x640 output buffer containing the crowd density over the whole frame. When I generate a heatmap from this density output, there seem to be persistent values in the buffer: it looks as if the buffer holds values from the previous frame and is not reset for every frame. As a result, the heatmaps contain abnormal pixels that resemble the trajectory a person has moved along across frames.
The model was originally trained in PyTorch; I converted it to ONNX and then to a TensorRT engine using DeepStream. I have tested the PyTorch and ONNX models on the same video and no issue appears. It is only when I run DeepStream that I face this issue. Attaching images for reference:




We can’t tell what you have done from your description. Did you customize the postprocessing for your model? If so, how did you implement it?

For postprocessing, I first used this configuration for the pgie:

## 0=Detector, 1=Classifier, 2=Segmentation, 100=Other
network-type=100
# Enable tensor metadata output
output-tensor-meta=1

Now I use the following logic to process the tensor output:

/* Iterate user metadata in frames to search PGIE's tensor metadata */
    for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list;
         l_user != NULL; l_user = l_user->next)
    {
      NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
      if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
        continue;

      // number of elements in the output layer: equal to that in the input layer for the crowd model
      int outLayerElements = 0;
      /* convert to tensor metadata */
      NvDsInferTensorMeta *meta =
          (NvDsInferTensorMeta *)user_meta->user_meta_data;
      for (unsigned int i = 0; i < meta->num_output_layers; i++)
      {
        NvDsInferLayerInfo *info = &meta->output_layers_info[i];
        info->buffer = meta->out_buf_ptrs_host[i];
        if (use_device_mem && meta->out_buf_ptrs_dev[i])
        {
          CUDA_CHECK(cudaMemset(meta->out_buf_ptrs_host[i], 0, info->inferDims.numElements * 4));
          CUDA_CHECK(cudaMemcpy(meta->out_buf_ptrs_host[i], meta->out_buf_ptrs_dev[i],
                     info->inferDims.numElements * 4, cudaMemcpyDeviceToHost));
          CUDA_CHECK(cudaMemset(meta->out_buf_ptrs_dev[i], 0, info->inferDims.numElements * 4));         
        }
        outLayerElements = info->inferDims.numElements;

        //cout<<"-------------------------------------------------------------------------------------"<<endl;
        //cout<<"Number of dimensions in the Output Layer - "<<i<<" "<<info->inferDims.numDims<<endl;
        //cout<<"Number of channels = "<<info->inferDims.d[0]<<endl;
        //cout<<"Height ="<<info->inferDims.d[1]<<endl;
        //cout<<"Width ="<<info->inferDims.d[2]<<endl;
      }
      /* Parse output tensor and fill detection results into objectList. */
      std::vector<NvDsInferLayerInfo>
          outputLayersInfo(meta->output_layers_info,
                           meta->output_layers_info + meta->num_output_layers);


      //store the crowd model output layer buffer in a vector
      std::vector<float> crowdBuffer((float*)outputLayersInfo[0].buffer,
                                                (float*)outputLayersInfo[0].buffer + outLayerElements);
    }

I get the crowd model output as a 1-D buffer in outputLayersInfo[0].buffer, which I copy into a vector. Afterwards I translate it from 1-D to 2-D to get the heatmap.
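As an aside, the 1-D to 2-D translation depends on the memory layout of the output layer. A minimal sketch of the mapping (plain C++; `toDensityMap`, `H`, and `W` are hypothetical names), assuming the 360x640 layer is row-major, i.e. index = y * W + x:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reshape a flat H*W density buffer into an H-row,
// W-column map, assuming row-major layout (index = y * W + x).
std::vector<std::vector<float>> toDensityMap(const std::vector<float> &flat,
                                             std::size_t H, std::size_t W)
{
    std::vector<std::vector<float>> map(H, std::vector<float>(W, 0.0f));
    for (std::size_t y = 0; y < H; ++y)
        for (std::size_t x = 0; x < W; ++x)
            map[y][x] = flat[y * W + x]; // every cell is written on every call
    return map;
}
```

For a 360x640 (height x width) layer this would be called as toDensityMap(crowdBuffer, 360, 640); the actual layout should be checked against the ONNX output shape.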

How did you find out there are persistent values in the buffer?

I dumped the heatmaps and noticed that the glitches in them followed the trajectory of the moving person’s head across frames. Hence the buffer that I copied holds persistent values.

std::vector<float> crowdBuffer((float*)outputLayersInfo[0].buffer,
                                                (float*)outputLayersInfo[0].buffer + outLayerElements);

Why did you use cudaMemset() to clean “meta->out_buf_ptrs_dev[i]” here? Please remove it.

Actually I used it to clear the device (GPU) buffers after the cudaMemcpy(), so that the buffers are reset to 0. I tried removing the cudaMemset(), but the issue is still there.

Please do not do this. "meta->out_buf_ptrs_dev[i]" is the address of the device memory, not the memory itself. gst-nvinfer handles the device memory internally; please do not do anything to it.

Okay… got your point. So is the issue of persistent buffers in this case a bug in gst-nvinfer?

No. “meta->out_buf_ptrs_dev[i]” is managed inside gst-nvinfer; you don’t need to do anything to it. It is a bug in your code.

No, I meant that after removing that cudaMemset() portion from my code, the issue of persistent buffers is still there. And I am not doing anything extra, just a cudaMemcpy() of the buffers from the meta.
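For reference, a minimal sketch of the pattern being discussed (plain C++; `copyLayerBuffer` is a hypothetical name, and std::memcpy stands in for the cudaMemcpy device-to-host copy): the layer buffer is copied into caller-owned storage and the source is never written to:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copy numElements floats out of a buffer we do not own; the source is
// taken by const pointer, so it cannot be modified here.
std::vector<float> copyLayerBuffer(const float *src, std::size_t numElements)
{
    std::vector<float> dst(numElements);
    std::memcpy(dst.data(), src, numElements * sizeof(float));
    return dst;
}
```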

Can you provide the model and your code to us so we can reproduce the issue?

This is the code I used in the callback gie_primary_processing_done_buf_prob() in deepstream_app.c to dump heatmaps:

  for (NvDsMetaList * l_frame = batch_meta->frame_meta_list; l_frame != NULL;
      l_frame = l_frame->next)
  {
    string filePrefix = "HeatMapDebug/";
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    // process crowd model metadata
      /* Iterate user metadata in frames to search PGIE's tensor metadata */
      for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list;
          l_user != NULL; l_user = l_user->next)
      {
        NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
          continue;

        // number of elements in the output layer: equal to that in the input layer for the crowd model
        int outLayerElements = 0;
        /* convert to tensor metadata */
        NvDsInferTensorMeta *meta =
            (NvDsInferTensorMeta *)user_meta->user_meta_data;
        for (unsigned int i = 0; i < meta->num_output_layers; i++)
        {
          //cout<<"meta->num_output_layers ="<<meta->num_output_layers<<endl;
          NvDsInferLayerInfo *info = &meta->output_layers_info[i];
          info->buffer = meta->out_buf_ptrs_host[i];
          if (use_device_mem && meta->out_buf_ptrs_dev[i])
          {
            cudaMemcpy(meta->out_buf_ptrs_host[i], meta->out_buf_ptrs_dev[i],
                      info->inferDims.numElements * 4, cudaMemcpyDeviceToHost);       
          }
          outLayerElements = info->inferDims.numElements;

          //cout<<"-------------------------------------------------------------------------------------"<<endl;
          //cout<<"Number of dimensions in the Output Layer - "<<i<<" "<<info->inferDims.numDims<<endl;
          //cout<<"Number of channels = "<<info->inferDims.d[0]<<endl;
          //cout<<"Height ="<<info->inferDims.d[1]<<endl;
          //cout<<"Width ="<<info->inferDims.d[2]<<endl;
        }
        /* Parse output tensor and fill detection results into objectList. */
        std::vector<NvDsInferLayerInfo>
            outputLayersInfo(meta->output_layers_info,
                            meta->output_layers_info + meta->num_output_layers);

        //store the crowd model output layer buffer in a vector
        std::vector<float> crowdBuffer((float*)outputLayersInfo[0].buffer,
                                                  (float*)outputLayersInfo[0].buffer + outLayerElements);
        
        #if 1
        //#########################################################
        //Debug code to generate and dump heatmaps from here

        Mat denseCrowdFrameHeatMap;
        //640*360
        double densityArr[640][360];
        for(int i=0; i< 640; i++)
          for(int j=0; j<360; j++)
            if(crowdBuffer[i*360 + j] > 0)
              densityArr[i][j] = 255.0 * crowdBuffer[i*360 + j];

        cv::Mat densityMat(360, 640, CV_64F, densityArr);
        //scale to 8 bit
        densityMat.convertTo(densityMat, CV_8UC3);
        //Apply colorMap
        applyColorMap(densityMat, denseCrowdFrameHeatMap, COLORMAP_JET);
        resize(denseCrowdFrameHeatMap, denseCrowdFrameHeatMap, Size(1920,1080), INTER_LINEAR);

        //dump the heat map
        string heatMapImgName = filePrefix + "heatMap_" + to_string(frame_meta->frame_num) + ".jpg";
        imwrite(heatMapImgName, denseCrowdFrameHeatMap);
        crowdBuffer.clear();
        denseCrowdFrameHeatMap.release();
        #endif
    }
}        
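One detail worth noting in the dump code above: densityArr is a stack array that is written only when crowdBuffer[...] > 0, so cells that fail the test keep whatever value was already in that stack memory. A hedged sketch of a zero-initialized equivalent of the scaling step (plain C++, OpenCV omitted; `scaleDensity` is a hypothetical name):

```cpp
#include <cstddef>
#include <vector>

// Scale a flat density buffer to the 0..255 range, starting from a
// zero-initialized image so every cell has a defined value on every frame.
std::vector<double> scaleDensity(const std::vector<float> &density)
{
    std::vector<double> img(density.size(), 0.0); // defined start value
    for (std::size_t i = 0; i < density.size(); ++i)
        if (density[i] > 0.0f)
            img[i] = 255.0 * density[i];
    return img;
}
```

With OpenCV, a cv::Mat created via cv::Mat::zeros(360, 640, CV_64F) would serve the same purpose of giving every pixel a defined starting value.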

Model file(onnx) can be downloaded from here:
https://drive.google.com/file/d/13CtFYfvwqQDnVxUykYgGbXj7GxPaTS2F/view?usp=sharing

@Fiona.Chen Are you able to reproduce this issue?

@Fiona.Chen any update?

@Fiona.Chen ??

Please provide the complete project(all source code and configurations).

@Fiona.Chen I can’t provide the entire project code, as that would violate my organization’s privacy agreement. I can provide the config I used for the gie for this model.

[property]
gpu-id=0

onnx-file=/var/okean/bin/model.onnx
model-engine-file=/var/okean/bin/model.onnx_b1_gpu0_fp32.engine


force-implicit-batch-dim=0
batch-size=1
network-mode=0
interval=0
input-object-min-width=64
input-object-min-height=64
process-mode=1
model-color-format=0
gpu-id=0
gie-unique-id=1
operate-on-gie-id=1
operate-on-class-ids=0
is-classifier=0
output-blob-names=output_1

##0=Detector, 1=Classifier, 2=Segmentation, 100=Other
network-type=100
#Enable tensor metadata output
output-tensor-meta=1

This is the config used for running the deepstream-app:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
kitti-track-output-dir=/var/logs/

[tiled-display]
enable=0
width=1920
height=1080
rows=1
columns=1
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=4
uri=rtsp://admin:admin123@10.33.1.55:554
select-rtp-protocol=4
latency=350
rtsp-reconnect-interval-sec=10
rtsp-reconnect-attempts=-1
num-sources=1
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0
# 0 = disable, 1 = through cloud events, 2 = through cloud + local events
smart-record=2
# smart record specific fields, valid only for source type=4 RTSP
# 0 = mp4, 1 = mkv
#smart-rec-container=0
# video cache size in seconds
#smart-rec-video-cache
# default duration of recording in seconds.
smart-rec-default-duration=999
# duration of recording in seconds.
# this will override default value.
smart-rec-duration=86400
# seconds before the current time to start recording.
smart-rec-start-time=0
# value in seconds to dump video stream.
#smart-rec-interval

[osd]
enable=0
gpu-id=0
process-mode=1
border-width=4
text-size=20
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;0.5
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
batch-size=1
## Set muxer output width and height
width=1920
height=1080
##Boolean property to inform muxer that sources are live
live-source=1
gpu-id=0
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
#attach-sys-ts-as-ntp=0

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=pgie_crowd_config.txt

[tracker]
enable=1
# For the NvDCF and DeepSORT trackers, tracker-width and tracker-height must each be a multiple of 32
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so
# ll-config-file required to set different tracker types
# ll-config-file=config_tracker_IOU.yml
# ll-config-file=config_tracker_NvDCF_perf.yml
ll-config-file=config_tracker_NvDCF_accuracy.yml
# ll-config-file=config_tracker_DeepSORT.yml
gpu-id=0
enable-batch-process=1
#enable-past-frame=1
display-tracking-id=1

[tests]
file-loop=0