Problem in "" compared with "": No memory copy!!

Hi all, when running the sample “deepstream-app” in deepstream_sdk_3.0, I found that the GIE inference stage (which uses the “nvinfer” plugin from the “”) doesn’t produce any results! It seems that the ResNet model in the GIE inference isn’t invoked by “nvinfer” at all. However, when I replace the “nvinfer” plugin with the “nvyolo” plugin from the “” in the Yolo example, the classification works!
So I used the NVIDIA profiler to see the difference between these two executions (nvinfer vs. nvyolo).
I found that with “nvyolo” as the inference GIE, each inference iteration is “MemCpy(Host to Device) -> launch and perform inferences -> MemCpy(Device to Host)”. However, with “nvinfer” as in the original deepstream sample, each iteration is only “launch and perform inferences -> MemCpy(Device to Host)”. There is no “MemCpy(Host to Device)”!! So it seems the inference using “nvinfer” doesn’t receive any valid input data at all!!
Then I checked the API calls. With “nvyolo”, each inference iteration calls “cudaMemcpyAsync() -> cudaEventRecord() -> launch inference kernels -> …”. With “nvinfer”, each inference iteration calls “cudaSetDevice() -> cudaStreamSynchronize() -> cudaEventRecord() -> launch inference kernels -> …”. There is no “cudaMemcpyAsync()” at the beginning of each inference in “nvinfer”!!!
So, I guess the reason that inference doesn’t work when using “nvinfer” as in the original deepstream samples is that “cudaMemcpyAsync()” is not called at the beginning of each inference!! But it seems all these calls are made in the “” or “”, which are both provided inside deepstream_sdk_3.0. I wonder if this is a bug in the deepstream_sdk_3.0 release? Or is it caused by a settings mismatch on my system??


There are some differences between libgstnvinfer and libgstnvyolo.

1. libgstnvinfer demonstrates a generic model that is fully supported by TensorRT. The flow is camera -> TensorRT -> display.
DeepStream can map the camera data into GPU-accessible memory, so you don’t need to do a memcpy for it.

2. The libgstnvyolo sample demonstrates how to enable a layer that TensorRT does not support in DeepStream.
To add a customized implementation, you need to copy the data back to the CPU.

3. In the DeepStream SDK, for performance reasons, we don’t run inference on every frame.
This is a configurable parameter in the config file. You can update it for your use case.

For example:


You can find more information in our document:

Reference Application Configuration
Configuration Groups
Primary GIE and Secondary GIE Group
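
As a rough illustration of point 3, here is a minimal Python sketch of how a frame-skipping setting determines which frames reach inference. It assumes the parameter is the `interval` key in the `[primary-gie]` group and that it counts the frames skipped after each inferred one, with a batch size of 1. The fragment and the helper `frames_inferred` are hypothetical, so please check the key name and its exact meaning against the configuration document for your SDK version.

```python
import configparser

# Hypothetical config fragment: the `interval` key and its value are
# assumptions for illustration, not copied from a shipped sample config.
CONFIG_TEXT = """
[primary-gie]
enable=1
interval=2
"""


def frames_inferred(config_text, num_frames):
    """Return the indices of frames that would be inferred (batch size 1 assumed)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Fall back to 0 (infer on every frame) when the key is absent.
    interval = parser.getint("primary-gie", "interval", fallback=0)
    # Assumed meaning: after each inferred frame, skip `interval` frames.
    return [i for i in range(num_frames) if i % (interval + 1) == 0]


print(frames_inferred(CONFIG_TEXT, 7))
```

Under these assumptions, a profile taken with a non-zero skip value would show inference kernels on only some of the frames, which is worth keeping in mind when comparing traces between the two plugins.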


Thank you so much for your help and answer!!
However, I have tried many times and “nvinfer” still doesn’t work. From the profiling results, it seems that “nvinfer” doesn’t map any GPU-accessible memory. As far as I understand, if “nvinfer” did map GPU-accessible memory, there should be some specific CUDA API calls for that mapping, but I cannot see any such calls in the profile. Also, since my “nvinfer” never generates classification results, I guess there may be a problem in the data sharing?? So, if Deepstream itself can map the camera data into GPU-accessible memory, how does Deepstream do it? Which CUDA API call does it use for the mapping? Why can’t I see any mapping-related API calls in the profile? Thank you so much for your help!!


I’m not sure which config file you used, but we recommend using this one to see the nvinfer component:

$ cd {deepstream_sdk}/sources/objectDetector_SSD
$ deepstream-app -c deepstream_app_config_ssd.txt


Dear AastaLLL,
Thank you so much for the help!!