• Hardware Platform: RTX A4000
• DeepStream Version: 6.0
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version: 470.94
• Issue Type: NvDsOsd performs unnecessary DeviceToHost and HostToDevice transfers
To reproduce the issue:
1. Build the container:
./docker_build.sh
2. Run the container:
./docker_run.sh
3. Build the application:
mkdir appsink_nvmm/build
cd appsink_nvmm/build
cmake ..
make
4. Run the application:
./appsink_nvmm
and profile it with:
nsys profile --trace cuda,osrt,nvtx --force-overwrite true ./appsink_nvmm
5. Change process-mode from 1 to 0 in appsink_nvmm.cpp, then repeat steps 1 through 4.
nvdsosd-transparency-demo-main.zip (30.0 KB)
Observe that with process-mode set to 1 (GPU mode), every processed frame incurs one device-to-host memcpy and one host-to-device memcpy.
Observe that with process-mode set to 0 (CPU mode), every processed frame incurs two device-to-host memcpys and two host-to-device memcpys.
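This may not be a problem for this example, which processes 720p at 30 fps. For an application processing 4K at 60 fps, however, each frame is 9× larger (3840×2160 vs. 1280×720 pixels) and there are twice as many frames per second, so these transfers move 18× more data per second and become a bottleneck.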
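To put rough numbers on that, the sketch below estimates the host/device traffic per second for both process-mode settings, using the copy counts observed above (2 copies per frame in GPU mode, 4 in CPU mode). It rests on two assumptions that the profile itself does not confirm: frames reaching nvdsosd are RGBA (4 bytes per pixel), and every memcpy moves one full frame. The helper name `bytes_per_second` is hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate of host<->device traffic caused by the nvdsosd copies.
# ASSUMPTIONS (not measured): RGBA frames (4 bytes/pixel) and each memcpy
# transfers exactly one full frame.
set -euo pipefail

# Hypothetical helper: bytes copied per second for a given resolution,
# frame rate and number of copies per frame.
bytes_per_second() {
  local width=$1 height=$2 fps=$3 copies=$4
  echo $(( width * height * 4 * fps * copies ))
}

# Copies per frame, as observed in the nsys profile:
#   process-mode=1 (GPU): 1 DtoH + 1 HtoD = 2
#   process-mode=0 (CPU): 2 DtoH + 2 HtoD = 4
for mode in "gpu 2" "cpu 4"; do
  read -r name copies <<< "$mode"
  small=$(bytes_per_second 1280 720 30 "$copies")
  large=$(bytes_per_second 3840 2160 60 "$copies")
  echo "$name-mode: 720p@30 = $small B/s, 4K@60 = $large B/s, ratio = $(( large / small ))"
done
```

Under these assumptions, GPU mode at 4K@60 moves roughly 4 GB/s between host and device, and CPU mode roughly 8 GB/s. The 18× ratio holds regardless of how many copies are made per frame.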
Why is nvdsosd transferring memory to and from the host?