• Hardware Platform: RTX A4000
• DeepStream Version: 6.0
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version: 470.94
• Issue Type: NvDsOsd performs unnecessary DeviceToHost and HostToDevice transfers
To reproduce, build the sample and profile it with Nsight Systems:
mkdir appsink_nvmm/build && cd appsink_nvmm/build && cmake .. && make
nsys profile --trace cuda,osrt,nvtx --force-overwrite true ./appsink_nvmm
nvdsosd-transparency-demo-main.zip (30.0 KB)
With process-mode set to 1 (GPU mode), the profile shows one device-to-host and one host-to-device memcpy for every frame processed.
With process-mode set to 0 (CPU mode), the profile shows two device-to-host and two host-to-device memcpys for every frame processed.
This may not be a problem for the example, which processes 720p at 30 fps, but an application processing 4K at 60 fps moves 18x as much data per second (9x more pixels per frame at twice the frame rate), and these transfers become a bottleneck.
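The 18x figure follows from frame geometry alone. The sketch below reproduces the arithmetic, assuming (hypothetically) that each memcpy moves one full RGBA frame at 4 bytes per pixel; the real copy size depends on the surface format and pitch, which the profile does not state. The helper names `frame_bytes` and `bytes_per_second` are illustrative only.

```python
# Illustrative sketch: copy volume per second for the two workloads.
# Assumption (not confirmed by the profile): each memcpy moves one full
# RGBA frame, 4 bytes per pixel, with no row padding.
BYTES_PER_PIXEL = 4

# Copies per frame observed in the profile:
# GPU mode (process-mode=1): 1 D2H + 1 H2D; CPU mode (process-mode=0): 2 D2H + 2 H2D.
COPIES = {"gpu": 2, "cpu": 4}


def frame_bytes(width: int, height: int) -> int:
    """Size of one assumed RGBA frame in bytes."""
    return width * height * BYTES_PER_PIXEL


def bytes_per_second(width: int, height: int, fps: int, copies_per_frame: int) -> int:
    """Bytes moved per second if every frame is copied copies_per_frame times."""
    return frame_bytes(width, height) * fps * copies_per_frame


hd = bytes_per_second(1280, 720, 30, COPIES["gpu"])    # 720p @ 30 fps
uhd = bytes_per_second(3840, 2160, 60, COPIES["gpu"])  # 4K @ 60 fps

if __name__ == "__main__":
    print(f"per-frame ratio:   {frame_bytes(3840, 2160) // frame_bytes(1280, 720)}x")  # 9x
    print(f"720p@30, GPU mode: {hd / 1e6:.1f} MB/s")   # 221.2 MB/s
    print(f"4K@60, GPU mode:   {uhd / 1e6:.1f} MB/s")  # 3981.3 MB/s
    print(f"per-second ratio:  {uhd // hd}x")          # 18x
```

Switching to the CPU-mode value of 4 copies per frame doubles every per-second figure, but the 18x ratio between the two workloads stays the same.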
Why is nvdsosd transferring memory to and from the host?