Why does profiling the nvdsosd plugin show device-to-host and host-to-device transfers?

• Hardware Platform: RTX A4000
• DeepStream Version: 6.0
• TensorRT Version: 8.0.1
• NVIDIA GPU Driver Version: 470.94
• Issue Type: nvdsosd unnecessarily performing DeviceToHost and HostToDevice transfers

Steps to reproduce the issue:

1 - Build the container:

./docker_build.sh

2 - Run the container:

./docker_run.sh

3 - Build the application:

mkdir appsink_nvmm/build
cd appsink_nvmm/build
cmake ..
make

4 - Run the application:

./appsink_nvmm

Profile the application using:

nsys profile --trace cuda,osrt,nvtx --force-overwrite true ./appsink_nvmm

5 - Change process-mode from 1 to 0 in appsink_nvmm.cpp (a sketch of where this property is set is shown after these steps).

Repeat steps 1 through 4.
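For reference, process-mode is a property of the nvdsosd element and is set in the application code. Below is a minimal sketch of the relevant call, assuming the element is created with the standard GStreamer API; the variable and element names are illustrative, not taken from appsink_nvmm.cpp:

```cpp
#include <gst/gst.h>

int main(int argc, char *argv[]) {
  gst_init(&argc, &argv);

  // Create the OSD element; "osd" is an illustrative name.
  GstElement *osd = gst_element_factory_make("nvdsosd", "osd");
  if (!osd) {
    g_printerr("Failed to create nvdsosd element\n");
    return -1;
  }

  // process-mode: 0 = CPU mode, 1 = GPU mode.
  // Step 5 above corresponds to changing this value from 1 to 0.
  g_object_set(G_OBJECT(osd), "process-mode", 1, NULL);

  // ... build and link the rest of the pipeline as in appsink_nvmm.cpp ...

  gst_object_unref(osd);
  return 0;
}
```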

nvdsosd-transparency-demo-main.zip (30.0 KB)
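To quantify the copies without opening the Nsight Systems GUI, the CUDA memcpy summary can also be printed from the captured report. The file name below is the nsys default output; it may be report1.qdrep or report1.nsys-rep depending on the Nsight Systems version:

nsys stats report1.qdrep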

Observe that, for every frame processed, when process-mode is set to 1 (GPU mode), there is one memcpy from device to host and one from host to device.

Observe that, for every frame processed, when process-mode is set to 0 (CPU mode), there are two memcpys from device to host and two from host to device.

This may not be a problem for the example, which processes 720p @ 30 fps, but for an application processing 4K @ 60 fps the transfers are 18x larger (9x the pixels per frame at twice the frame rate: (3840 × 2160 × 60) / (1280 × 720 × 30) = 18), and this becomes a bottleneck.

Why is nvdsosd transferring memory to and from the host?

Currently, the "text" and "circle" OSD elements can only be drawn by the CPU, so the buffer has to be transferred between the GPU and the host.
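This suggests a possible workaround: if the overlay can be limited to rectangles, which are not on the CPU-drawn "text"/"circle" paths, the host round trip should be avoidable in process-mode=1. Below is a minimal sketch of attaching rectangle-only display meta in a pad probe, using the standard DeepStream display-meta API; the coordinates and colors are illustrative:

```cpp
#include <gst/gst.h>
#include "gstnvdsmeta.h"  // DeepStream batch/frame/display meta API
#include "nvdsmeta.h"

// Pad-probe sketch: attach rectangle-only display meta upstream of
// nvdsosd. No text_params or circle_params are set, so the CPU-drawn
// paths are never exercised. All values are illustrative.
static GstPadProbeReturn
osd_sink_pad_probe(GstPad *pad, GstPadProbeInfo *info, gpointer user_data) {
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER(info);
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList *l = batch_meta->frame_meta_list; l; l = l->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l->data;
    NvDsDisplayMeta *display_meta =
        nvds_acquire_display_meta_from_pool(batch_meta);

    // One rectangle per frame; no text or circle elements.
    display_meta->num_rects = 1;
    NvOSD_RectParams *rect = &display_meta->rect_params[0];
    rect->left = 100;
    rect->top = 100;
    rect->width = 200;
    rect->height = 150;
    rect->border_width = 3;
    rect->border_color.red = 1.0;
    rect->border_color.green = 0.0;
    rect->border_color.blue = 0.0;
    rect->border_color.alpha = 1.0;

    nvds_add_display_meta_to_frame(frame_meta, display_meta);
  }
  return GST_PAD_PROBE_OK;
}
```

The probe would be attached to the nvdsosd sink pad with gst_pad_add_probe(pad, GST_PAD_PROBE_TYPE_BUFFER, osd_sink_pad_probe, NULL, NULL).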
