I have an application that uses bare CUDA kernels, the OpenCV CUDA API, and VisionWorks. I compile my CUDA code with the --default-stream per-thread flag, and I built OpenCV CUDA with this flag as well.
Some VisionWorks function calls execute in the non-legacy (per-thread) default stream. However, others, such as vxMapArrayRange(), seem to perform their operations on the legacy default stream. When I call these functions, the profiler shows the DeviceToHost data transfers running in the legacy default stream.
Is there a way to work around this? For example, can I transfer a vx_array to the host by issuing a memcpy call on a non-legacy stream myself?