Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.0
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only): L4
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue?
We noticed that in Triton Inference Server, no CUDA kernels are running while a Device-to-Device (D2D) memory copy is taking place. However, in DeepStream, CUDA kernels can run concurrently with Device-to-Device memory transfers. Why does this difference occur between Triton and DeepStream?
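To make the question concrete, here is a minimal standalone sketch (not Triton's or DeepStream's actual code; the kernel and sizes are made up for illustration) showing the usual reason such behavior differs between applications: a D2D `cudaMemcpyAsync` and a kernel can overlap only when they are issued to different non-default streams, while issuing both to the same stream serializes them. Whether they truly run concurrently also depends on the hardware (available copy engines, and the fact that some same-device copies are executed by SM-based copy kernels that compete with compute kernels).

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative compute kernel: just burns some cycles per element.
__global__ void busyKernel(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = out[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.0001f + 0.5f;
        out[i] = v;
    }
}

int main() {
    const int n = 1 << 22;
    float *src, *dst, *work;
    cudaMalloc(&src,  n * sizeof(float));
    cudaMalloc(&dst,  n * sizeof(float));
    cudaMalloc(&work, n * sizeof(float));

    cudaStream_t copyStream, computeStream;
    cudaStreamCreate(&copyStream);
    cudaStreamCreate(&computeStream);

    // Case 1: D2D copy and kernel on DIFFERENT streams -- they are allowed
    // to overlap (observable as concurrent rows in an Nsight Systems timeline).
    cudaMemcpyAsync(dst, src, n * sizeof(float),
                    cudaMemcpyDeviceToDevice, copyStream);
    busyKernel<<<(n + 255) / 256, 256, 0, computeStream>>>(work, n);
    cudaDeviceSynchronize();

    // Case 2: both on the SAME stream -- the kernel cannot start until the
    // copy completes, matching the "no kernels during D2D copy" observation.
    cudaMemcpyAsync(dst, src, n * sizeof(float),
                    cudaMemcpyDeviceToDevice, copyStream);
    busyKernel<<<(n + 255) / 256, 256, 0, copyStream>>>(work, n);
    cudaDeviceSynchronize();

    cudaFree(src); cudaFree(dst); cudaFree(work);
    cudaStreamDestroy(copyStream);
    cudaStreamDestroy(computeStream);
    printf("done\n");
    return 0;
}
```

Profiling both cases with `nsys profile ./a.out` should show the copy and kernel overlapping in case 1 and back-to-back in case 2, which may explain the difference you observed between the two frameworks if they issue copies and kernels on different stream configurations.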
Since Triton is open source, you can check the code. Could you elaborate on "CUDA kernels can run concurrently with Device-to-Device memory transfers"? What kind of CUDA kernels? In which DeepStream version did you observe this?