In my simple image processing pipeline, which consists of:
- memory copy from CPU to GPU
- CUDA kernel execution
- memory copy from GPU to CPU
I observe regular peaks in execution time, which cause latency in my image processing. I figured out that they come from the GPU-to-CPU memory transfer. When I use page-locked (pinned) CPU memory, the variation in execution time is much lower. It looks like there is an internal CUDA thread that runs roughly every 100 ms and delays my execution by 5-30 ms. Is there a way to control that hidden CUDA thread?
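The setup described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual application: the kernel `invert`, the function name `worst_d2h_ms`, the image size, and the iteration count are all hypothetical placeholders. It uses page-locked host buffers from `cudaMallocHost` and times only the GPU-to-CPU copy, so that the worst-case copy time can be compared between runs.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",            \
                         cudaGetErrorString(err_), __LINE__);             \
            std::exit(1);                                                 \
        }                                                                 \
    } while (0)

// Hypothetical stand-in for the real image kernel: inverts 8-bit pixels.
__global__ void invert(const unsigned char* in, unsigned char* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = static_cast<unsigned char>(255 - in[i]);
}

// Runs `iters` frames of n bytes each through copy-in, kernel, copy-out.
// Returns the worst GPU-to-CPU copy time in ms, or -1 if the result is wrong.
float worst_d2h_ms(int n, int iters) {
    unsigned char *h_in, *h_out, *d_in, *d_out;
    CUDA_CHECK(cudaMallocHost(&h_in, n));   // page-locked host memory
    CUDA_CHECK(cudaMallocHost(&h_out, n));  // page-locked host memory
    CUDA_CHECK(cudaMalloc(&d_in, n));
    CUDA_CHECK(cudaMalloc(&d_out, n));
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<unsigned char>(i);

    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    float worst = 0.0f;
    for (int it = 0; it < iters; ++it) {
        CUDA_CHECK(cudaMemcpyAsync(d_in, h_in, n, cudaMemcpyHostToDevice, stream));
        invert<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_out, n);
        CUDA_CHECK(cudaGetLastError());
        // Wait for the kernel so the timer below covers only the copy back.
        CUDA_CHECK(cudaStreamSynchronize(stream));

        auto t0 = std::chrono::steady_clock::now();
        CUDA_CHECK(cudaMemcpyAsync(h_out, d_out, n, cudaMemcpyDeviceToHost, stream));
        CUDA_CHECK(cudaStreamSynchronize(stream));
        auto t1 = std::chrono::steady_clock::now();

        float ms = std::chrono::duration<float, std::milli>(t1 - t0).count();
        if (ms > worst) worst = ms;
    }

    // Check the last frame against the expected inversion.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != static_cast<unsigned char>(255 - h_in[i])) { ok = false; break; }
    }

    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    CUDA_CHECK(cudaFreeHost(h_in));
    CUDA_CHECK(cudaFreeHost(h_out));
    return ok ? worst : -1.0f;
}

int main() {
    // Hypothetical frame: 1920 x 1080, one byte per pixel, 200 iterations.
    float worst = worst_d2h_ms(1920 * 1080, 200);
    std::printf("worst GPU-to-CPU copy: %.3f ms\n", worst);
    return worst < 0.0f ? 1 : 0;
}
```

To reproduce the pageable case for comparison, the two `cudaMallocHost` calls could be swapped for `malloc` (and `cudaFreeHost` for `free`). With pageable memory the runtime has to stage the transfer through an internal pinned buffer, which may be one source of the extra variation.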
I’m using a Jetson TX2 with Ubuntu and CUDA 10.2.
Thanks in advance