Apologies for the generality of the question; I'm new to GPU/CUDA programming.
I have an application with two kernels (call them k1 and k2).
I transfer data to the device, run k1 N times, and transfer the data back out of the device.
All is well.
If I transfer data in, run k1 N times, then run k2 once (and k2 hardly does anything),
then when I try to transfer data out I get an unspecified launch failure (ULF) on the device-to-host memcpy.
My question is: how do I start debugging this?
What conditions cause a data transfer to fail with a ULF?
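
For reference, here is a minimal sketch of the sequence described above, with an error check after every step. Kernel launches are asynchronous, so a failure inside a kernel can be reported by a later call such as the memcpy. Checking and synchronizing after each launch should show which step actually fails. The kernel bodies, the buffer size, the `CUDA_CHECK` macro and the `run_sequence` function are all placeholders assumed for illustration, not the real application code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file and line when a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Placeholder kernels standing in for the real k1 and k2.
__global__ void k1(float *data, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) data[i] += 1.0f;  // bounds check avoids out-of-range writes
}

__global__ void k2(float *data, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) data[i] *= 2.0f;
}

// Transfer in, run k1 `iterations` times, run k2 once, transfer out.
// Returns the first element of the result buffer.
float run_sequence(int iterations) {
    const int count = 1024;  // assumed buffer size
    const int threads = 256;
    const int blocks = (count + threads - 1) / threads;
    const size_t bytes = count * sizeof(float);

    std::vector<float> host(count, 0.0f);
    float *dev = nullptr;

    CUDA_CHECK(cudaMalloc(&dev, bytes));
    CUDA_CHECK(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice));

    for (int i = 0; i < iterations; ++i) {
        k1<<<blocks, threads>>>(dev, count);
        CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
        CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while k1 ran
    }

    k2<<<blocks, threads>>>(dev, count);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());      // a fault in k2 is reported here

    CUDA_CHECK(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
    return host[0];
}

int main() {
    printf("first element: %f\n", run_sequence(10));
    return 0;
}
```

The synchronize after each launch is there only for debugging. It serializes host and device, so it would normally be removed once the failing step is found.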