Compute and data transfer not happening concurrently in CUDA streams on iteration 2

Cross-posted on Stack Overflow: gpu - Compute and Data transfer not happening concurrently in cuda Streams on Iteration 2 - Stack Overflow

It may be this issue: Persistent Kernel does not work properly on some GPUs - #5 by Robert_Crovella
