We’ve also posted a Stack Overflow question for this issue here:
I am trying to test two methods of allocating data on a Tegra TK1: zero-copy and cudaMallocManaged(). In a loop, I allocate, launch a kernel, synchronize, and free. The problem is that the loop hangs.
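A minimal sketch of the kind of loop described above, not the exact code from the project. The kernel `increment`, the `CHECK` macro, the buffer size, and the iteration count are all placeholders chosen for illustration, and zero-copy is assumed here to mean mapped pinned host memory obtained with cudaHostAlloc():

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel: adds 1 to every element.
__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main()
{
    const int n = 1 << 16;          // assumed buffer size
    const int iterations = 1000;    // assumed loop count
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Request mapped host memory support before the context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    for (int it = 0; it < iterations; ++it) {
        // Variant 1: managed memory.
        int *managed = NULL;
        CHECK(cudaMallocManaged((void **)&managed, n * sizeof(int)));
        for (int i = 0; i < n; ++i) managed[i] = 0;
        increment<<<blocks, threads>>>(managed, n);
        CHECK(cudaGetLastError());
        CHECK(cudaDeviceSynchronize());   // the call reported to hang
        CHECK(cudaFree(managed));

        // Variant 2: zero-copy (mapped pinned host memory).
        int *host = NULL;
        int *dev = NULL;
        CHECK(cudaHostAlloc((void **)&host, n * sizeof(int),
                            cudaHostAllocMapped));
        CHECK(cudaHostGetDevicePointer((void **)&dev, host, 0));
        for (int i = 0; i < n; ++i) host[i] = 0;
        increment<<<blocks, threads>>>(dev, n);
        CHECK(cudaGetLastError());
        CHECK(cudaDeviceSynchronize());
        CHECK(cudaFreeHost(host));
    }

    printf("finished %d iterations\n", iterations);
    return 0;
}
```

Checking the return value of every call, including cudaGetLastError() right after each launch, helps tell a real hang apart from an earlier error that was silently ignored.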
When cudaDeviceSynchronize() is commented out, everything finishes. The loop also doesn’t hang when stepped through with gdb, and it doesn’t hang if I allocate separately on host and device and copy back and forth (the usual procedure when host and device do not share a memory space).
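For comparison, the explicit-copy variant that does not hang could be sketched roughly as follows. Again this is an illustration under assumptions rather than the original code: the kernel `increment`, the `CHECK` macro, and the sizes are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel: adds 1 to every element.
__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main()
{
    const int n = 1 << 16;          // assumed buffer size
    const int iterations = 1000;    // assumed loop count
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const size_t bytes = n * sizeof(int);

    for (int it = 0; it < iterations; ++it) {
        // Separate host and device buffers.
        int *host = (int *)malloc(bytes);
        if (host == NULL) return EXIT_FAILURE;
        for (int i = 0; i < n; ++i) host[i] = 0;

        int *dev = NULL;
        CHECK(cudaMalloc((void **)&dev, bytes));

        // Copy in, run the kernel, synchronize, copy back.
        CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
        increment<<<blocks, threads>>>(dev, n);
        CHECK(cudaGetLastError());
        CHECK(cudaDeviceSynchronize());
        CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));

        // Each element started at 0, so it should now be 1.
        if (host[0] != 1) {
            fprintf(stderr, "unexpected result in iteration %d\n", it);
            return EXIT_FAILURE;
        }

        CHECK(cudaFree(dev));
        free(host);
    }

    printf("finished %d iterations\n", iterations);
    return 0;
}
```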
So this looks like some sort of race condition, and we’re not sure how to proceed.
Thanks in advance.