TK1 freezes on cudaDeviceSynchronize

We’ve posted a Stack Overflow question about this issue here:

In short:

I am trying to test two methods of allocating data on a Tegra TK1: zero-copy and cudaMallocManaged(). In a loop, I allocate, launch a kernel, synchronize, and free. The problem is that the loop hangs.
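For reference, here is a minimal sketch of the kind of loop described above. It is not the exact code from the Stack Overflow question: the kernel name `increment`, the helper names `runManaged` and `runZeroCopy`, the buffer size, and the iteration count are all placeholders chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element of the buffer.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// One iteration with managed memory: allocate, launch, synchronize, free.
cudaError_t runManaged(int n) {
    int *data = NULL;
    cudaError_t err = cudaMallocManaged((void **)&data, n * sizeof(int));
    if (err != cudaSuccess) return err;
    for (int i = 0; i < n; ++i) data[i] = i;  // host writes before the launch
    increment<<<(n + 255) / 256, 256>>>(data, n);
    err = cudaGetLastError();
    // This is the call that the post reports as hanging.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    cudaFree(data);
    return err;
}

// One iteration with zero-copy (mapped, pinned host memory).
cudaError_t runZeroCopy(int n) {
    int *host = NULL, *dev = NULL;
    cudaError_t err = cudaHostAlloc((void **)&host, n * sizeof(int),
                                    cudaHostAllocMapped);
    if (err != cudaSuccess) return err;
    for (int i = 0; i < n; ++i) host[i] = i;
    // Obtain the device-side alias of the mapped host buffer.
    err = cudaHostGetDevicePointer((void **)&dev, host, 0);
    if (err == cudaSuccess) {
        increment<<<(n + 255) / 256, 256>>>(dev, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    cudaFreeHost(host);
    return err;
}

int main() {
    // Enable mapped host memory before any other CUDA call creates a context.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    const int n = 1 << 20;      // assumed buffer size
    const int iterations = 100; // assumed loop count
    for (int it = 0; it < iterations; ++it) {
        cudaError_t err = runManaged(n);
        if (err == cudaSuccess) err = runZeroCopy(n);
        if (err != cudaSuccess) {
            printf("iteration %d failed: %s\n", it, cudaGetErrorString(err));
            return 1;
        }
    }
    printf("all iterations completed\n");
    return 0;
}
```

Checking the return value of every call, as above, at least distinguishes a reported CUDA error from a genuine hang inside cudaDeviceSynchronize().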

When cudaDeviceSynchronize() is commented out, everything finishes. The loop also doesn’t hang when stepped through with gdb, and it doesn’t hang if I allocate separately on host and device and copy back and forth (the normal procedure when host and device do not share a memory space).
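The explicit-copy variant that does not hang could look roughly like the sketch below. Again, the names `increment` and `runExplicitCopy` are placeholders, and whether the original test still called cudaDeviceSynchronize() in this variant is an assumption; it is kept here so that the loop body has the same shape as the failing one.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element of the buffer.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// One iteration with separate host and device buffers (n must be > 0).
cudaError_t runExplicitCopy(int n, std::vector<int> &host) {
    host.resize(n);
    for (int i = 0; i < n; ++i) host[i] = i;
    const size_t bytes = n * sizeof(int);

    int *dev = NULL;
    cudaError_t err = cudaMalloc((void **)&dev, bytes);
    if (err != cudaSuccess) return err;

    // Copy input to the device, run the kernel, then copy the result back.
    err = cudaMemcpy(dev, &host[0], bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        increment<<<(n + 255) / 256, 256>>>(dev, n);
        err = cudaGetLastError();
    }
    // Assumption: the synchronize call is kept to mirror the failing loop.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(&host[0], dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err;
}

int main() {
    std::vector<int> host;
    const int n = 1 << 20;  // assumed buffer size
    for (int it = 0; it < 100; ++it) {  // assumed loop count
        cudaError_t err = runExplicitCopy(n, host);
        if (err != cudaSuccess) {
            printf("iteration %d failed: %s\n", it, cudaGetErrorString(err));
            return 1;
        }
    }
    printf("all iterations completed\n");
    return 0;
}
```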

So this seems like some sort of race condition, and we’re not sure how to proceed.

Thanks in advance.

Hi AaronMS,
Thanks for reporting the issue. We are currently investigating it and will let you know when we have an update.


Hi AaronMS,
Could you provide a built application that reproduces the issue easily?