Hi,
When do I need to call ‘cudaThreadSynchronize’?
I have a weird ‘bug’ in my program right now. Something like this works fine:
cudaMemcpy(SomeMemory, …)
cudaThreadSynchronize();
Kernel<<<…>>>(SomeMemory);
… and something like this does not work (‘unknown error’ inside the kernel):
cudaMemcpy(SomeMemory, …)
Kernel<<<…>>>(SomeMemory);
Do I need to call cudaThreadSynchronize after cudaMemcpy calls and before kernel launches?
(I don’t think so, which is why I’m asking ;))
There is definitely NO out-of-bounds access, this is double-checked … so is it a driver bug?
(My specs: latest drivers, 64-bit app, Windows Vista x64 SP1, 2 x GeForce GTX 280)
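A minimal sketch of how the failing case could be narrowed down, by checking the return code of every runtime call instead of relying on the extra synchronize. Everything here is an assumption for illustration: the CUDA_CHECK macro, the kernel body, the buffer size and the launch configuration are hypothetical stand-ins, not my real code. It also uses cudaDeviceSynchronize, which is the name newer toolkits use for what cudaThreadSynchronize does. As far as I understand, a plain cudaMemcpy and a kernel launched afterwards on the default stream are ordered, so the kernel should not start before the copy is done.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: print file/line and abort if a runtime call fails.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical stand-in for Kernel: doubles every element.
__global__ void Kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // bounds guard for the last block
}

int main()
{
    const int n = 1024;           // assumed size, for illustration only
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *SomeMemory = NULL;
    CUDA_CHECK(cudaMalloc((void **)&SomeMemory, n * sizeof(float)));

    // Plain copy on the default stream, no explicit synchronize afterwards.
    CUDA_CHECK(cudaMemcpy(SomeMemory, host, n * sizeof(float),
                          cudaMemcpyHostToDevice));

    Kernel<<<(n + 255) / 256, 256>>>(SomeMemory, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(host, SomeMemory, n * sizeof(float),
                          cudaMemcpyDeviceToHost));
    printf("host[3] = %f\n", host[3]);

    CUDA_CHECK(cudaFree(SomeMemory));
    return 0;
}
```

If the check right after the launch passes and the synchronize fails, the error comes from inside the kernel. If an earlier call already fails, the ‘unknown error’ may just be left over from something that went wrong before this kernel.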
Another case is successive kernel launches on the same memory:
Kernel_Iterate<<<…>>>(SomeMemory)
Kernel_Iterate<<<…>>>(SomeMemory)
Kernel_Iterate<<<…>>>(SomeMemory)
Kernel_Iterate<<<…>>>(SomeMemory)
Do I need to call cudaThreadSynchronize between kernels, or are they queued on the driver side and then executed IN ORDER?
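A minimal sketch of how the ordering could be tested, assuming all launches go to the same (default) stream. The kernel body, the buffer size and the launch configuration are hypothetical stand-ins for illustration. Each launch adds 1 to every element, so if the four launches run one after another, every element should end up as 4 without any synchronize between the kernels.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for Kernel_Iterate: one step adds 1 to every element.
__global__ void Kernel_Iterate(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main()
{
    const int n = 1 << 16;        // assumed size, for illustration only
    const int iterations = 4;
    int *SomeMemory = NULL;

    if (cudaMalloc((void **)&SomeMemory, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMemset(SomeMemory, 0, n * sizeof(int)) != cudaSuccess) return 1;

    // Back-to-back launches on the default stream, no synchronize in between.
    for (int k = 0; k < iterations; ++k)
        Kernel_Iterate<<<(n + 255) / 256, 256>>>(SomeMemory, n);
    if (cudaGetLastError() != cudaSuccess) return 1;

    // The copy back is queued on the same stream, behind all four kernels.
    int *host = new int[n];
    if (cudaMemcpy(host, SomeMemory, n * sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;

    // Count elements that did not receive exactly one increment per launch.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (host[i] != iterations) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    delete[] host;
    cudaFree(SomeMemory);
    return mismatches == 0 ? 0 : 1;
}
```

The only point where the host waits is the final cudaMemcpy. A synchronize between kernels should only matter when the host itself needs to read results or timing in between.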
Thanks for any clarification.