I have a question about checking whether my kernel launches succeed or fail. I will use the SDK vectorAdd example:
The original code (line 116 in file vectorAdd.cu):
    // Invoke kernel
    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
    getLastCudaError("kernel launch failure");
    #ifdef _DEBUG
    checkCudaErrors( cudaDeviceSynchronize() );
    #endif
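From my understanding, after launching the kernel ‘VecAdd’, the sample uses two different functions to check for errors. One is cudaGetLastError() (called through the SDK helper getLastCudaError) and the other is cudaDeviceSynchronize(). The first catches synchronous errors raised by the launch itself, such as an invalid launch configuration, and the second catches asynchronous errors that occur while the kernel is running.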
Is this correct?
Is there a cleaner way to check for errors without calling two different CUDA functions?
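To make the question concrete, here is a minimal sketch of one way the two checks could be folded into a single call site. The macros CUDA_CHECK and CUDA_CHECK_KERNEL and the function runVecAdd are hypothetical names made up for this illustration; they are not part of the SDK or of vectorAdd.cu. Only cudaGetLastError(), cudaDeviceSynchronize(), cudaGetErrorString() and the usual memory calls come from the CUDA runtime. The sketch keeps the sample's choice of synchronizing only when _DEBUG is defined, since cudaDeviceSynchronize() blocks the host until the kernel finishes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not an SDK macro): print the failing call and exit.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical helper: one call site after every kernel launch.
// cudaGetLastError() reports launch-time (synchronous) errors.
// cudaDeviceSynchronize() waits for the kernel and reports execution-time
// (asynchronous) errors; it is compiled in only for debug builds here.
#ifdef _DEBUG
#define CUDA_CHECK_KERNEL()                      \
    do {                                         \
        CUDA_CHECK(cudaGetLastError());          \
        CUDA_CHECK(cudaDeviceSynchronize());     \
    } while (0)
#else
#define CUDA_CHECK_KERNEL() CUDA_CHECK(cudaGetLastError())
#endif

__global__ void VecAdd(const float* A, const float* B, float* C, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

// Adds two N-element vectors on the device and returns the number of
// elements that differ from the host result (0 means success).
int runVecAdd(int N)
{
    std::vector<float> h_A(N), h_B(N), h_C(N);
    for (int i = 0; i < N; ++i) {
        h_A[i] = 0.5f * i;
        h_B[i] = 2.0f;
    }

    size_t bytes = static_cast<size_t>(N) * sizeof(float);
    float *d_A = nullptr, *d_B = nullptr, *d_C = nullptr;
    CUDA_CHECK(cudaMalloc(&d_A, bytes));
    CUDA_CHECK(cudaMalloc(&d_B, bytes));
    CUDA_CHECK(cudaMalloc(&d_C, bytes));
    CUDA_CHECK(cudaMemcpy(d_A, h_A.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_B, h_B.data(), bytes, cudaMemcpyHostToDevice));

    // Invoke kernel
    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
    CUDA_CHECK_KERNEL();

    // This blocking copy waits for the kernel, so an execution-time error
    // would also surface here in release builds.
    CUDA_CHECK(cudaMemcpy(h_C.data(), d_C, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < N; ++i)
        if (h_C[i] != h_A[i] + h_B[i])
            ++mismatches;

    CUDA_CHECK(cudaFree(d_A));
    CUDA_CHECK(cudaFree(d_B));
    CUDA_CHECK(cudaFree(d_C));
    return mismatches;
}
```

Calling runVecAdd(1 << 16) from main() should return 0 on a working device. Note that this still calls both runtime functions underneath; it only hides them behind one macro, so I am not sure whether it counts as "cleaner" or whether there is a single call that covers both cases.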