I’m still learning, and it took me 20 minutes to figure out why I got this execution error under the debugger:
Cuda error: vecSum Kernel execution failed in file ‘vector_reduction.cu’ in line 99 : unspecified launch failure.
By mistake, I had called the kernel with a pointer to a host array instead of a device array.
Since every device array has to be allocated (mapped/tagged) by cudaMalloc before the kernel is invoked, it should not be hard for the compiler to spot such a mistake (and shorten my learning curve).
Is there perhaps an option in the nvcc compiler to enforce such a cross-check?
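
To make the kind of check I mean concrete, here is a rough sketch of how it could be done at run time rather than by the compiler. It is not an nvcc option. It assumes CUDA 10 or later, where cudaPointerGetAttributes fills in a `type` field. The kernel body, `is_device_pointer` and `launch_vecSum` are hypothetical names made up for illustration, not the code from vector_reduction.cu.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real vecSum kernel: adds 1 to each element.
__global__ void vecSum(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: true only if ptr refers to device or managed memory.
// Assumes CUDA 10+, where cudaPointerAttributes has a `type` member.
bool is_device_pointer(const void *ptr) {
    cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        // Older toolkits report a plain host pointer as an error; clear it.
        cudaGetLastError();
        return false;
    }
    return attr.type == cudaMemoryTypeDevice ||
           attr.type == cudaMemoryTypeManaged;
}

// Hypothetical wrapper: refuses to launch on a pointer that is not device memory.
bool launch_vecSum(float *data, int n) {
    if (!is_device_pointer(data)) {
        fprintf(stderr, "vecSum: pointer %p is not device memory\n", (void *)data);
        return false;
    }
    vecSum<<<(n + 255) / 256, 256>>>(data, n);
    return cudaDeviceSynchronize() == cudaSuccess;
}
```

With this wrapper, passing a host array such as `float h[4]` to `launch_vecSum(h, 4)` would be rejected with a message before the launch, while a pointer obtained from cudaMalloc would go through to the kernel.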