I don’t have compute-sanitizer on my system, but I tried running it with cuda-memcheck and couldn’t reproduce the error. The program gets really slow when run with this prefix.
cuda-memcheck is fine, too, but it is deprecated in favor of compute-sanitizer.
It is expected that the program runs very slowly with those tools.
If the program always runs without error under cuda-memcheck, this points to a race condition of some kind, as suggested by your comment “But in 1 out of 5 runs of my code I’m getting”: the tool slows execution and changes timing, which can hide a race. It might be a conflict between multiple CPU threads or between different CUDA streams; I cannot tell from here.
When I have this kind of problem in my code, I often add cudaDeviceSynchronize + error checking after each CUDA call. If this solves the issue, I remove the cudaDeviceSynchronize calls one by one until the error reappears, which narrows down where the missing synchronization is.
Check the return code of each CUDA call, i.e. cudaMalloc, cudaFree, cudaMemcpy, cudaDeviceSynchronize, etc.
For kernel launches, you need to check cudaGetLastError right after the launch and then check the return code of cudaDeviceSynchronize.
Be aware that with multiple CPU threads, a CUDA error caused by one thread may be observed in a different thread.
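A minimal sketch of what such checking could look like. CUDA_CHECK, scale_kernel and run_checked are hypothetical names made up for this example; they are not from the code posted in this thread, and the kernel is only a placeholder.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA toolkit): print the failing
// call with file/line and return the error code from the enclosing function.
#define CUDA_CHECK(call)                                                     \
    do {                                                                     \
        cudaError_t err_ = (call);                                           \
        if (err_ != cudaSuccess) {                                           \
            std::fprintf(stderr, "%s:%d: %s failed: %s (%d)\n", __FILE__,    \
                         __LINE__, #call, cudaGetErrorString(err_),          \
                         static_cast<int>(err_));                            \
            return err_;                                                     \
        }                                                                    \
    } while (0)

// Placeholder kernel, standing in for the real kernel under investigation.
__global__ void scale_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Every runtime call is checked; the kernel launch gets two checks.
// Note: for brevity this sketch does not free d_data on an early return.
cudaError_t run_checked(int n) {
    float* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_data));
    return cudaSuccess;
}
```

With the cudaDeviceSynchronize check directly after each launch, an asynchronous error such as an illegal memory access is reported next to the kernel that caused it instead of at some later, unrelated API call.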
As you can see in the code attached above, there is a call to cudaGetLastError,
which in the case of the crash returns:
cudaGetLastError: an illegal memory access was encountered(700)
I’ve also added a call to cudaDeviceSynchronize (after the call to cudaGetLastError),
and it also returns:
cudaDeviceSynchronize: an illegal memory access was encountered(700)
There are multiple unchecked API calls in your program. I was not talking only about the check after your kernel.
For example, how do you know that buf.InitTexture is successful? If it is not, of course the kernel will fail.
If you have determined that the error originates from your kernel, you could dump the input values of each invocation to a file and create a minimal reproducer for your bug.
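One possible way to dump a kernel input is sketched below, assuming the input is a plain device array of float. dump_device_floats and load_floats are hypothetical helpers written for this example, and the raw binary file layout is just one choice.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: copy n floats from device memory to the host and
// write them to a raw binary file. Returns false on any failure.
bool dump_device_floats(const float* d_ptr, size_t n, const char* path) {
    std::vector<float> host(n);
    cudaError_t err = cudaMemcpy(host.data(), d_ptr, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    size_t written = std::fwrite(host.data(), sizeof(float), n, f);
    bool closed = (std::fclose(f) == 0);
    return closed && written == n;
}

// Hypothetical counterpart for the reproducer: read n floats back from the file.
bool load_floats(const char* path, size_t n, std::vector<float>& out) {
    out.resize(n);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    size_t read = std::fread(out.data(), sizeof(float), n, f);
    std::fclose(f);
    return read == n;
}
```

The dump has to happen before the kernel launch: once an illegal memory access has occurred, later runtime calls in the same process, including cudaMemcpy, return the same error. The reproducer can then load the file, upload the data with cudaMemcpy and launch only the suspect kernel.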
Regarding checking the API calls, I will try it. But if I see that there is also an invalid memory access in another location, for example in buf.InitTexture, what can I do about it? What would be the reason for that invalid memory access?
Thanks
I’ve tested the return value of initTexture (which checks the cudaMalloc return value),
and there wasn’t any error before the crash in the vector_norm_kernel.
Okay. I cannot give you more suggestions. If the error is in the kernel, try to create a minimal executable reproducer which can be used for further debugging.