I am using compute-sanitizer --tool memcheck [MyAppName] to find the source of the errors. Strangely, memcheck reports no errors after the whole scan, even though cudaGetLastError() returns an illegal memory access error. Am I missing something on the command line?
Without an example to look at, one can only speculate. For example, your code could be using uninitialized data. In different environments, e.g. running with and without compute-sanitizer, this uninitialized data may take different values, which may cause different addresses to be computed, only some of which lead to accesses outside existing memory allocations. You could give --tool initcheck a try, but I am not sure what kind of errors it is able to find.
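As a hypothetical illustration of that mechanism (the names gather, idx and runGather are invented for this sketch and are not from the original program), the kernel below reads indices from a device buffer. If that buffer is never written after cudaMalloc, its contents are undefined, so data[idx[i]] may land inside or outside the allocation depending on the run. initcheck is intended to flag reads of device global memory that was never initialized (such as idx[i] here), whereas memcheck can only report the out-of-bounds access on runs where it actually happens.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: out[i] = data[idx[i]].
// If idx was never written, idx[i] holds arbitrary bits, so the address
// data + idx[i] may fall inside or outside the allocation from run to run.
__global__ void gather(const float* data, const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = data[idx[i]];
    }
}

// Hypothetical driver: runs gather once and returns the first CUDA error seen.
// initializeIdx == false reproduces the bug (idx left uninitialized);
// initializeIdx == true zero-fills idx, so every thread reads data[0].
cudaError_t runGather(bool initializeIdx, int n)
{
    float* data = nullptr;
    float* out = nullptr;
    int* idx = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    cudaMalloc(&idx, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(data, 0, n * sizeof(float));
    if (initializeIdx) {
        cudaMemset(idx, 0, n * sizeof(int));
    }

    // Picks up any failure from the allocations and memsets above.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        gather<<<(n + 255) / 256, 256>>>(data, idx, out, n);
        err = cudaGetLastError();        // launch/configuration errors
    }
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();   // faults raised while the kernel ran
    }

    cudaFree(data);
    cudaFree(idx);
    cudaFree(out);
    return err;
}
```

Calling runGather(false, n) from a host main under compute-sanitizer --tool initcheck should report the uninitialized read of idx, while the zero-filled variant should run clean; the exact report text depends on the tool version.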
While tools like compute-sanitizer can be a great help, it is best to think of them as complements to, and not replacements for, other software engineering methods, such as code reviews.
@njuffa I think memcheck is now actually running on my program. The first time, it executed my kernel functions without any issue until it reached a particular one (where an illegal address undoubtedly occurs), and then it got stuck there for at least an hour. Is that normal?
Code can run considerably slower under compute-sanitizer.
I agree that without an example it's just speculation as to what may be happening.
You might wish to try the latest version of CUDA, to see if the problem persists.
I have CUDA 11.7. In the past I ran cuda-memcheck on the same kernel functions, and it did not take nearly as long as it does now. I understand that a program runs slower under memcheck; I just noticed that compute-sanitizer takes much longer.
@Robert_Crovella I have a very long CUDA program, so I am going to give an example below:
MyKernel<<<>>>(); // The kernel that triggers the illegal memory access.
exit(0); // Added to terminate the program and then get the error results from compute-sanitizer.
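Kernel launches are asynchronous, so calling exit(0) immediately after the launch does not guarantee that the kernel has finished, or that any fault it triggers has been reported to the host, before the process ends. A minimal sketch of the more usual pattern is shown below: check the launch, then synchronize, and only then decide whether to exit. The kernel body, the launch configuration and the helper name launchAndWait are hypothetical stand-ins, since the original launch configuration was not shown.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for MyKernel; the real body and launch
// configuration were not shown in the thread.
__global__ void MyKernel(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Launches the kernel, waits for it, and returns the first error seen.
cudaError_t launchAndWait(int* d_out, int n)
{
    MyKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaError_t err = cudaGetLastError();   // e.g. invalid launch configuration
    if (err != cudaSuccess) {
        return err;
    }
    // Blocks until the kernel is done; execution errors such as an
    // illegal address or a launch timeout are returned here.
    return cudaDeviceSynchronize();
}
```

With this structure the program can print cudaGetErrorString(err) and return from main normally, which gives the host (and a tool such as compute-sanitizer) a chance to observe the kernel's completion instead of racing it with exit(0).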
Sorry, that isn’t useful to me. Perhaps others will be able to help.
I wouldn’t be able to offer any further suggestions without a complete example. Yes, I understand that might require considerable effort on your part.
@Robert_Crovella I have new information for you. By placing cudaGetLastError() right after the faulty kernel, I got “the launch timed out and was terminated”. Apparently there was a timeout, but compute-sanitizer had not yet finished scanning my kernel.
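“the launch timed out and was terminated” is the message for cudaErrorLaunchTimeout, which the CUDA runtime documentation describes as occurring only when kernel timeouts are enabled on the device (the kernelExecTimeoutEnabled device property). Because kernels run much slower under compute-sanitizer, it seems plausible, though unconfirmed for this case, that a kernel which normally finishes within the limit exceeds it during the scan. A small sketch for checking whether such a limit applies; the helper names kernelTimeoutEnabled and explainError are hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if the device enforces a run time limit on kernels,
// 0 if it does not, or -1 if the query itself failed.
int kernelTimeoutEnabled(int device)
{
    int enabled = 0;
    if (cudaDeviceGetAttribute(&enabled, cudaDevAttrKernelExecTimeout, device) != cudaSuccess) {
        return -1;
    }
    return enabled;
}

// Prints an error returned by a launch or synchronization; for a launch
// timeout it also reports whether the device has the time limit enabled.
void explainError(cudaError_t err, int device)
{
    if (err == cudaErrorLaunchTimeout) {
        printf("%s (kernel timeout enabled on device %d: %d)\n",
               cudaGetErrorString(err), device, kernelTimeoutEnabled(device));
    } else if (err != cudaSuccess) {
        printf("%s\n", cudaGetErrorString(err));
    }
}
```

If kernelTimeoutEnabled reports 1, the limit is typically a display watchdog, which applies when the same GPU also drives a display.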
It’s always a good idea to do rigorous, proper CUDA error checking, particularly when you are having trouble with a CUDA code. I usually advise people to do that before they ask for help on forums.
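One common shape for such error checking is sketched below. The macro names CUDA_CHECK and CUDA_CHECK_KERNEL and the fill kernel are my own illustrative choices, not part of the CUDA API. Every runtime call is wrapped, and after each kernel launch both the launch and its execution are checked. Synchronizing after every kernel serializes the work, so this is mainly a debugging aid.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wraps a CUDA runtime call: on failure, prints the error name and string
// together with the file and line of the call, then terminates the program.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error %s at %s:%d: %s\n",             \
                    cudaGetErrorName(err_), __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Use right after a kernel launch: checks the launch itself, then
// synchronizes so that errors raised during execution (e.g. an illegal
// address) are reported at this line rather than at a later API call.
#define CUDA_CHECK_KERNEL()                     \
    do {                                        \
        CUDA_CHECK(cudaGetLastError());         \
        CUDA_CHECK(cudaDeviceSynchronize());    \
    } while (0)

// Hypothetical example kernel, only to show where the checks go.
__global__ void fill(int* p, int value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        p[i] = value;
    }
}
```

Typical use is CUDA_CHECK(cudaMalloc(&p, bytes)); then fill<<<blocks, threads>>>(p, 7, n); followed immediately by CUDA_CHECK_KERNEL(); so that a faulting kernel is identified by the line of its own check.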