CUDA memcheck 1.x

Does anyone know how memcheck works? I get the impression from the user manual that it catches
exceptions generated by the GPU hardware and reports them to the host PC. The manual says it
will catch addressing errors with all types of memory but is there anything it might miss?
I am guessing it will not catch A[100] if A contains 100 elements but lies next to array B??
If it does rely on the hardware, will there be problems with older GPUs, in particular
compute level 1.x (eg 1.3)?
With CUDA 5.0, I have had unexplained problems with mangled code compiled with sm_13 and run on compute
level 1.3 that did not occur with sm_20/compute level 2.0 and later. Does
memcheck do a better job with later hardware?

As always, any help or advice would be most welcome.

Thank you
Bill
http://www.cs.ucl.ac.uk/staff/W.Langdon/

I can’t provide any specifics since I am not tuned into the details of the debugger implementation, but I am aware that in general, the capabilities of the debugger and cuda-memcheck do increase with later architectures due to improved hardware support (“hooks”). If I recall correctly this applies to both bounds checking and race checking in cuda-memcheck.

In practical terms it therefore seems advisable to use cuda-memcheck with a GPU of the highest compute capability that is available.
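To act on that advice on a machine with several GPUs, the program can select the device with the highest compute capability itself. The sketch below uses only the standard runtime calls `cudaGetDeviceCount`, `cudaGetDeviceProperties` and `cudaSetDevice`; the helper name `pickHighestComputeCapability` is hypothetical, and error handling is kept minimal.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: return the index of the device with the highest
// compute capability (compared by major, then minor), or -1 if none is found.
int pickHighestComputeCapability()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return -1;

    int best = -1, bestMajor = -1, bestMinor = -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;   // skip devices we cannot query
        if (prop.major > bestMajor ||
            (prop.major == bestMajor && prop.minor > bestMinor)) {
            best = dev;
            bestMajor = prop.major;
            bestMinor = prop.minor;
        }
    }
    return best;
}

int main()
{
    int dev = pickHighestComputeCapability();
    if (dev < 0) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    std::printf("using device %d: %s (sm_%d%d)\n", dev, prop.name, prop.major, prop.minor);
    cudaSetDevice(dev);
    // ... launch the kernels under test here, then run the binary under cuda-memcheck ...
    return 0;
}
```

Alternatively, setting the `CUDA_VISIBLE_DEVICES` environment variable before running the program under cuda-memcheck restricts which GPUs it can see, without any code changes.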

Thank you njuffa.
Do you or anyone else know how to find out more about memcheck?
Also, does anyone know of alternatives when using 1.x level GPUs?
Many thanks
Bill

At a low level, cuda-memcheck works by modifying the binary that is executed on the GPU. On Fermi (SM 2.x) and higher hardware, there is support in the hardware for error reporting that cuda-memcheck relies on. In addition, architectural differences between Tesla and Fermi+ architectures affect some memcheck functionality.
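On 1.x hardware, where that hardware support is missing, one rough substitute for a per-access check is to guard suspicious indexing by hand in the source and record any violation in a global flag that the host reads back. This is a sketch under stated assumptions, not something cuda-memcheck does or that this thread proposes: `g_badIndex`, `inBounds`, `runScale` and the `offset` parameter used to inject the bug are all hypothetical names. It avoids device-side `printf()` and `assert()`, which need compute capability 2.x or later, so it should also build with `-arch=sm_13`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical flag: -1 means "no bad index seen yet".
__device__ int g_badIndex = -1;

// Source-level stand-in for a per-access bounds check. Uses only a plain
// global store (no printf/assert), so it also compiles for sm_13.
__device__ bool inBounds(int i, int n)
{
    if (i >= 0 && i < n)
        return true;
    g_badIndex = i;   // if several threads fail, one of their indices survives
    return false;     // caller skips the bad access instead of aborting
}

// 'offset' is a hypothetical knob used only to inject an off-by-one bug.
__global__ void scale(float *a, int n, float s, int offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x + offset;
    if (inBounds(i, n))
        a[i] *= s;
}

// Launches one thread per element and returns the recorded bad index (-1 if none).
int runScale(int offset)
{
    const int n = 100;
    float *d_a = 0;
    cudaMalloc((void **)&d_a, n * sizeof(float));
    cudaMemset(d_a, 0, n * sizeof(float));

    int bad = -1;
    cudaMemcpyToSymbol(g_badIndex, &bad, sizeof(int));    // reset the flag

    scale<<<1, n>>>(d_a, n, 2.0f, offset);

    cudaMemcpyFromSymbol(&bad, g_badIndex, sizeof(int));  // waits for the kernel
    cudaFree(d_a);
    return bad;
}

int main()
{
    std::printf("offset 0 -> bad index %d\n", runScale(0));
    std::printf("offset 1 -> bad index %d\n", runScale(1));
    return 0;
}
```

With `offset` 0 every thread stays in range and the flag remains -1; with `offset` 1 the last thread computes index 100 and records it. Unlike cuda-memcheck, this only catches the accesses you remember to guard.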

For your specific questions:

“I am guessing it will not catch A[100] if A contains 100 elements but lies next to array B??”
Some classes of such errors may be missed, depending on the compiler’s allocation mechanism. If the allocations happen to occupy contiguous addresses, then yes, an access intended for A can validly land in B, and memcheck will not report an error in that case.
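The sketch below makes that case deterministic, under the assumption (not stated in the thread) that A and B are carved out of a single `cudaMalloc()` block, so `A[100]` is literally the same address as `B[0]`. As we understand it, memcheck validates each access against the allocation it falls into, so this write lands inside a valid allocation and should go unreported while silently changing `B[0]`. With two separate `cudaMalloc()` calls, or two adjacent `__shared__` arrays, the outcome depends on where the buffers happen to be placed. The names `offByOne` and `runOffByOne` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bug on purpose: valid indices of A are 0..n-1, so A[n] is one past the end.
__global__ void offByOne(float *A, int n)
{
    if (threadIdx.x == 0)
        A[n] = 42.0f;
}

// Hypothetical layout: one allocation holding A (n floats) immediately
// followed by B (n floats). Returns the value of B[0] after the kernel runs.
float runOffByOne()
{
    const int n = 100;
    float *d_buf = 0;
    cudaMalloc((void **)&d_buf, 2 * n * sizeof(float));
    cudaMemset(d_buf, 0, 2 * n * sizeof(float));
    float *A = d_buf;       // A[0] .. A[99]
    float *B = d_buf + n;   // B[0] .. B[99]; B[0] is the address of A[100]

    offByOne<<<1, 32>>>(A, n);

    float b0 = 0.0f;
    cudaMemcpy(&b0, B, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_buf);
    return b0;
}

int main()
{
    std::printf("B[0] = %f\n", runOffByOne());
    return 0;
}
```

Running the binary under `cuda-memcheck` on a given GPU is a quick way to see this: the program prints `B[0] = 42.000000` regardless, and the tool would be expected to report nothing for the stray write.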

“If it does rely on the hardware, will there be problems with older GPUs, in particular
compute level 1.x (eg 1.3)?”
Yes. cuda-memcheck’s detection and reporting mechanisms are significantly better on Fermi (2.0) and higher GPUs. Njuffa’s comment about using cuda-memcheck on the GPU with the highest available compute capability is good advice.

Many thanks
Bill