CUDA memcheck 1.x

wlangdon · January 7, 2014, 12:22pm

Does anyone know how memcheck works? I get the impression from the user manual that it catches
exceptions generated by the GPU hardware and reports them to the host PC. The manual says it
will catch addressing errors with all types of memory but is there anything it might miss?
I am guessing it will not catch A[100] if A contains 100 elements but lies next to array B??
If it does rely on the hardware, will there be problems with older GPUs, in particular
compute level 1.x (eg 1.3)?
With CUDA 5.0, I have had unexplained problems with mangled code compiled with sm_13 and run on compute
level 1.3 that did not occur with sm_20/compute level 2.0 and later. I was wondering whether
memcheck does a better job with later hardware?

As always, any help or advice would be most welcome.

Thank you
Bill

njuffa · January 7, 2014, 6:16pm

I can’t provide any specifics since I am not tuned in to the details of the debugger implementation, but I am aware that, in general, the capabilities of the debugger and cuda-memcheck increase with later architectures due to improved hardware support (“hooks”). If I recall correctly, this applies to both bounds checking and race checking in cuda-memcheck.

In practical terms it therefore seems advisable to use cuda-memcheck with a GPU of the highest compute capability that is available.

wlangdon · January 8, 2014, 7:35am

Thank you njuffa.
Do you or anyone else know how to find out more about memcheck?
Also, does anyone know of alternatives when using 1.x level GPUs?
Many thanks
Bill

vyas · January 8, 2014, 7:02pm

At a low level, cuda-memcheck works by modifying the binary that is executed on the GPU. On Fermi (SM 2.x) and higher hardware, there is support in the hardware for error reporting that cuda-memcheck relies on. In addition, architectural differences between Tesla and Fermi+ architectures affect some memcheck functionality.

For your specific questions:

“I am guessing it will not catch A[100] if A contains 100 elements but lies next to array B??”
Some errors of this kind may be missed, depending on the compiler’s allocation mechanism. If the two allocations happen to occupy contiguous addresses, then yes, an access that was intended for A can land on a valid address inside B, and memcheck will not generate an error in that case.
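
To make the point above concrete, here is a minimal host-side sketch in C++. It is not cuda-memcheck’s actual implementation, which is not described in this thread. It only assumes a checker that accepts any access falling inside some tracked allocation. The names `RangeChecker`, `element_addr`, and `demo_adjacent_arrays`, as well as the address `0x1000`, are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an allocation-range checker (an assumption for
// illustration, not cuda-memcheck's real mechanism).
struct Allocation {
    std::uintptr_t base;  // first valid byte address
    std::size_t size;     // length in bytes
};

class RangeChecker {
public:
    void track(std::uintptr_t base, std::size_t size) {
        allocations_.push_back({base, size});
    }

    // An access passes if it lies entirely inside ANY tracked allocation,
    // regardless of which array the programmer meant to index.
    bool access_ok(std::uintptr_t addr, std::size_t bytes) const {
        for (const Allocation& a : allocations_) {
            if (addr >= a.base && bytes <= a.size &&
                addr - a.base <= a.size - bytes) {
                return true;
            }
        }
        return false;
    }

private:
    std::vector<Allocation> allocations_;
};

// Address of element `index` in an array of floats starting at `base`.
inline std::uintptr_t element_addr(std::uintptr_t base, std::size_t index) {
    return base + index * sizeof(float);
}

// Usage sketch: A and B are 100-float arrays placed back to back
// (an assumed layout; a real allocator may or may not do this).
inline void demo_adjacent_arrays() {
    RangeChecker checker;
    const std::size_t n = 100;
    const std::uintptr_t A = 0x1000;                 // made-up address
    const std::uintptr_t B = A + n * sizeof(float);  // directly after A
    checker.track(A, n * sizeof(float));
    checker.track(B, n * sizeof(float));

    // A[99] is in bounds and passes.
    assert(checker.access_ok(element_addr(A, 99), sizeof(float)));
    // A[100] is out of bounds for A but aliases B[0], so it also passes.
    assert(checker.access_ok(element_addr(A, 100), sizeof(float)));
    // A[200] falls outside both allocations and is rejected.
    assert(!checker.access_ok(element_addr(A, 200), sizeof(float)));
}
```

In this model the overrun only becomes visible when there is a gap between the two allocations, which is why whether such an error is reported depends on how the arrays happen to be laid out.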

“If it does rely on the hardware, will there be problems with older GPUs, in particular
compute level 1.x (eg 1.3)?”
Yes. cuda-memcheck’s detection and reporting mechanisms are significantly better on Fermi (SM 2.0) and higher GPUs. njuffa’s advice to run cuda-memcheck on the GPU with the highest available compute capability is sound.

wlangdon · January 9, 2014, 10:45am

Many thanks
Bill
