We wrote a program with OpenACC and OpenMPI, and wanted to check it for possible GPU memory leaks.
In this program, CUDA driver functions such as cuMemAlloc and cuPointerGetAttribute are loaded dynamically with dlopen/dlsym.
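For context, the dynamic loading looks roughly like the sketch below. This is not our actual code: the helper name load_symbol is made up for illustration, and the library and symbol names passed to it (for example "libcuda.so.1" and "cuMemAlloc_v2") are assumptions that depend on the installation. Note that cuda.h maps cuMemAlloc to cuMemAlloc_v2 with a macro, so the name given to dlsym may need the versioned suffix.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (illustration only): open the shared library
 * `lib_name` and look up `sym_name` in it. Returns NULL if either step
 * fails. On success the library handle is intentionally left open,
 * because the returned pointer is only valid while the library stays
 * loaded. */
static void *load_symbol(const char *lib_name, const char *sym_name)
{
    void *handle = dlopen(lib_name, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                lib_name, err ? err : "unknown error");
        return NULL;
    }

    dlerror(); /* clear any stale error before calling dlsym */
    void *sym = dlsym(handle, sym_name);
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s) failed: %s\n",
                sym_name, err ? err : "symbol not found");
        dlclose(handle);
        return NULL;
    }
    return sym;
}
```

A call such as load_symbol("libcuda.so.1", "cuMemAlloc_v2") would then be cast to the matching function pointer type before use. On older glibc versions the program may need to be linked with -ldl.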
The program was compiled with:
CFLAGS="-O0 -g -I${CUDA_ROOT}/include -acc=gpu -gpu=cc80,nordc,debug -Minfo"
LDFLAGS="-g -lnvToolsExt -acc=gpu -gpu=cc80,nordc,debug"
Running it under compute-sanitizer with
srun -N 1 -n 1 compute-sanitizer --tool memcheck --leak-check full exe
produced the following warning and error messages:
Program hit CUDA_ERROR_INVALID_CONTEXT (error 201) due to "invalid device context" on CUDA API call to cuCtxGetDevice.
Program hit CUDA_ERROR_INVALID_VALUE (error 1) due to "invalid argument" on CUDA API call to cuPointerGetAttribute.
[l50121:3175092:0:3175092] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1e)
==== backtrace (tid:3175092) ====
0 0x0000000000012c20 .annobin_sigaction.c() sigaction.c:0
1 0x00000000001814b1 optixQueryFunctionTable() ???:0
2 0x0000000000170cf0 optixQueryFunctionTable() ???:0
3 0x000000000012af28 InitializeInjectionNvtxExtension() ???:0
4 0x000000000012afee InitializeInjectionNvtxExtension() ???:0
5 0x00000000000ab1ed ???() cuda-11.7.0-u4znfi/compute-sanitizer/libsanitizer-collection.so:0
6 0x00000000000c4166 ???() cuda-11.7.0-u4znfi/compute-sanitizer/libsanitizer-collection.so:0
7 0x0000000000134edb InitializeInjectionNvtxExtension() ???:0
8 0x000000000003a037 __cxa_finalize() ???:0
9 0x00000000000a8343 ???() cuda-11.7.0-u4znfi/compute-sanitizer/libsanitizer-collection.so:0
=================================
========= Error: process didn't terminate successfully
========= Target application returned an error
========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 451 errors
srun: error task 0: Exited with exit code 11
Running the program without compute-sanitizer,
srun -N 1 -n 1 exe
was successful.
From another post, I understand that CUDA_ERROR_INVALID_CONTEXT can be ignored in OpenACC programs.
How should we deal with the segmentation fault that the backtrace places in libsanitizer-collection.so?