I compiled this with CUDA toolkit 4.2.9 into a binary named “noop” and ran it under valgrind on a machine with NVIDIA driver 295.49, using the following command:
$ valgrind --leak-check=full ./noop
==7962== Memcheck, a memory error detector
==7962== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==7962== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
==7962== Command: ./empty
==7962==
==7962==
==7962== HEAP SUMMARY:
==7962== in use at exit: 30,053 bytes in 47 blocks
==7962== total heap usage: 2,389 allocs, 2,342 frees, 898,204 bytes allocated
==7962==
==7962== 16 bytes in 1 blocks are definitely lost in loss record 4 of 33
==7962== at 0x4C2596C: operator new(unsigned long) (vg_replace_malloc.c:220)
==7962== by 0x4E45CD2: ??? (in /usr/local/lib/libcudart.so.4.2.9)
==7962== by 0x4E7406A: ??? (in /usr/local/lib/libcudart.so.4.2.9)
==7962== by 0x4E54A1F: ??? (in /usr/local/lib/libcudart.so.4.2.9)
==7962== by 0x4E3043E: ??? (in /usr/local/lib/libcudart.so.4.2.9)
==7962== by 0x4E74360: ??? (in /usr/local/lib/libcudart.so.4.2.9)
==7962== by 0x5869C11: exit (exit.c:78)
==7962== by 0x584FAC3: (below main) (libc-start.c:252)
==7962==
==7962== LEAK SUMMARY:
==7962== definitely lost: 16 bytes in 1 blocks
==7962== indirectly lost: 0 bytes in 0 blocks
==7962== possibly lost: 0 bytes in 0 blocks
==7962== still reachable: 30,037 bytes in 46 blocks
==7962== suppressed: 0 bytes in 0 blocks
==7962== Reachable blocks (those to which a pointer was found) are not shown.
==7962== To see them, rerun with: --leak-check=full --show-reachable=yes
==7962==
==7962== For counts of detected and suppressed errors, rerun with: -v
==7962== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 10 from 7)
(I noticed that calling cudaDeviceReset() was necessary to get rid of some “possibly lost” bytes; that is why it is there, unconventional as it is.)
Is there a memory leak in libcudart, or am I responsible for cleaning up something that I have missed?
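
In case the lost block turns out to be internal to libcudart, a Memcheck suppression along these lines might keep it out of the report. This is only a sketch: the suppression name libcudart_exit_leak and the file name cudart.supp are made up, and the frames are taken from the trace above (_Znwm is the mangled name of operator new(unsigned long)). I have not verified it against this exact setup.

```
# cudart.supp (hypothetical file name)
{
   libcudart_exit_leak
   Memcheck:Leak
   fun:_Znwm
   obj:*/libcudart.so*
   ...
   fun:exit
}
```

It would be passed to valgrind with --suppressions, for example:
$ valgrind --leak-check=full --suppressions=cudart.supp ./noop
Running with --gen-suppressions=all instead makes valgrind print a suppression that matches the exact frames it saw, which is probably more reliable than writing one by hand.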