I found that the memory usage reported by NVML differs from the memory I allocated with cudaMalloc in my program. I did some simple accounting in my program and found that the usage reported by NVML is ~300 MB more than my accounting result. So I read the API reference for the nvmlMemory_t struct:
Memory allocation information for a device.
unsigned long long free
Unallocated FB memory (in bytes).
unsigned long long total
Total installed FB memory (in bytes).
unsigned long long used
Allocated FB memory (in bytes). Note that the driver/GPU always sets aside a small amount of memory for bookkeeping.
What I care about is "used". According to its definition, the driver/GPU sets aside a small amount of memory for bookkeeping. I want to know whether this "small amount" is fixed, or whether there are rules that determine it.
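For reference, the comparison I'm doing looks roughly like this (a minimal sketch, assuming device 0 and linking with -lnvidia-ml; the 256 MB allocation size is just an example):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>
#include <nvml.h>

int main(void) {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    nvmlMemory_t before;
    nvmlDeviceGetMemoryInfo(dev, &before);   // "used" before this process allocates

    size_t bytes = 256ULL << 20;             // 256 MB tracked by my own accounting
    void *p = NULL;
    cudaMalloc(&p, bytes);                   // first CUDA call also creates the context

    nvmlMemory_t after;
    nvmlDeviceGetMemoryInfo(dev, &after);

    printf("my accounting:   %zu bytes\n", bytes);
    printf("NVML used delta: %llu bytes\n",
           (unsigned long long)(after.used - before.used));
    // The delta exceeds `bytes` by the driver/context overhead (~300 MB in my case).

    cudaFree(p);
    nvmlShutdown();
    return 0;
}
```

Note that the first cudaMalloc implicitly initializes the CUDA context on the device, so the NVML delta includes that one-time overhead in addition to the allocation itself.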