I am using a GeForce 3090 Ti, and I was relying on nvidia-smi or cudaMemGetInfo to check whether memory is properly allocated.
// check memory
size_t free_byte ;
size_t total_byte ;
cudaError_t cuda_status = cudaMemGetInfo( &free_byte, &total_byte ) ;
if ( cudaSuccess != cuda_status ){
printf("Error: cudaMemGetInfo fails, %s \n", cudaGetErrorString(cuda_status) );
exit(1);
}
double free_db = (double)free_byte ;
double total_db = (double)total_byte ;
double used_db = total_db - free_db ;
printf("GPU memory usage: used = %f, free = %f MB, total = %f MB\n",
used_db/1024.0/1024.0, free_db/1024.0/1024.0, total_db/1024.0/1024.0);
Basically, the above lines of code (which I found on this forum) give results identical to the memory figures reported by nvidia-smi.
I noticed that these methods give wrong results when one tries to allocate more than the available space. Everything is fine as long as the allocation fits within the device memory, but when one tries to allocate more than the available device memory they behave like some kind of overflow: they report a much smaller number for the used memory and a much larger number for the free memory. The code does not fail with a cudaError_t, but it finally breaks when I try to access this, I would say, ‘over-allocated’ memory in some CUDA kernel.
I naively believed the results, assumed that memory space was not the cause of my issue, and spent a week looking in entirely the wrong directions.
Why are nvidia-smi and cudaMemGetInfo not designed to report an error when memory is over-occupied? Of course it is easy to notice that the numbers they give are wrong if one is careful enough to pre-estimate the approximate memory footprint beforehand, but it is still a source of confusion.
I admit I do not understand what the issue is.
Every time a program makes a dynamic memory allocation, whether with malloc(), cudaMalloc(), or cudaHostAlloc(), the very next step is to check that the allocation succeeded, and take appropriate action if it failed. That is a standard programming idiom. Did your code leave out these checks?
After an allocation attempt fails, the amount of available memory reported by internal APIs or external utilities will be identical to what it was before the failed allocation call, because no allocation actually took place.
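To illustrate both points, here is a minimal, self-contained sketch (the 1 TiB request is just an arbitrary size that cannot possibly succeed, and the variable names are only for illustration):
#include <cstdio>
#include <cuda_runtime.h>
int main() {
    size_t free_before, free_after, total;
    cudaMemGetInfo(&free_before, &total);
    // Deliberately request far more than any current GPU offers (1 TiB),
    // so the allocation must fail.
    double *dArr = nullptr;
    cudaError_t err = cudaMalloc((void **)&dArr, (size_t)1 << 40);
    if (err != cudaSuccess) {
        // Expected here: cudaErrorMemoryAllocation ("out of memory").
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        cudaGetLastError(); // clear the non-sticky error state
    }
    cudaMemGetInfo(&free_after, &total);
    // The failed attempt allocated nothing, so free memory is unchanged.
    printf("free before: %zu MB, free after: %zu MB\n",
           free_before >> 20, free_after >> 20);
    return 0;
}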
I have wrapped cudaMalloc with these error checks; however, this did not throw an error on the attempt to allocate more than the device memory.
void allocateArray(void **devPtr, size_t size) {
checkCudaErrors(cudaMalloc(devPtr, size));
}
I suggest providing a short, complete code that demonstrates what you are doing. If you are attempting to allocate more than the available memory on the GPU via cudaMalloc, and you are not getting a runtime error, then either your code is broken or your CUDA install is broken.
#include <iostream>
#include <cstdio>
#include "limits.h"
#include <cuda_runtime.h>
#include <helper_cuda.h>
#define WIDTH 256
void allocateArray(void **devPtr, size_t size) {
checkCudaErrors(cudaMalloc(devPtr, size));
}
int main(int argc, char **argv) {
double *dArr;
int Length = atoi(argv[1]);
allocateArray((void **)&dArr, Length*WIDTH*sizeof(double));
// check memory
size_t free_byte ;
size_t total_byte ;
cudaError_t cuda_status = cudaMemGetInfo( &free_byte, &total_byte ) ;
if ( cudaSuccess != cuda_status ){
printf("Error: cudaMemGetInfo fails, %s \n", cudaGetErrorString(cuda_status) );
exit(1);
}
double free_db = (double)free_byte ;
double total_db = (double)total_byte ;
double used_db = total_db - free_db ;
printf("GPU memory usage: used = %f, free = %f MB, total = %f MB\n",
used_db/1024.0/1024.0, free_db/1024.0/1024.0, total_db/1024.0/1024.0);
cudaFree(dArr);
return 0;
}
This simply allocates an array of doubles whose length is the command-line argument Length multiplied by WIDTH (256).
$ ./exe 5000000
GPU memory usage: used = 10062.750000, free = 14196.937500 MB, total = 24259.687500 MB
$ ./exe 20000000
GPU memory usage: used = 6592.750000, free = 17666.937500 MB, total = 24259.687500 MB
In this case, 20,000,000 is clearly out of bounds (I am on a 3090 Ti), but checkCudaErrors does not throw anything, and even compute-sanitizer does not complain.
Ah, now I get it. Indeed this was an overflow, but not on the CUDA side: it was an integer overflow in Length * WIDTH * sizeof(double).
Sorry for the nuisance.
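For anyone hitting the same thing, a minimal sketch of what presumably happens (reusing the names from the code above): Length and WIDTH are both int, so Length * WIDTH is evaluated in 32-bit arithmetic and wraps around (formally undefined behavior, but a wrap in practice on common compilers) before the result is promoted to size_t by the multiplication with sizeof(double). Casting the first operand to size_t keeps the whole expression in 64-bit arithmetic:
#include <cstdio>
#define WIDTH 256
int main() {
    int Length = 20000000;
    // Overflowing version: Length * WIDTH is computed as int and wraps
    // before the promotion to size_t ever happens.
    size_t bad  = Length * WIDTH * sizeof(double);
    // Fixed version: force size_t arithmetic from the first operand.
    size_t good = (size_t)Length * WIDTH * sizeof(double);
    printf("overflowed request: %zu bytes (~%.1f MB)\n", bad,  bad  / 1024.0 / 1024.0);
    printf("intended request:   %zu bytes (~%.1f MB)\n", good, good / 1024.0 / 1024.0);
    return 0;
}
With Length = 20,000,000 the wrapped request comes out to roughly 6,295 MB instead of the intended ~39,063 MB, which together with the CUDA context overhead appears consistent with the 6592 MB of used memory reported above: cudaMalloc really did succeed, just with a much smaller allocation than intended, so the error only surfaces later when a kernel indexes past it.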