I am using a GeForce 3090 Ti, and I was relying on nvidia-smi or cudaMemGetInfo to check whether memory is properly allocated.
// check memory
size_t free_byte ;
size_t total_byte ;
cudaError_t cuda_status = cudaMemGetInfo( &free_byte, &total_byte ) ;
if ( cudaSuccess != cuda_status ){
printf("Error: cudaMemGetInfo fails, %s \n", cudaGetErrorString(cuda_status) );
exit(1);
}
double free_db = (double)free_byte ;
double total_db = (double)total_byte ;
double used_db = total_db - free_db ;
printf("GPU memory usage: used = %f, free = %f MB, total = %f MB\n",
used_db/1024.0/1024.0, free_db/1024.0/1024.0, total_db/1024.0/1024.0);
Basically, the above lines of code (which I found on this forum) give results identical to the memory figures reported by nvidia-smi.
I noticed that these methods give wrong results when one tries to allocate more than the available space. Everything is fine as long as the allocation fits within the device memory, but when one tries to allocate more than the available device memory they behave like some kind of overflow: they report a much smaller number for the used memory and a much larger number for the free memory. The code does not fail with a cudaError_t, but it finally breaks when I try to access this, I would say, ‘over-allocated’ memory in some CUDA kernel.
I naively believed the results, assumed that memory space was not the cause of my issue, and spent a week looking in entirely the wrong directions.
Why are nvidia-smi and cudaMemGetInfo not designed to report an error when memory is over-occupied? Of course it is easy to notice that the numbers they give are wrong if one is careful enough to pre-estimate the approximate memory footprint beforehand, but it is still a source of confusion.
I admit I do not understand what the issue is.
Every time a program makes a dynamic memory allocation, whether with malloc(), cudaMalloc(), or cudaHostAlloc(), the very next step is to check that the allocation succeeded, and take appropriate action if it failed. That is a standard programming idiom. Did your code leave out these checks?
After an allocation attempt fails, the amount of available memory reported by internal APIs or external utilities will be identical to what it was before the failed allocation call, because no allocation actually took place.
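To illustrate both points, here is a minimal, self-contained sketch (the 1 TiB request is just an arbitrary size that cannot possibly succeed, and the variable names are only for illustration):
#include <cstdio>
#include <cuda_runtime.h>
int main() {
    size_t free_before, free_after, total;
    cudaMemGetInfo(&free_before, &total);
    // Deliberately request far more than any current GPU offers (1 TiB),
    // so the allocation must fail.
    double *dArr = nullptr;
    cudaError_t err = cudaMalloc((void **)&dArr, (size_t)1 << 40);
    if (err != cudaSuccess) {
        // Expected here: cudaErrorMemoryAllocation ("out of memory").
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        cudaGetLastError(); // clear the non-sticky error state
    }
    cudaMemGetInfo(&free_after, &total);
    // The failed attempt allocated nothing, so free memory is unchanged.
    printf("free before: %zu MB, free after: %zu MB\n",
           free_before >> 20, free_after >> 20);
    return 0;
}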
I have wrapped cudaMalloc with these error checks; however, this did not throw an error on the attempt to allocate more than the device memory.
void allocateArray(void **devPtr, size_t size) {
checkCudaErrors(cudaMalloc(devPtr, size));
}
I suggest providing a short, complete code that demonstrates what you are doing. If you are attempting to allocate more than the available memory on the GPU via cudaMalloc, and you are not getting a runtime error, then either your code is broken or your CUDA install is broken.
#include <iostream>
#include <cstdio>
#include "limits.h"
#include <cuda_runtime.h>
#include <helper_cuda.h>
#define WIDTH 256
void allocateArray(void **devPtr, size_t size) {
checkCudaErrors(cudaMalloc(devPtr, size));
}
int main(int argc, char **argv) {
double *dArr;
int Length = atoi(argv[1]);
allocateArray((void **)&dArr, Length*WIDTH*sizeof(double));
// check memory
size_t free_byte ;
size_t total_byte ;
cudaError_t cuda_status = cudaMemGetInfo( &free_byte, &total_byte ) ;
if ( cudaSuccess != cuda_status ){
printf("Error: cudaMemGetInfo fails, %s \n", cudaGetErrorString(cuda_status) );
exit(1);
}
double free_db = (double)free_byte ;
double total_db = (double)total_byte ;
double used_db = total_db - free_db ;
printf("GPU memory usage: used = %f, free = %f MB, total = %f MB\n",
used_db/1024.0/1024.0, free_db/1024.0/1024.0, total_db/1024.0/1024.0);
cudaFree(dArr);
return 0;
}
This simply allocates an array of doubles whose length is the command-line argument Length multiplied by WIDTH (256).
$ ./exe 5000000
GPU memory usage: used = 10062.750000, free = 14196.937500 MB, total = 24259.687500 MB
$ ./exe 20000000
GPU memory usage: used = 6592.750000, free = 17666.937500 MB, total = 24259.687500 MB
In this case, 20,000,000 is clearly out of bounds (I am on a 3090 Ti), but checkCudaErrors does not throw anything, and even compute-sanitizer does not complain.
Ah, now I get it. Indeed this was an overflow, but not on the CUDA side: it was an integer overflow in Length * WIDTH * sizeof(double).
Sorry for the nuisance.
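For anyone hitting the same thing, a minimal sketch of what presumably happens (reusing the names from the code above): Length and WIDTH are both int, so Length * WIDTH is evaluated in 32-bit arithmetic and wraps around (formally undefined behavior, but a wrap in practice on common compilers) before the result is promoted to size_t by the multiplication with sizeof(double). Casting the first operand to size_t keeps the whole expression in 64-bit arithmetic:
#include <cstdio>
#define WIDTH 256
int main() {
    int Length = 20000000;
    // Overflowing version: Length * WIDTH is computed as int and wraps
    // before the promotion to size_t ever happens.
    size_t bad  = Length * WIDTH * sizeof(double);
    // Fixed version: force size_t arithmetic from the first operand.
    size_t good = (size_t)Length * WIDTH * sizeof(double);
    printf("overflowed request: %zu bytes (~%.1f MB)\n", bad,  bad  / 1024.0 / 1024.0);
    printf("intended request:   %zu bytes (~%.1f MB)\n", good, good / 1024.0 / 1024.0);
    return 0;
}
With Length = 20,000,000 the wrapped request comes out to roughly 6,295 MB instead of the intended ~39,063 MB, which together with the CUDA context overhead appears consistent with the 6592 MB of used memory reported above: cudaMalloc really did succeed, just with a much smaller allocation than intended, so the error only surfaces later when a kernel indexes past it.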