too much global memory occupation

the code is as follows:

#include <stdlib.h>
#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
	int *a;
	cudaMalloc(&a, sizeof(int));	// allocate a single int (4 bytes) on the device
	while (1) {			// spin so the process stays visible in nvidia-smi
	}
	return 0;
}

I only allocate 4 bytes in global memory, but when I check GPU usage with the “nvidia-smi” command, 174 MB of memory is reported as used. I can’t figure out why.

Apart from the 4 bytes for holding an integer value, probably rounded up to a multiple of the GPU’s page size, any running CUDA code also occupies GPU memory for a context.
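
As a rough illustration of the first point (assuming the CUDA runtime API and compiling with nvcc; the observed granularity depends on driver and GPU, and may even be zero if the request is served from memory the context has already reserved), you can measure the free-memory drop around a 4-byte cudaMalloc() after the context already exists:

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
	size_t freeBefore = 0, freeAfter = 0, total = 0;

	cudaFree(0);                          // force context creation up front
	cudaMemGetInfo(&freeBefore, &total);

	int *a;
	cudaMalloc(&a, sizeof(int));          // request just 4 bytes
	cudaMemGetInfo(&freeAfter, &total);

	printf("free memory dropped by %zu bytes for a 4-byte request\n",
	       freeBefore - freeAfter);

	cudaFree(a);
	return 0;
}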

The context holds various important data structures: the heap for device-side memory allocations, stack space for all of the (tens of thousands of) threads that can potentially run in parallel, the FIFO that buffers device-side printf() output, space to store the entire internal state of the GPU twice over (by default) in case cudaDeviceSynchronize() is called on the device, and lots of other documented and undocumented things necessary to make CUDA work. Some of these data structures are configurable in size via calls to cudaDeviceSetLimit(), so if you know you are not using a feature you can (somewhat) reduce the amount of memory required.
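
For example, if you never use in-kernel printf() or device-side malloc(), you can query and shrink the corresponding limits. A sketch (assuming the CUDA runtime API; the driver may round the requested values up to an internal minimum, and the actual savings vary by driver and GPU):

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
	size_t value = 0;

	// Query the current defaults.
	cudaDeviceGetLimit(&value, cudaLimitPrintfFifoSize);
	printf("printf FIFO        : %zu bytes\n", value);
	cudaDeviceGetLimit(&value, cudaLimitMallocHeapSize);
	printf("device malloc heap : %zu bytes\n", value);
	cudaDeviceGetLimit(&value, cudaLimitStackSize);
	printf("per-thread stack   : %zu bytes\n", value);

	// Shrink buffers for features that are not used; the driver may
	// clamp these to whatever minimum it requires.
	cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 1024);
	cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1024);

	return 0;
}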

Thank you !!!

Is there any way to compute or predict the size of a CUDA context from the driver API, without actually creating one and measuring its footprint with NVML?

There is an occupancy calculator spreadsheet available in the \Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\tools directory. That might help.

The CUDA Occupancy calculator spreadsheet computes the multiprocessor occupancy of a given CUDA kernel.

What I am talking about is the amount of memory occupied by the single CUDA context. Take for example this simple code:

#include <cuda.h>
#include <stdio.h>

int main() {
  CUcontext ctx;
  CUdevice device = 0;

  cuInit(0);
  cuDeviceGet(&device, 0);
  cuCtxCreate(&ctx, CU_CTX_SCHED_AUTO, device);

  getchar();   // keep the process alive so the context shows up in nvidia-smi

  cuCtxDestroy(ctx);
  return 0;
}

If you check this code with nvidia-smi on a V100, it consumes about 305 MB of memory; on a P100 it takes approximately 280 MB. When many contexts are in play, or when several different GPU-accelerated applications are loaded on a compute node, the sum of these contexts becomes significant for estimating how many instances of such applications can run on the node concurrently. That’s why it is important to have a good estimate of this footprint before submitting the jobs.

So I ask again: is there any way or API to know how much memory will be consumed by a context?

There is no API.

You were given a basically empirical proposal in the comments here:

https://stackoverflow.com/questions/60041674/cuda-context-default-size

Such empirical methods could change with CUDA version, GPU type, or the phase of the moon.
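
For what it is worth, one such empirical measurement can be scripted with the driver API itself. This is observation, not prediction, and it assumes an otherwise idle GPU; the number will shift with driver, CUDA version, and GPU:

#include <stdio.h>
#include <cuda.h>

int main()
{
	CUcontext ctx;
	CUdevice dev;
	size_t freeMem = 0, totalMem = 0;

	cuInit(0);
	cuDeviceGet(&dev, 0);
	cuCtxCreate(&ctx, CU_CTX_SCHED_AUTO, dev);

	// With the bare context current, total - free on an idle GPU
	// approximates the footprint of this context (plus anything
	// else resident on the device).
	cuMemGetInfo(&freeMem, &totalMem);
	printf("approx. context footprint: %zu MiB\n",
	       (totalMem - freeMem) >> 20);

	cuCtxDestroy(ctx);
	return 0;
}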