What info can I extract about use of device memory?

I’ve been doing some simple experiments on the GPUs this weekend and found that

  1. when I try to allocate memory for 5 million particles per GPU, the compiler tells me there is insufficient memory.
  2. when I try to allocate memory for 1 million particles per GPU, the compiler does not complain, but when I execute a very simple kernel I get an "unspecified launch failure", which I suspect is due to insufficient resources/memory on the GPU.
  3. when I try the same for 750,000 particles, the same thing happens as with 1 million.
  4. when I try the same for 700,000 particles, the compiler does not complain and the very simple kernel executes.

It appears that the memory requirement is not linear in the number of particles. Why is that? For smaller particle counts, cuMemGetInfo suggested that 5 million particles would fit on each GPU, ignoring kernels.

And how much memory, besides the data itself, does a kernel require to execute? Can't this be calculated at compile time?

How can I get hold of this info, or calculate it myself?
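For the "how can I get hold of this info" part, a sketch of what querying looks like with the runtime API: cudaMemGetInfo (the runtime counterpart of cuMemGetInfo) reports free and total device memory, and every cudaMalloc returns an error code you can check instead of waiting for a kernel launch to fail. The particle layout (4 floats per particle, variable name d_pos) is just an assumption for illustration; untested here since it needs a CUDA-capable device:

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    size_t free_b, total_b;
    cudaMemGetInfo(&free_b, &total_b);
    printf("free: %zu MB, total: %zu MB\n", free_b >> 20, total_b >> 20);

    /* Check the allocation itself instead of waiting for the kernel to fail.
       Hypothetical layout: 4 floats per particle. */
    float *d_pos;
    size_t n = 5 * 1000 * 1000;  /* particles */
    cudaError_t err = cudaMalloc((void **)&d_pos, n * 4 * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* Query again to see how much the allocation actually consumed. */
    cudaMemGetInfo(&free_b, &total_b);
    printf("free after alloc: %zu MB\n", free_b >> 20);
    cudaFree(d_pos);
    return 0;
}
```

Comparing the "free" figure before and after an allocation also reveals how much memory the driver really reserved, which can be more than you asked for.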

The GPU (or CUDA) has very strict memory alignment requirements. This means that, for example, allocating 1 char sixteen times (16 bytes in total) is not the same as allocating 16 chars once (also 16 bytes). I found this out the hard way: I tried to allocate a small amount of memory about 100,000 times, not even 50 MB in total, and received out-of-memory errors on a card that has 512 MB. So try to be smart and allocate in large chunks :)