cudaMalloc()

zchen22 · October 9, 2013, 8:10pm

Hello,

I am using CUDA to accelerate a simple double-precision floating-point matrix algorithm that performs some mathematical calculations on each element of the matrix. The code works well with small and medium-sized matrices; however, performance degrades when I use very large matrices that take up nearly all of the device memory.

I profiled the code with Nsight and found that system (host) memory, in addition to device memory, was used for some of the buffers allocated by cudaMalloc(). For example, when I tried to allocate 1.7 GB of device memory with cudaMalloc() calls, the actual allocation was 850 MB of system memory and 850 MB of device memory. This is confusing to me, since I expected cudaMalloc() to allocate all of the buffers in device memory. It also hurts performance significantly. I am using CUDA 5.5 on 64-bit Windows 8.1. My card is a GeForce GTX 680.
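
For reference, here is a minimal sketch of how the free device memory could be checked around such an allocation. It is not the actual code from this project: the single 1.7 GB buffer and the names `requestBytes` and `dBuf` are assumptions chosen to match the figures above. It only reports what the runtime says is free before and after cudaMalloc(); it does not show where the driver physically places the buffer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print free/total device memory as reported by the CUDA runtime.
static bool reportMemory(const char* label) {
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    printf("%s: %.1f MB free of %.1f MB total\n", label,
           freeBytes / (1024.0 * 1024.0), totalBytes / (1024.0 * 1024.0));
    return true;
}

int main() {
    if (!reportMemory("before cudaMalloc")) return 1;

    // Assumed request size (about 1.7 GB), matching the example in the post.
    const size_t requestBytes = 1700ull * 1024ull * 1024ull;
    double* dBuf = nullptr;  // hypothetical buffer name

    cudaError_t err = cudaMalloc((void**)&dBuf, requestBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    if (!reportMemory("after cudaMalloc")) {
        cudaFree(dBuf);
        return 1;
    }

    cudaFree(dBuf);
    return 0;
}
```

Comparing the two reported "free" values with the requested size shows how much of the request the runtime accounts for on the device.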

Has anyone had a similar experience? I would like to know why this happens and, if possible, how to avoid it. I would greatly appreciate any help!

Zhongliang
