Questions about cudaMalloc: runtime of cudaMalloc and cudaMemcpy

MsFromMs · June 23, 2009, 8:40am

Hi,

I'm currently working on a multi-GPU CUDA program. I have up to 4 devices (and therefore 4 host threads), each running the same kernel on different data.

My first step after launching the host threads is to allocate some device memory with multiple cudaMalloc calls. After that I copy data to the device with cudaMemcpy.

I measured the time for the cudaMalloc calls and the time for the cudaMemcpy calls.

I have two questions about this:

1. Why are the cudaMalloc calls significantly slower than the cudaMemcpy calls? With only one device, the cudaMalloc calls take about 15 times longer than the cudaMemcpy calls (and with multiple devices it gets even worse).

2. The cudaMalloc times get worse as I use more devices. I measured the following times for the cudaMalloc calls:

   1 GPU: 0.36240400 seconds
   2 GPUs: 0.70018800 seconds
   4 GPUs: 1.16176900 seconds

   So my question is: are the cudaMalloc calls synchronized across multiple host threads, or what else could explain these times?

I hope to get some answers. Greetings from Germany,
Michel
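One way to tell whether the slowdown comes from the allocations themselves or from per-device setup is to time the two separately in each host thread. The following is a minimal, hypothetical harness (not the poster's code), assuming the one-host-thread-per-GPU layout described above, C++11 `std::thread`, and the CUDA runtime API. `cudaFree(0)` is used as a no-op call that forces the device context to be created; depending on the toolkit version the context may already be set up inside `cudaSetDevice`, so both calls are timed together as "setup". The 64 MiB buffer size and all helper names are arbitrary choices for illustration. (Since CUDA 4.0 a single host thread can also drive several GPUs by switching with `cudaSetDevice`.)

```cuda
// Hypothetical harness: one host thread per GPU, timing device setup
// (context creation) separately from the first real cudaMalloc.
#include <cuda_runtime.h>

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

struct Timings {
    double setup_s = 0.0;   // cudaSetDevice + cudaFree(0)
    double malloc_s = 0.0;  // one cudaMalloc issued after setup
    bool ok = false;        // stays false if any CUDA call failed
};

static double elapsed_s(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

static void time_device(int dev, std::size_t bytes, Timings* out) {
    auto t0 = std::chrono::steady_clock::now();
    // cudaFree(0) frees nothing but forces the context to be created.
    if (cudaSetDevice(dev) != cudaSuccess || cudaFree(0) != cudaSuccess) return;
    out->setup_s = elapsed_s(t0);

    void* buf = nullptr;
    t0 = std::chrono::steady_clock::now();
    if (cudaMalloc(&buf, bytes) != cudaSuccess) return;
    out->malloc_s = elapsed_s(t0);

    cudaFree(buf);
    out->ok = true;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    const std::size_t bytes = std::size_t{64} << 20;  // arbitrary: 64 MiB per GPU

    // One result slot and one host thread per device.
    std::vector<Timings> results(count);
    std::vector<std::thread> threads;
    for (int d = 0; d < count; ++d)
        threads.emplace_back(time_device, d, bytes, &results[d]);
    for (auto& t : threads) t.join();

    for (int d = 0; d < count; ++d) {
        if (!results[d].ok) {
            std::printf("GPU %d: a CUDA call failed\n", d);
            continue;
        }
        std::printf("GPU %d: setup %.6f s, cudaMalloc %.6f s\n",
                    d, results[d].setup_s, results[d].malloc_s);
    }
    return 0;
}
```

If the setup column accounts for most of the time while the cudaMalloc column stays small, the cost is in per-device initialization rather than in the allocations themselves.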

avidday · June 23, 2009, 11:01am

I am going to guess that you are using Windows (maybe Vista)? I suspect what you are seeing is the overhead of establishing a context on each GPU (it looks like about 300 ms per GPU). I would also guess that subsequent mallocs will be much, much faster; only the first operation on each context is slow.
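The context-creation guess can be checked on a single device by forcing the context into existence before timing anything else. A minimal sketch, assuming the CUDA runtime API and host-side timing with `std::chrono`; the `CHECK` macro, `elapsed_s` helper, and 64 MiB buffer size are hypothetical. `cudaFree(0)` is a commonly used no-op that triggers context creation, so its time is reported separately from the two `cudaMalloc` calls and the `cudaMemcpy`.

```cuda
// Sketch: separate one-time context creation from cudaMalloc/cudaMemcpy cost.
#include <cuda_runtime.h>

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

static double elapsed_s(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    const std::size_t n = std::size_t{1} << 24;  // arbitrary: 16M floats (64 MiB)
    std::vector<float> host(n, 1.0f);

    // 1) Warm-up: the first runtime call pays for context creation.
    auto t0 = std::chrono::steady_clock::now();
    CHECK(cudaFree(0));
    std::printf("context creation (cudaFree(0)): %.6f s\n", elapsed_s(t0));

    // 2) Allocations made after the warm-up.
    float* d_a = nullptr;
    float* d_b = nullptr;
    t0 = std::chrono::steady_clock::now();
    CHECK(cudaMalloc(&d_a, n * sizeof(float)));
    CHECK(cudaMalloc(&d_b, n * sizeof(float)));
    std::printf("2 x cudaMalloc: %.6f s\n", elapsed_s(t0));

    // 3) Host-to-device copy. From pageable memory cudaMemcpy may return
    //    before the DMA finishes, so synchronize before reading the clock.
    t0 = std::chrono::steady_clock::now();
    CHECK(cudaMemcpy(d_a, host.data(), n * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaDeviceSynchronize());
    std::printf("cudaMemcpy H2D: %.6f s\n", elapsed_s(t0));

    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    return 0;
}
```

If the guess above is right, most of the time previously attributed to cudaMalloc should show up on the first line instead.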

But all of this is just a wild guess. If it is any help, that doesn’t happen in any of the Linux versions of CUDA I have tried.
