Possibly stupid question about cudaMemcpy: cudaMemcpy getting slower over time

Hi there,

I’m working on a GTX 295 (3.0 beta driver) and do something like this:

int *a, *b;

cudaMalloc((void**) &a,  memSize);

cudaMalloc((void**) &b,  memSize);

//some memory initialization on a and b ...

for (int i = 0; i < bigvalue; i++) {

  cudaMemcpy( a, b, memSize, cudaMemcpyDeviceToDevice );

}

I do not free memory in the loop nor do I reallocate any memory.

The copy takes longer and longer every time through the loop. Why could this be?

It starts at about 0.005 ms.

After about 500 runs it already takes 0.147 ms, and there is no end in sight! :(

If my code excerpt is not sufficient, please tell me what I should post…


Do you have the same behaviour if you put a cudaThreadSynchronize right after the cudaMemcpy? (Or at the end of the loop, so that you overlap some computation.) Perhaps you have too many pending transfers?
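To illustrate what I mean (just a sketch; startTimer/stopTimer stand for whatever host-side timer you are using): the synchronize goes between the copy and the timer read-out, because the memcpy call can return before the transfer has actually completed, and then you measure queueing rather than copying.

```cuda
// Sketch: force completion before reading a host-side timer.
// Without the synchronize you may only measure the time to
// enqueue the copy, not the copy itself.
for (int i = 0; i < bigvalue; i++) {
    startTimer();                                         // hypothetical host timer
    cudaMemcpy(a, b, memSize, cudaMemcpyDeviceToDevice);
    cudaThreadSynchronize();  // wait until the copy has really finished
    stopTimer();
}
```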

Just my 2 cents,

Hi Cedric,

yes, with a cudaThreadSynchronize after the cudaMemcpy I get the same result…

I must admit I can’t see how that can become so slow. Perhaps you have some small piece of code everyone could try, so we can see whether we get the same behaviour? I’m pretty curious to see what is happening.
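Something along these lines would do as a standalone test (a sketch only; it times each device-to-device copy with CUDA events, which report elapsed milliseconds, and the 1 MB size and iteration count are arbitrary):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t memSize = 1 << 20;           // 1 MB, arbitrary test size
    int *a, *b;
    cudaMalloc((void**)&a, memSize);
    cudaMalloc((void**)&b, memSize);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int i = 0; i < 1000; i++) {
        cudaEventRecord(start, 0);
        cudaMemcpy(a, b, memSize, cudaMemcpyDeviceToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);           // wait for the copy to finish

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        if (i % 100 == 0)
            printf("iteration %d: %f ms\n", i, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

If the printed time grows with the iteration count here too, the problem is in the driver/runtime; if not, it is something in the surrounding code.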


Hi Cedric,

thanks (in advance) for your help and interest. I have solved the problem with a workaround that would actually be better than the memcpy approach even if memcpy were working fine:

for (int somecounter = 0; somecounter < 119046; ++somecounter) {

  //do much calculation that's time-measured and working kind of quick.

  sharedMemSize = numThreadsPerBlock * sizeof(int);

  dim3 dimBlock(numThreadsPerBlock);

  cudaMemcpy( d_temp1, d_sum, memSize/64, cudaMemcpyDeviceToDevice ); //GETTING WORSE

  for (int s = linescount; s > 1; ) {

    int numBlocks = s / (2 * numThreadsPerBlock);

    if ((s % (2 * numThreadsPerBlock)) != 0)
      numBlocks++;  // round up for the leftover elements

    dim3 dimGrid(numBlocks);

    reduce <<< dimGrid, dimBlock, sharedMemSize >>> (d_temp1, d_temp2, s);

    cudaMemcpy( d_temp1, d_temp2, numBlocks*sizeof(int), cudaMemcpyDeviceToDevice ); //GETTING WORSE

    s = numBlocks;

  }

  PRINT_TIMER("kernel reduce: %f\n", 0);

  //do the rest of the calculation

}
The lines marked with “GETTING WORSE” are the ones that get slower over time.
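As an aside, the inner-loop device-to-device copy can be avoided entirely by ping-ponging the two temp buffers between reduction passes. A sketch of the idea, using the same names as above (after the loop the latest partial sums are reachable through d_temp1, so any code that read d_temp1 afterwards still works):

```cuda
// Sketch: swap pointers instead of copying d_temp2 back to d_temp1.
// After each pass the freshly reduced data is simply relabelled as the
// input for the next pass, so no cudaMemcpy is needed in the loop.
for (int s = linescount; s > 1; ) {
    int numBlocks = s / (2 * numThreadsPerBlock);
    if ((s % (2 * numThreadsPerBlock)) != 0)
        numBlocks++;                          // round up for leftover elements

    dim3 dimGrid(numBlocks);
    reduce<<<dimGrid, dimBlock, sharedMemSize>>>(d_temp1, d_temp2, s);

    int *tmp = d_temp1;                       // pointer swap replaces the copy
    d_temp1 = d_temp2;
    d_temp2 = tmp;

    s = numBlocks;
}
```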

Some more information about data sizes and so on:

  • d_temp1 and d_temp2 are allocated once, outside all the loops, at the beginning of main()

  • no data is ever reallocated in any loop

  • d_temp1 and d_temp2 are never used anywhere other than here

  • d_sum changes in every outer-loop iteration and gets its data from calculations done by other kernels

  • the size of d_sum is 119046*64*sizeof(int)

I can’t reproduce this behaviour in a minimal piece of code, so there must be some weird error that I cannot see. Perhaps you can find an error in this excerpt.

Bye - Julian