Data transfer from CPU to GPU

I have a problem with data transfer from CPU to GPU.

Copying 496 kB takes 0.2156455 ms
Copying 768 kB takes 0.2149359 ms
Copying 1.4 MB takes 0.2054923 ms
Copying 4.76 MB takes 0.2105404 ms

Why does it take the same time regardless of size? :no:
Can anybody tell me what the problem is?
I write the main() function in a *.cpp file and call the CUDA function from a *.cu file.

This is my code:

//start timer for the CPU-to-GPU copy
unsigned int timer = 0;
CUT_SAFE_CALL(cutCreateTimer(&timer));
CUT_SAFE_CALL(cutStartTimer(timer));

//copy data from CPU to GPU
unsigned char* zero_gpu;
int Main_size = sizeof(unsigned char) * mainzero_col * mainzero_row;
cudaMalloc((void**)&zero_gpu, Main_size);
cudaMemcpy(zero_gpu, zero_cpu, Main_size, cudaMemcpyHostToDevice);

// stop and destroy timer
CUT_SAFE_CALL(cutStopTimer(timer));
printf("copy CPU - GPU time: %f (ms) \n", cutGetTimerValue(timer));
CUT_SAFE_CALL(cutDeleteTimer(timer));

When I copy data back from GPU to CPU, it is very quick:

Copying 496 kB takes 0.0090532 ms
Copying 768 kB takes 0.0165901 ms
Copying 1.4 MB takes 0.031168 ms
Copying 4.76 MB takes 0.1980835 ms

I'm not sure, but try a cudaThreadSynchronize(); I think I had the same problem with that. And why are you so upset about those timings, considering that there is also some overhead in making the connection with the GPU for the first time?

It is only 200 microseconds, isn't it?
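For what it's worth, a common alternative to the cutil timers is CUDA events, which are recorded in the GPU's command stream and so measure the transfer itself rather than CPU-side overhead. A minimal sketch, assuming a 4 MB buffer (the size and variable names are placeholders, not from the thread's code):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int N = 4 * 1024 * 1024;          // assumed 4 MB test buffer
    unsigned char *h_buf = new unsigned char[N];
    unsigned char *d_buf;
    cudaMalloc((void**)&d_buf, N);

    // Events are queued on the GPU, so the elapsed time between them
    // covers only the work the GPU did in between.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_buf, N, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);             // wait until the copy has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D copy: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    delete[] h_buf;
    return 0;
}
```

This needs a CUDA-capable GPU to run, so take it as a sketch rather than a drop-in replacement.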

Thank you very much

I had already used it.

When I measure the time on the GPU it looks OK, but when I measure the time on the CPU it is different.

What I mean is: in the main() function, before I call the CUDA function, I start a timer (call it CPU_timer). Inside the CUDA function I start another timer (call it GPU_timer). But the results of the two timers are very different.

My code in the *.cpp file:

int main()
{
    //do something here
    clock_t start;
    clock_t stop;

    start = clock() + CLOCKS_PER_SEC;
    data_transfer(tem_0, tem_col, tem_row); //call the CUDA function
    stop = clock() + CLOCKS_PER_SEC;

    double duration = (double)(stop - start) / CLOCKS_PER_SEC;
    printf("%2.6f seconds\n", duration);
}

My code in the *.cu file:

void data_transfer(unsigned char *zero_cpu, int mainzero_col, int mainzero_row)
{
    cudaThreadSynchronize();

    unsigned int timer = 0;
    CUT_SAFE_CALL(cutCreateTimer(&timer));
    CUT_SAFE_CALL(cutStartTimer(timer));

    //copy data from CPU to GPU
    unsigned char* zero_gpu;
    int Main_size = sizeof(unsigned char) * mainzero_col * mainzero_row;
    cudaMalloc((void**)&zero_gpu, Main_size);
    cudaMemcpy(zero_gpu, zero_cpu, Main_size, cudaMemcpyHostToDevice); //the matrix is now in global memory
    cudaThreadSynchronize();

    // stop and destroy timer
    CUT_SAFE_CALL(cutStopTimer(timer));
    printf("copy CPU - GPU time: %f (ms) \n", cutGetTimerValue(timer));
    CUT_SAFE_CALL(cutDeleteTimer(timer));
}

The CPU_timer reading is much higher:

CPU_timer: 0.321340 seconds
GPU_timer: 0.009102 seconds

I don't think the overhead of calling the function should account for a gap as large as CPU_timer - GPU_timer.

Memory allocation may take more time than the copying.
Try to measure the time of the copy only.
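One way to separate the two costs is to time the cudaMalloc and the cudaMemcpy independently; a sketch under the same assumptions as before (placeholder buffer size and names, needs a GPU to run):

```cpp
#include <cstdio>
#include <ctime>
#include <cuda_runtime.h>

int main() {
    const int N = 4 * 1024 * 1024;          // assumed 4 MB test buffer
    unsigned char *h_buf = new unsigned char[N];
    unsigned char *d_buf;

    // Time the allocation on its own...
    clock_t t0 = clock();
    cudaMalloc((void**)&d_buf, N);
    clock_t t1 = clock();

    // ...then the copy on its own, with a synchronize so the
    // measurement covers the whole transfer.
    cudaMemcpy(d_buf, h_buf, N, cudaMemcpyHostToDevice);
    cudaThreadSynchronize();
    clock_t t2 = clock();

    printf("cudaMalloc: %f s, cudaMemcpy: %f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);

    cudaFree(d_buf);
    delete[] h_buf;
    return 0;
}
```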

Before I used cudaThreadSynchronize(), the difference between GPU_timer and CPU_timer was very small. After I added it, the difference became very large. So I think something is happening there. :(

The client only sees the CPU_timer and doesn't care about the GPU_timer. :(

Whether or not I call cudaThreadSynchronize(), the CPU_timer doesn't change (it stays constant), but the GPU_timer changes a lot.

Another problem: why is the GPU_timer for copying data from CPU to GPU not the same as the GPU_timer for copying data from GPU to CPU? The difference is very large. I use two timers, one for each direction:

CPU to GPU: copying 4.76 MB takes 0.2105404 ms
GPU to CPU: copying 4.76 MB takes 0.1980835 ms

Please give me the answer.

You should average the times over hundreds of runs, because the timings will fluctuate.

CPU->GPU copies usually have a different bandwidth than GPU->CPU copies (look at the output of bandwidthTest).

The reason the CPU time is higher than the GPU time is that the CPU has to move the data to a pinned memory buffer, then ask the GPU to do the DMA transfer, and wait for the GPU to finish. So the CPU has some extra work to do besides waiting for the GPU to finish the DMA transfer.
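If that extra staging copy matters, allocating the host buffer as page-locked (pinned) memory with cudaMallocHost lets the DMA engine read it directly. A sketch, not from the thread's code (placeholder size and names, needs a GPU to run):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int N = 4 * 1024 * 1024;          // assumed 4 MB test buffer
    unsigned char *h_buf;                   // pinned (page-locked) host buffer
    unsigned char *d_buf;

    // cudaMallocHost returns page-locked memory, so the GPU can DMA
    // straight from it without the intermediate staging copy.
    cudaMallocHost((void**)&h_buf, N);
    cudaMalloc((void**)&d_buf, N);

    cudaMemcpy(d_buf, h_buf, N, cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);                    // pinned memory has its own free call
    return 0;
}
```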

Thank you, DenisR.

My code doesn't have any mistakes (in syntax or algorithm), but I'm trying to find what is wrong with it. I used the CUDA profiler and its result is the same as my GPU_timer, so I think your idea is correct. But it is not what I expected. :(