Unless you're using device emulation, your method will not work at all: the GPU's memory is not directly addressable by the CPU. Dereferencing a device pointer on the host will most likely segfault and crash.
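For reference, here's a minimal sketch of the pattern that does work (the kernel and array names here are just placeholders): copy data to the device, launch the kernel, then copy results back with cudaMemcpy before reading them on the host.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Trivial placeholder kernel: scale each element by s.
__global__ void scale(float *d_A, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_A[i] *= s;
}

int main()
{
    const int N = 10;
    float h_A[N];
    for (int i = 0; i < N; ++i) h_A[i] = (float)i;

    float *d_A;
    cudaMalloc((void **)&d_A, N * sizeof(float));
    cudaMemcpy(d_A, h_A, N * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<1, N>>>(d_A, 2.0f, N);

    // WRONG: printf("%f\n", d_A[0]);  // host dereference of a device pointer
    // RIGHT: copy the results back first, then read the host copy.
    cudaMemcpy(h_A, d_A, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", h_A[0]);

    cudaFree(d_A);
    return 0;
}
```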
I have another question about how to use shared memory. To simplify the case, say we have two arrays, float A[10] and float B[10]. Each element of B depends on the values of three consecutive elements of A, e.g. B[1] = a0*A[0] + a1*A[1] + a2*A[2], B[2] = a1*A[1] + a2*A[2] + a3*A[3], and so on. How do I allocate appropriate shared memory for array A?
If you look at the separable convolution example in the SDK, it addresses a similar problem. One solution is to add extra threads to your thread blocks: each thread is responsible for reading a single value into the shared memory array, so the block loads its tile of A plus the halo elements on either side. Then put a conditional around the summing step so the extra threads sit out for the rest of the kernel, as in the sketch below.
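Here's a minimal sketch of that approach for the 3-tap case above. It assumes the coefficients a0, a1, ... live in a device array a[]; the kernel name and TILE size are placeholders, not taken from the SDK example. Each block produces TILE outputs and is launched with TILE + 2 threads, so every thread loads exactly one element of A into shared memory.

```cpp
#define TILE 256

// Computes B[i] = a[i-1]*A[i-1] + a[i]*A[i] + a[i+1]*A[i+1]
// for the interior elements i = 1 .. n-2.
__global__ void stencil3(const float *A, const float *a, float *B, int n)
{
    __shared__ float sA[TILE + 2];   // TILE outputs + 2 halo elements

    // Global index of the A element this thread loads; the block's load
    // window starts one element before its first output, covering the halo.
    int load = blockIdx.x * TILE + threadIdx.x;
    if (load < n)
        sA[threadIdx.x] = A[load];

    __syncthreads();

    // Only the first TILE threads compute; the 2 extra threads sit out.
    int out = blockIdx.x * TILE + threadIdx.x + 1;   // index into B
    if (threadIdx.x < TILE && out < n - 1)
        B[out] = a[out - 1] * sA[threadIdx.x]
               + a[out]     * sA[threadIdx.x + 1]
               + a[out + 1] * sA[threadIdx.x + 2];
}
```

You'd launch it with one extra pair of threads per block, e.g. `stencil3<<<(n - 2 + TILE - 1) / TILE, TILE + 2>>>(d_A, d_a, d_B, n);`. The boundary elements B[0] and B[n-1] are left untouched, since they would need A[-1] and A[n].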