cudaMemcpy question

I am trying to learn CUDA. I have the following code:

__global__ void DivideByPivot(float *d_m, float pivot, int N)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < N) d_m[j] /= pivot;
}

I want to read back the value in d_m[7] from the device. Do I have to use cudaMemcpy to do that?
And how can I then assign d_m[7] = 1.0? Do I have to call cudaMemcpy again?

Is there a simpler way to read and write a single value from and to the device memory?

Thanks…

Here is what you do:

The host application calls cudaMemcpy with cudaMemcpyHostToDevice.

Then you can access the data within your kernel by indexing, e.g. d_m[index].

To get the data back to the host, the host code calls:

cudaMemcpy with cudaMemcpyDeviceToHost

You only need to copy your data once, then use it in the kernel and then copy it back.

I suggest you look at some example code.
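
For reference, here is a minimal sketch of that round trip (copy to the device, run the kernel, copy back). The helper name divideOnDevice and the block size of 256 are made up for illustration, and it assumes a non-empty array.

```cuda
#include <vector>
#include <cuda_runtime.h>

__global__ void DivideByPivot(float *d_m, float pivot, int N)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < N) d_m[j] /= pivot;
}

// Hypothetical helper (not from the original post): divides every element
// of a non-empty host vector by pivot on the GPU.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t divideOnDevice(std::vector<float> &h_m, float pivot)
{
    const int N = static_cast<int>(h_m.size());
    const size_t bytes = N * sizeof(float);
    float *d_m = nullptr;

    cudaError_t err = cudaMalloc(&d_m, bytes);
    if (err != cudaSuccess) return err;

    // Host -> device: done once, before the kernel runs.
    err = cudaMemcpy(d_m, h_m.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int blockSize = 256;  // assumed value, tune as needed
        const int blocks = (N + blockSize - 1) / blockSize;
        DivideByPivot<<<blocks, blockSize>>>(d_m, pivot, N);
        err = cudaGetLastError();   // catches launch errors
    }
    // Device -> host: this cudaMemcpy waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_m.data(), d_m, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_m);
    return err;
}
```

Calling divideOnDevice(v, 2.0f) on a vector holding 2, 4, 8 should leave 1, 2, 4 in it.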

Hi,

I don’t need to read back the whole array. I need to read back only a single value from the device like x = d_m[7].
I also want to write to a single position on the device like d_m[7] = x; Do I need to call cudaMemcpy for these?

Hm, OK, maybe I need more information.

You can also read back just a single item.

cudaMemcpy(a_Host, a_device, MEMsize, cudaMemcpyDeviceToHost);

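This copies the a_device array from GPU memory to host memory (the destination pointer comes first). The size is given in bytes, for example MEMsize = 10 * sizeof(float)
for a float array with 10 elements. For a single item, pass a pointer to that element and sizeof(float) as the size.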
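
As a rough sketch, a single element can be read or written by offsetting the device pointer and copying sizeof(float) bytes. The helper names readElement and writeElement are invented here for illustration; only cudaMemcpy itself is part of the CUDA runtime.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copies element `index` of a device array into *value.
cudaError_t readElement(const float *d_m, int index, float *value)
{
    return cudaMemcpy(value, d_m + index, sizeof(float),
                      cudaMemcpyDeviceToHost);
}

// Hypothetical helper: overwrites element `index` of a device array.
cudaError_t writeElement(float *d_m, int index, float value)
{
    return cudaMemcpy(d_m + index, &value, sizeof(float),
                      cudaMemcpyHostToDevice);
}
```

With these, x = d_m[7] becomes readElement(d_m, 7, &x) and d_m[7] = 1.0 becomes writeElement(d_m, 7, 1.0f). Each call is a separate small transfer, so it is probably best kept out of tight loops.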

But I don't really get what you want to do, and why!

You only need cudaMemcpy when transferring data from device to host or vice versa.

Once your data is on the device, kernel code can access and edit it just by indexing.
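
One possible way to apply this to a single write is a one-thread kernel that does the indexing on the device. This is only a sketch: SetElement and setElementOnDevice are hypothetical names, and a kernel launch for one value is not necessarily faster than a small cudaMemcpy.

```cuda
#include <cuda_runtime.h>

// One-thread kernel: writes a single value into device memory.
__global__ void SetElement(float *d_m, int index, float value)
{
    d_m[index] = value;
}

// Hypothetical host wrapper: launches the kernel with one block of one
// thread and reports any launch error.
cudaError_t setElementOnDevice(float *d_m, int index, float value)
{
    SetElement<<<1, 1>>>(d_m, index, value);
    return cudaGetLastError();
}
```

This only covers the write. Getting a value into a host variable still needs a copy from the device.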

//*** Host **********************************************
pivot = d_Gs[ind + icol];
d_Gs[ind + icol] = 1.0;
DivideByPivot <<< Blocks, BlockSize >>> ((d_Gs + ind), pivot, NP);
if (d_Gs[ind + icol] == 1)
DivideByPivot <<< Blocks, BlockSize >>> ((d_Gs + ind), (pivot * 0.5) , NP);

//*******************************************************

__global__ void DivideByPivot(float *d_m, float pivot, int N)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < N) d_m[j] /= pivot;
}

d_Gs is a device memory area. I want to read a single value into a host variable called pivot. I also want to write a value to a single device location, d_Gs[ind + icol] = 1.0;

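You can't access device memory from the host without cudaMemcpy.

d_Gs[ind + icol] = 1.0; doesn't work in host code. To read that element you must first copy it back to the host with cudaMemcpyDeviceToHost, and to change it you copy the new value to the device with cudaMemcpyHostToDevice.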
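
A hedged sketch of how the first three host lines from the question might look with explicit copies. The wrapper dividePivotRow and its blockSize parameter are assumptions made for this example; the names d_Gs, ind, icol, NP and the kernel are taken from the question.

```cuda
#include <cuda_runtime.h>

__global__ void DivideByPivot(float *d_m, float pivot, int N)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < N) d_m[j] /= pivot;
}

// Hypothetical wrapper around the host lines from the question.
// d_Gs must be a device pointer; NP and blockSize must be positive.
cudaError_t dividePivotRow(float *d_Gs, int ind, int icol, int NP,
                           int blockSize)
{
    float pivot = 0.0f;
    const float one = 1.0f;
    float *d_elem = d_Gs + ind + icol;  // address of the pivot on the device

    // Replaces: pivot = d_Gs[ind + icol];
    cudaError_t err = cudaMemcpy(&pivot, d_elem, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) return err;

    // Replaces: d_Gs[ind + icol] = 1.0;
    err = cudaMemcpy(d_elem, &one, sizeof(float), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;

    const int blocks = (NP + blockSize - 1) / blockSize;
    DivideByPivot<<<blocks, blockSize>>>(d_Gs + ind, pivot, NP);
    return cudaGetLastError();
}
```

The later check if (d_Gs[ind + icol] == 1) would need the same treatment: copy that one element back to a host variable first, then compare the host variable.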