Use of cuMemcpyDtoD in a __global__ function

I have a list of integers to copy into another list.
One solution is:

__global__ proc()…

tid = blockDim.x * blockIdx.x + threadIdx.x;

for(jj=0;jj<mt_nn;jj++)// {


I would like to use cuMemcpyDtoD with something like:


I then get this error from VC2005:

1> identifier “cuMemcpyDtoD” is undefined
1> cuMemcpyDtoD(local_tableau_mots,&d_tableau_mots[index_premier_k+mt_nn*tid],mt_nn*sizeof(int));

What is wrong?
Thanks. :) :)

__global__ functions run on the device, and you can only call cuMemcpyDtoD (or cudaMemcpy if you are using the Runtime API instead of the Driver API) from the host.

Thanks for your answer.
Does that mean the first solution is the only one?

Oh wait, I just reread your original post. Please ignore my previous response. I thought you were calling cuMemcpyDtoD from inside your kernel, but that’s not what you said. My apologies. :)

You #include “cuda.h” in your source file, right? cuMemcpyDtoD is defined there.

That is better: cuMemcpyDtoD is now found with the #include you suggested, but I still have a type problem (see the message below).
My tables are unsigned int, and I don't know whether, or how, I can
use the CUdeviceptr type to build the table correctly.

Thanks for your help.

“c:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\sabr2\”, line 83: error:
1> argument of type "unsigned int *" is incompatible with parameter of
1> type "CUdeviceptr"
1> cuMemcpyDtoD(local_tableau_mots,&d_tableau_mots[index_premier_k+624*tid],624*sizeof(unsigned int));

Is the solution to create the table with cuMemAlloc, which returns the pointer?
But I am still troubled that the table would then have no element type, and I don't see how to access a single element.
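For the cuMemAlloc question, a minimal Driver API sketch may help. It assumes a recent CUDA toolkit (where CUdeviceptr is an integer handle, so offsets are computed in bytes) and linking against the driver library with -lcuda. The function name, the slice index 3, and the table size of 8 slices are hypothetical, chosen only for illustration. Every call here is made from host code.

```cuda
#include <cuda.h>
#include <vector>

// Hypothetical helper (not from the thread): copies one 624-word slice
// device-to-device with the Driver API and checks the result on the host.
bool driver_dtod_copy_ok() {
    const size_t n = 624;        // words per slice, as in the post
    const size_t tid = 3;        // hypothetical slice index
    const size_t total = n * 8;  // hypothetical table size: 8 slices

    // The Driver API needs explicit initialisation and a current context.
    if (cuInit(0) != CUDA_SUCCESS) return false;
    CUdevice dev;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return false;
    CUcontext ctx;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return false;
    if (cuCtxSetCurrent(ctx) != CUDA_SUCCESS) {
        cuDevicePrimaryCtxRelease(dev);
        return false;
    }

    std::vector<unsigned int> host(total);
    for (size_t i = 0; i < total; ++i) host[i] = (unsigned int)i;

    CUdeviceptr d_table = 0, d_local = 0;
    bool ok =
        cuMemAlloc(&d_table, total * sizeof(unsigned int)) == CUDA_SUCCESS &&
        cuMemAlloc(&d_local, n * sizeof(unsigned int)) == CUDA_SUCCESS &&
        cuMemcpyHtoD(d_table, host.data(),
                     total * sizeof(unsigned int)) == CUDA_SUCCESS;

    // CUdeviceptr carries no element type: the offset is given in bytes.
    CUdeviceptr src = d_table + n * tid * sizeof(unsigned int);
    ok = ok && cuMemcpyDtoD(d_local, src,
                            n * sizeof(unsigned int)) == CUDA_SUCCESS;

    // Copy the slice back to the host to verify it.
    std::vector<unsigned int> out(n, 0);
    ok = ok && cuMemcpyDtoH(out.data(), d_local,
                            n * sizeof(unsigned int)) == CUDA_SUCCESS;
    for (size_t i = 0; ok && i < n; ++i) ok = (out[i] == host[n * tid + i]);

    if (d_local) cuMemFree(d_local);
    if (d_table) cuMemFree(d_table);
    cuDevicePrimaryCtxRelease(dev);
    return ok;
}
```

Calling driver_dtod_copy_ok() from main should return true on a machine with a working GPU. A single element can be read from the host with the same byte-offset arithmetic and cuMemcpyDtoH. Inside a kernel, the allocation is received as an ordinary unsigned int * parameter and indexed normally.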

Ok, from the sound of it, you actually want to be using the Runtime API. See appendix D in the programmer’s guide. Specifically, you should be using cudaMemcpy() in appendix D.5.10.
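As a rough sketch of what the Runtime API version looks like when called from the host: the tables stay typed as unsigned int *, and the copy direction is given by cudaMemcpyDeviceToDevice. The function name, the slice index, and the table size are assumptions made for illustration, not taken from the original code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper (not from the thread): host-side device-to-device copy
// of one 624-word slice using the Runtime API.
bool runtime_dtod_copy_ok() {
    const size_t n = 624;        // words per slice, as in the post
    const size_t tid = 3;        // hypothetical slice index
    const size_t total = n * 8;  // hypothetical table size: 8 slices

    std::vector<unsigned int> host(total);
    for (size_t i = 0; i < total; ++i) host[i] = (unsigned int)(2 * i + 1);

    unsigned int *d_table = nullptr, *d_local = nullptr;
    bool ok =
        cudaMalloc((void **)&d_table, total * sizeof(unsigned int)) == cudaSuccess &&
        cudaMalloc((void **)&d_local, n * sizeof(unsigned int)) == cudaSuccess &&
        cudaMemcpy(d_table, host.data(), total * sizeof(unsigned int),
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        // Typed pointer arithmetic: the offset is in elements, not bytes.
        // The device pointer is only offset here, never dereferenced on the host.
        cudaMemcpy(d_local, d_table + n * tid, n * sizeof(unsigned int),
                   cudaMemcpyDeviceToDevice) == cudaSuccess;

    // Copy the slice back to the host to verify it.
    std::vector<unsigned int> out(n, 0);
    ok = ok && cudaMemcpy(out.data(), d_local, n * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (size_t i = 0; ok && i < n; ++i) ok = (out[i] == host[n * tid + i]);

    cudaFree(d_local);
    cudaFree(d_table);
    return ok;
}
```

This only works because runtime_dtod_copy_ok() is an ordinary host function; the same cudaMemcpy call placed inside a kernel is rejected by the compiler.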

If I use cudaMemcpy in this __global__ function, I get the error:

"c:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\sabr2\", line 88: error:
1> calling a host function from a __device__/__global__ function is
1> only allowed in device emulation mode

I tried including:
#include "cuda.h"
#include "cuda_runtime_api.h"
#include "cuda_runtime.h"

but it made no difference.

So it seems it is not possible to use the Runtime API in a __global__ function.

Only a solution with the Driver API seems possible, with cuMemcpyDtoD for example.
Am I right, or what did I forget?
Thanks in advance.

Ok, now we come back to my original point: You can't call cudaMemcpy (or cuMemcpyDtoD) from a __global__ function. Only from functions running on the CPU.
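In practice that leaves the first approach from the original post: the kernel's threads do the copy themselves with a loop. The sketch below is only an illustration of that idea. The kernel name, its parameters, and the sizes used by the host-side check are all hypothetical, since the original kernel is not shown in full.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread copies its own slice of n words,
// starting at element `first` of the source table.
__global__ void copy_slices(const unsigned int *src, unsigned int *dst,
                            int first, int n, int num_slices) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid >= num_slices) return;  // guard threads past the last slice
    for (int jj = 0; jj < n; ++jj)
        dst[n * tid + jj] = src[first + n * tid + jj];
}

// Hypothetical host-side check: launches the kernel and verifies the copy.
bool kernel_copy_ok() {
    const int n = 624, num_slices = 8, first = 16;  // illustrative sizes
    const size_t count = (size_t)n * num_slices;
    const size_t src_count = count + first;

    std::vector<unsigned int> host(src_count), out(count, 0);
    for (size_t i = 0; i < src_count; ++i) host[i] = (unsigned int)(7 * i + 1);

    unsigned int *d_src = nullptr, *d_dst = nullptr;
    bool ok =
        cudaMalloc((void **)&d_src, src_count * sizeof(unsigned int)) == cudaSuccess &&
        cudaMalloc((void **)&d_dst, count * sizeof(unsigned int)) == cudaSuccess &&
        cudaMemcpy(d_src, host.data(), src_count * sizeof(unsigned int),
                   cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 64;
        const int blocks = (num_slices + threads - 1) / threads;
        copy_slices<<<blocks, threads>>>(d_src, d_dst, first, n, num_slices);
        ok = cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(out.data(), d_dst, count * sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (size_t i = 0; ok && i < count; ++i) ok = (out[i] == host[first + i]);

    cudaFree(d_dst);
    cudaFree(d_src);
    return ok;
}
```

Having each thread walk its own contiguous slice is the simplest form to read, but neighbouring threads then touch addresses n words apart. If the copy turns out to be a bottleneck, letting consecutive threads copy consecutive words is usually worth trying.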

Are there any functions that will do CUDA memcpy and memset operations inside __global__ or __device__ functions?