Use of cuMemcpyDtoD in a __global__ proc

x248 · March 19, 2008, 10:43pm

I have a list of integers to copy into another list.
One solution is:

__global__ void proc(…)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;

    // each thread copies its own slice of mt_nn elements
    for (int jj = 0; jj < mt_nn; jj++) {
        local_tableau_mots[jj] = d_tableau_mots[index_premier_k + mt_nn * tid + jj];
    }
}
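For reference, here is a minimal sketch of the loop approach as a complete program, with the Runtime API used only on the host to set up and check the data. The kernel name `copy_slices`, the sizes `MT_NN` and `N_THREADS`, the output buffer `d_out` and the offset value are hypothetical; only the indexing comes from the post. Note that a per-thread local array needs a compile-time size, which is why `MT_NN` is a macro here rather than a runtime `mt_nn`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo sizes; MT_NN plays the role of mt_nn in the post.
#define MT_NN 4
#define N_THREADS 8

// Each thread copies its MT_NN-element slice into a per-thread local array,
// then writes it to d_out so the host can verify the result.
__global__ void copy_slices(const unsigned int *d_tableau_mots,
                            unsigned int *d_out,
                            int index_premier_k, int n_threads)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid >= n_threads) return;  // guard against surplus threads

    unsigned int local_tableau_mots[MT_NN];  // size must be known at compile time
    for (int jj = 0; jj < MT_NN; jj++) {
        local_tableau_mots[jj] = d_tableau_mots[index_premier_k + MT_NN * tid + jj];
    }
    for (int jj = 0; jj < MT_NN; jj++) {
        d_out[MT_NN * tid + jj] = local_tableau_mots[jj];
    }
}

// Runs the kernel and returns the number of mismatching elements (0 = success).
// Error checking of CUDA calls is omitted for brevity.
int copy_demo(void)
{
    const int index_premier_k = 3;  // hypothetical starting offset
    const int n_in = index_premier_k + MT_NN * N_THREADS;
    const int n_out = MT_NN * N_THREADS;

    unsigned int h_in[n_in], h_out[n_out];
    for (int i = 0; i < n_in; i++) h_in[i] = 100u + i;

    unsigned int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    copy_slices<<<2, 4>>>(d_in, d_out, index_premier_k, N_THREADS);  // 2 x 4 = N_THREADS
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int errors = 0;
    for (int i = 0; i < n_out; i++)
        if (h_out[i] != h_in[index_premier_k + i]) errors++;
    return errors;
}

int main(void)
{
    int errors = copy_demo();
    printf("%s\n", errors == 0 ? "ok" : "mismatch");
    return errors == 0 ? 0 : 1;
}
```

Thread `tid` reads elements `index_premier_k + MT_NN * tid` through `index_premier_k + MT_NN * tid + MT_NN - 1`, so the slices never overlap.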

I would like to use cuMemcpyDtoD with something like:

cuMemcpyDtoD(local_tableau_mots, &d_tableau_mots[index_premier_k + mt_nn*tid], mt_nn*sizeof(int));

I then get this error from VC2005:

error:
1> identifier “cuMemcpyDtoD” is undefined
1> cuMemcpyDtoD(local_tableau_mots,&d_tableau_mots[index_premier_k+mt_nn*tid],mt_nn*sizeof(int));

What is wrong?
Thanks. :) :)

seibert · March 19, 2008, 11:42pm

__global__ functions run on the device, and you can only call cuMemcpyDtoD (or cudaMemcpy if you are using the Runtime API instead of the Driver API) from the host.

x248 · March 20, 2008, 9:28am

Thanks for your answer.
Does that mean the first solution (the loop) is the only one?

seibert · March 20, 2008, 11:43am

Oh wait, I just reread your original post. Please ignore my previous response. I thought you were calling cuMemcpyDtoD from inside your kernel, but that’s not what you said. My apologies. :)

You #include "cuda.h" in your source file, right? cuMemcpyDtoD is declared there.

x248 · March 20, 2008, 9:59pm

That helps: cuMemcpyDtoD is now found with the #include you suggested, but I still have a type problem (see the message below).
My tables are unsigned int, and I don't know whether or how I can
use the type CUdeviceptr to build the table correctly.

Thanks for your help.
Regards.

"c:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\sabr2\sabrKernelMT.cu", line 83: error:
1> argument of type "unsigned int *" is incompatible with parameter of
1> type "CUdeviceptr"
1> cuMemcpyDtoD(local_tableau_mots,&d_tableau_mots[index_premier_k+624*tid],624*sizeof(unsigned int));

x248 · March 20, 2008, 10:17pm

Is the solution to create the table with cuMemAlloc, which returns the pointer?
What still bothers me is that a CUdeviceptr carries no element type, so I don't see how to access an individual element.
Thanks.
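On the element-type question: in the Driver API a CUdeviceptr is an integer-valued device address, not a typed pointer, so "indexing" has to be done by hand in bytes. A minimal host-side sketch, assuming one device and linking with `-lcuda`; the names `driver_demo`, `check`, `MT_NN` and `N_ELEMS` are hypothetical. It uses the primary context (`cuDevicePrimaryCtxRetain`) rather than `cuCtxCreate` to keep setup short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Hypothetical demo sizes.
#define MT_NN 4
#define N_ELEMS 32

// Aborts with a message if a Driver API call fails.
static void check(CUresult r, const char *what)
{
    if (r != CUDA_SUCCESS) {
        fprintf(stderr, "%s failed (CUresult %d)\n", what, (int)r);
        exit(1);
    }
}

// Copies MT_NN unsigned ints starting at element `first` of one device buffer
// into another with cuMemcpyDtoD, then returns the first copied element.
unsigned int driver_demo(int first)
{
    CUdevice dev;
    CUcontext ctx;
    check(cuInit(0), "cuInit");
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    unsigned int h_src[N_ELEMS];
    for (int i = 0; i < N_ELEMS; i++) h_src[i] = 100u + i;

    // cuMemAlloc hands back an untyped device address, not an unsigned int *.
    CUdeviceptr d_src, d_dst;
    check(cuMemAlloc(&d_src, sizeof(h_src)), "cuMemAlloc src");
    check(cuMemAlloc(&d_dst, MT_NN * sizeof(unsigned int)), "cuMemAlloc dst");
    check(cuMemcpyHtoD(d_src, h_src, sizeof(h_src)), "cuMemcpyHtoD");

    // Element i lives at base + i * sizeof(element); the type is up to you.
    CUdeviceptr slice = d_src + (CUdeviceptr)first * sizeof(unsigned int);
    check(cuMemcpyDtoD(d_dst, slice, MT_NN * sizeof(unsigned int)), "cuMemcpyDtoD");

    // Reading one element means copying sizeof(element) bytes back to the host.
    unsigned int value = 0;
    check(cuMemcpyDtoH(&value, d_dst, sizeof(value)), "cuMemcpyDtoH");

    cuMemFree(d_src);
    cuMemFree(d_dst);
    cuDevicePrimaryCtxRelease(dev);
    return value;
}

int main(void)
{
    unsigned int v = driver_demo(5);
    printf("first copied element: %u\n", v);  // h_src[5] == 105
    return v == 105u ? 0 : 1;
}
```

All of these calls run on the host; inside a kernel the same memory would be seen through an ordinary `unsigned int *` parameter.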

seibert · March 20, 2008, 10:24pm

Ok, from the sound of it, you actually want to be using the Runtime API. See appendix D in the programmer’s guide. Specifically, you should be using cudaMemcpy() in appendix D.5.10.
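With the Runtime API the device buffer can stay a typed `unsigned int *`, so ordinary pointer arithmetic gives the slice address; the copy itself is still issued from host code with `cudaMemcpyDeviceToDevice`. A minimal sketch; `runtime_demo`, `d_slice`, `MT_NN` and `N_ELEMS` are hypothetical names, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo sizes.
#define MT_NN 4
#define N_ELEMS 32

// Host function: copies slice number `slice` (MT_NN elements starting at
// index_premier_k + MT_NN * slice) into a separate device buffer and returns
// its first element. The caller must keep the slice inside N_ELEMS.
unsigned int runtime_demo(int index_premier_k, int slice)
{
    unsigned int h_src[N_ELEMS];
    for (int i = 0; i < N_ELEMS; i++) h_src[i] = 100u + i;

    unsigned int *d_tableau_mots = nullptr, *d_slice = nullptr;
    cudaMalloc(&d_tableau_mots, sizeof(h_src));
    cudaMalloc(&d_slice, MT_NN * sizeof(unsigned int));
    cudaMemcpy(d_tableau_mots, h_src, sizeof(h_src), cudaMemcpyHostToDevice);

    // Pointer arithmetic on a device pointer is fine in host code, as long as
    // the result is only passed to CUDA calls and never dereferenced here.
    cudaMemcpy(d_slice, d_tableau_mots + index_premier_k + MT_NN * slice,
               MT_NN * sizeof(unsigned int), cudaMemcpyDeviceToDevice);

    unsigned int value = 0;
    cudaMemcpy(&value, d_slice, sizeof(value), cudaMemcpyDeviceToHost);
    cudaFree(d_tableau_mots);
    cudaFree(d_slice);
    return value;
}

int main(void)
{
    unsigned int v = runtime_demo(3, 2);  // starts at element 3 + 4 * 2 = 11
    printf("first copied element: %u\n", v);
    return v == 111u ? 0 : 1;
}
```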

x248 · March 23, 2008, 8:33pm

If I use cudaMemcpy in this __global__ function, I get the error:

"c:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\sabr2\sabrKernelMT.cu", line 88: error:
1> calling a host function from a __device__/__global__ function is
1> only allowed in device emulation mode

I tried including:
#include "cuda.h"
#include "cuda_runtime_api.h"
#include "cuda_runtime.h"

but it made no difference.

So it seems it is not possible to use the Runtime API in a __global__ function.

Only a Driver API solution seems possible, with cuMemcpyDtoD for example.
Am I right, or what did I forget?
Thanks in advance.

seibert · March 23, 2008, 8:44pm

Ok, now we come back to my original point: you can't call cudaMemcpy (or cuMemcpyDtoD) from a __global__ function, only from functions running on the CPU.

KUNDAN_KUMAR · March 27, 2009, 6:00am

Are there any functions that perform memcpy and memset operations inside __global__ or __device__ functions?
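As far as I know, later CUDA toolkits document device-side `memcpy` and `memset` (next to in-kernel `malloc`/`free`) for devices of compute capability 2.x and higher, which is newer hardware than the cards discussed earlier in this thread. A minimal sketch assuming such a device and toolkit; the kernel `memcpy_memset_kernel`, the helper `device_memcpy_demo` and the buffer names are hypothetical. On older hardware the explicit per-element loop from the first post remains the way to do it.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical demo sizes.
#define MT_NN 4
#define N_THREADS 8

// Each thread copies its slice with device-side memcpy and zeroes the
// matching slice of d_cleared with device-side memset.
__global__ void memcpy_memset_kernel(const unsigned int *d_in, unsigned int *d_out,
                                     unsigned int *d_cleared, int index_premier_k)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid >= N_THREADS) return;

    unsigned int local_tableau_mots[MT_NN];
    memcpy(local_tableau_mots, d_in + index_premier_k + MT_NN * tid,
           MT_NN * sizeof(unsigned int));
    memcpy(d_out + MT_NN * tid, local_tableau_mots, MT_NN * sizeof(unsigned int));
    memset(d_cleared + MT_NN * tid, 0, MT_NN * sizeof(unsigned int));
}

// Returns the number of wrong elements (0 = success). Error checks omitted.
int device_memcpy_demo(void)
{
    const int index_premier_k = 3;  // hypothetical starting offset
    const int n_in = index_premier_k + MT_NN * N_THREADS;
    const int n_out = MT_NN * N_THREADS;
    unsigned int h_in[n_in], h_out[n_out], h_cleared[n_out];
    for (int i = 0; i < n_in; i++) h_in[i] = 100u + i;

    unsigned int *d_in = nullptr, *d_out = nullptr, *d_cleared = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMalloc(&d_cleared, sizeof(h_cleared));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    // Fill d_cleared with nonzero data so the memset has something to clear.
    cudaMemcpy(d_cleared, h_in, sizeof(h_cleared), cudaMemcpyHostToDevice);

    memcpy_memset_kernel<<<2, 4>>>(d_in, d_out, d_cleared, index_premier_k);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_cleared, d_cleared, sizeof(h_cleared), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_cleared);

    int errors = 0;
    for (int i = 0; i < n_out; i++) {
        if (h_out[i] != h_in[index_premier_k + i]) errors++;
        if (h_cleared[i] != 0u) errors++;
    }
    return errors;
}

int main(void)
{
    int errors = device_memcpy_demo();
    printf("%s\n", errors == 0 ? "ok" : "mismatch");
    return errors == 0 ? 0 : 1;
}
```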

Thanks.
