Hello,
I need to use some variables and arrays throughout the whole program, in several different device functions, so I decided to declare them with __device__.
But I get an error when I try to use a __device__ array in a __global__ function, for example:
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
float* host_a;
__device__ float* dev_a;   // pointer variable stored in device memory
__global__ void change()
{
    dev_a[threadIdx.x] *= -1;
}
int main()
{
    cudaError_t cuerr;
    host_a = (float*) malloc(sizeof(float)*10);
    for (int i = 0; i < 10; i++)
        host_a[i] = 0;
    cuerr = cudaMalloc(&dev_a, 10*sizeof(float));
    cudaMemcpy(dev_a, host_a, 10*sizeof(float), cudaMemcpyHostToDevice);
    change<<<1, 10>>>();
    cuerr = cudaMemcpy(host_a, dev_a, 10*sizeof(float), cudaMemcpyDeviceToHost);
    if (cuerr != cudaSuccess)
        printf("%s\n", cudaGetErrorString(cuerr));
}
After this I get “unknown error”.
As for your code sample: ‘dev_a’ is a pointer to float that lives in device memory (a __device__ variable), so you cannot access it from host code.
Have you tried passing ‘dev_a’ as a kernel call parameter instead? Define it locally (in the main() function, for example), allocate it with cudaMalloc and then pass it as a kernel parameter, like so:
__global__ void change(float *ptr) {
...
}
int main() {
...
float *dev_a;
cuerr= cudaMalloc(&dev_a,10*sizeof(float));
...
change<<<1,host_size>>>(dev_a);
...
}
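For reference, a complete, compilable version of the sketch above might look like the following. It is a minimal sketch, assuming the data is a plain float array of n elements; the helper name negate_on_device and the 256-thread launch configuration are illustrative choices, not part of the original post.

```cuda
#include <cuda_runtime.h>

// The buffer arrives as an ordinary kernel argument; no global symbol is involved.
__global__ void change(float *ptr, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        ptr[i] *= -1.0f;
}

// Hypothetical helper: copies host[0..n) to the GPU, negates every element there,
// and copies the result back. Returns the first CUDA error encountered.
cudaError_t negate_on_device(float *host, int n)
{
    float *dev_a = nullptr;  // host variable that holds a device address
    cudaError_t err = cudaMalloc(&dev_a, n * sizeof(float));
    if (err != cudaSuccess)
        return err;

    err = cudaMemcpy(dev_a, host, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        change<<<blocks, threads>>>(dev_a, n);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    if (err == cudaSuccess) {
        // Blocking copy: waits for the kernel and reports errors raised while it ran.
        err = cudaMemcpy(host, dev_a, n * sizeof(float), cudaMemcpyDeviceToHost);
    }

    cudaFree(dev_a);
    return err;
}
```

Call negate_on_device(host_a, 10) from main() and print cudaGetErrorString() of the result if it is not cudaSuccess. Because dev_a is an ordinary host variable here, host code can hand it to cudaMalloc, cudaMemcpy and the kernel launch without any symbol lookup.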
Cheers,
MK
I think that if you define it outside of main(), you need to call a function to find out the address of the pointer.
Thanks, cmaster.matso!
But I have more than 20 arrays and variables, so passing all of them as kernel call parameters is not very convenient :)
And what about constant memory? How can I define and use arrays in constant memory? I mean something like this:
#include <cstdlib>
#include <cuda_runtime.h>
float* host_a;
__constant__ float* const_a;
__global__ void change(float *array)
{
    array[threadIdx.x] = const_a[threadIdx.x];
}
int main()
{
    cudaError_t cuerr;
    float* dev_a;
    host_a = (float*) malloc(sizeof(float)*10);
    for (int i = 0; i < 10; i++)
        host_a[i] = 0;
    cuerr = cudaMalloc(&dev_a, 10*sizeof(float));
    cudaMemcpy(dev_a, host_a, 10*sizeof(float), cudaMemcpyHostToDevice);
    /* CUDA MAGIC:
       copy data from host_a to const_a */
    change<<<1, 10>>>(dev_a);
    cuerr = cudaMemcpy(host_a, dev_a, 10*sizeof(float), cudaMemcpyDeviceToHost);
}
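For the constant-memory question, the usual runtime-API approach (a minimal sketch, assuming the size is known at compile time) is to declare the array itself as __constant__ and fill it with cudaMemcpyToSymbol, which plays the role of the "CUDA MAGIC" placeholder. N = 10 and the helper name copy_through_constant are illustrative. Note that __constant__ float* const_a, as written in the question, would put only the pointer in constant memory; the data it points to would still sit in ordinary global memory.

```cuda
#include <cuda_runtime.h>

constexpr int N = 10;  // constant-memory arrays need a compile-time size

// The array itself (not a pointer) lives in constant memory.
__constant__ float const_a[N];

// Each thread copies one element from constant memory into global memory.
__global__ void change(float *array)
{
    int i = threadIdx.x;
    if (i < N)
        array[i] = const_a[i];
}

// Hypothetical helper: uploads host_a[0..N) into const_a, runs the kernel,
// and reads the kernel's output back into out[0..N).
cudaError_t copy_through_constant(const float *host_a, float *out)
{
    float *dev_a = nullptr;
    cudaError_t err = cudaMalloc(&dev_a, N * sizeof(float));
    if (err != cudaSuccess)
        return err;

    // The "CUDA MAGIC" step: copy host data into the __constant__ symbol.
    // The symbol is passed directly, not as the string "const_a".
    err = cudaMemcpyToSymbol(const_a, host_a, N * sizeof(float));
    if (err == cudaSuccess) {
        change<<<1, N>>>(dev_a);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(out, dev_a, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dev_a);
    return err;
}
```

Constant memory is small (64 KB per device) and read-only from kernels, so it suits lookup tables and parameters better than large working arrays; it is fastest when all threads of a warp read the same address.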
Say you have a variable ‘array’ allocated by ‘cudaMalloc’ in the host code. That means you have a pointer to a device memory block. What you need to do is find a symbol (it can be __constant__) in your device code using ‘cudaGetSymbolAddress’ (say ‘dev_a’). Next, copy the address of the allocated memory block (the value of ‘array’) to the address of the symbol you got from the ‘cudaGetSymbolAddress’ call, i.e. copy it into the memory location where the ‘dev_a’ pointer itself is stored. In serial C code it would look like this:
...
float *array = malloc(N*sizeof(float));
float *dev_a;
dev_a = array; // Don't copy the array's contents; just make both pointers point to the same memory block
...
To sum up:
[*] 'array' points to some device memory block,
[*] 'dev_a' is a device-side pointer that is not initialized by default (it points somewhere unknown),
[*] you need to set 'dev_a' to point to the memory block pointed to by 'array', i.e. copy the address itself.
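The approach described above can be sketched with the runtime API as follows. This is a minimal sketch, assuming everything (the __device__ pointer, the kernels and the host code) lives in one .cu file; the names negate and negate_via_symbol are hypothetical. Each global pointer needs only this one-time setup, after which kernels and __device__ functions can use it without extra parameters. If cudaGetSymbolAddress reports "invalid device symbol", one common cause is passing the symbol name as a string such as "dev_a", which CUDA 5.0 and later no longer accept; pass the variable itself.

```cuda
#include <cuda_runtime.h>

// File-scope pointer stored in device memory. Every kernel and __device__ function
// in this .cu file can use it, but host code cannot dereference or assign it directly.
__device__ float *dev_a;

// A device function that relies on the global pointer rather than a parameter.
__device__ void negate(int i) { dev_a[i] *= -1.0f; }

__global__ void change(int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        negate(i);
}

// Hypothetical helper: allocates a buffer, makes dev_a point to it, and negates
// host[0..n) on the GPU. Returns the first CUDA error encountered.
cudaError_t negate_via_symbol(float *host, int n)
{
    float *array = nullptr;  // host variable holding the device address
    cudaError_t err = cudaMalloc(&array, n * sizeof(float));
    if (err != cudaSuccess)
        return err;

    // Step 1: device address of the symbol dev_a (pass the variable, not "dev_a").
    void *sym = nullptr;
    err = cudaGetSymbolAddress(&sym, dev_a);

    // Step 2: copy the address held in 'array' (not the data) into dev_a.
    // cudaMemcpyToSymbol(dev_a, &array, sizeof(array)) does both steps in one call.
    if (err == cudaSuccess)
        err = cudaMemcpy(sym, &array, sizeof(array), cudaMemcpyHostToDevice);

    if (err == cudaSuccess)
        err = cudaMemcpy(array, host, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        change<<<(n + threads - 1) / threads, threads>>>(n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, array, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(array);
    return err;
}
```

Call negate_via_symbol(host_a, 10) from main(). Compared with the question's code, the only real change is that host code never touches dev_a directly: it writes the pointer value into the symbol's device address.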
Cheers,
MK
Thanks!
Can you give me an example in CUDA code? All of this is not easy for me :) I tried to use cudaGetSymbolAddress and got the error “invalid device symbol”.
I use the driver API functions ‘cuModuleGetGlobal’ (to get the address of a named device-side variable), ‘cuMemGetAddressRange’ (to get the address of the allocated buffer) and ‘cuMemcpyHtoD’ (to finally copy the address).
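The driver-API route described in the previous post can be sketched as follows. It is a minimal sketch under several assumptions: the kernels and the declaration __device__ float *dev_a; (at global namespace scope, so its name is not mangled) are compiled separately into a module file, for example with nvcc -ptx; device 0 is used; and the helper name bind_dev_a is hypothetical. Cleanup on error paths is omitted for brevity.

```cuda
#include <cuda.h>

// Hypothetical helper: loads the module at module_path, allocates n floats, and stores
// the buffer's address in the module's __device__ pointer named "dev_a".
// On success *out_mod and *out_buf receive the module and buffer handles.
CUresult bind_dev_a(const char *module_path, size_t n, CUmodule *out_mod, CUdeviceptr *out_buf)
{
    CUresult r;
    CUdevice dev;
    CUcontext ctx;
    if ((r = cuInit(0)) != CUDA_SUCCESS) return r;
    if ((r = cuDeviceGet(&dev, 0)) != CUDA_SUCCESS) return r;
    // Use the device's primary context, the same one the runtime API uses.
    if ((r = cuDevicePrimaryCtxRetain(&ctx, dev)) != CUDA_SUCCESS) return r;
    if ((r = cuCtxSetCurrent(ctx)) != CUDA_SUCCESS) return r;

    CUmodule mod;
    if ((r = cuModuleLoad(&mod, module_path)) != CUDA_SUCCESS) return r;

    // 1. cuModuleGetGlobal: device address and size of the named variable.
    CUdeviceptr sym;
    size_t sym_bytes;
    if ((r = cuModuleGetGlobal(&sym, &sym_bytes, mod, "dev_a")) != CUDA_SUCCESS) return r;
    if (sym_bytes != sizeof(CUdeviceptr)) return CUDA_ERROR_INVALID_VALUE;  // expect a pointer

    // 2. Allocate the buffer. cuMemGetAddressRange returns the base of the allocation
    //    that contains a given pointer; here that is simply buf itself.
    CUdeviceptr buf, base;
    size_t buf_bytes;
    if ((r = cuMemAlloc(&buf, n * sizeof(float))) != CUDA_SUCCESS) return r;
    if ((r = cuMemGetAddressRange(&base, &buf_bytes, buf)) != CUDA_SUCCESS) return r;

    // 3. cuMemcpyHtoD: copy the address value (not the buffer contents) into dev_a.
    if ((r = cuMemcpyHtoD(sym, &base, sizeof(base))) != CUDA_SUCCESS) return r;

    *out_mod = mod;
    *out_buf = buf;
    return CUDA_SUCCESS;
}
```

Kernels in the same module that dereference dev_a then see the buffer; they can be launched with cuModuleGetFunction and cuLaunchKernel (declare them extern "C" so their names are not mangled). cuMemGetAddressRange is mostly useful when you only hold a pointer into the middle of an allocation; with the runtime API, cudaMemcpyToSymbol achieves the whole binding in a single call.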
I’m having the same issue: too many global device arrays to pass them all as parameters to a kernel. I could not find any example of how to do it. Help!
- Flatten all your arrays and concatenate them. Then pass a pointer to the single array and a pointer to an array that gives the offset of the start of each sub-array in the concatenated array. Maybe also a pointer to an array of the lengths of the sub-arrays.
- Pass a pointer to an array of pointers, each of which points to a sub-array. You probably also need to pass an integer giving the length of the array of pointers, and maybe also an array of the lengths of the sub-arrays.
cublasSgetrfBatched/cublasSgetriBatched use the second method, for example: they take an array of pointers to the device matrices to be factorized and inverted.
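Both suggestions above can be sketched in one small example. It is a minimal sketch, assuming the task is to compute a sum per sub-array (chosen only to have something to compute); the kernel names, the CHECK macro and the helper sums_both_ways are hypothetical. sum_flat follows the first method (one concatenated buffer plus offsets and lengths), sum_ptrs the second (a device array of device pointers, the layout the batched cuBLAS routines take).

```cuda
#include <vector>
#include <cuda_runtime.h>

// Return the error from a CUDA call (allocations made so far are not freed, for brevity).
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) return e_; } while (0)

// Method 1: one concatenated buffer plus per-sub-array offsets and lengths.
__global__ void sum_flat(const float *data, const int *offsets, const int *lengths,
                         int num_arrays, float *sums)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per sub-array
    if (a >= num_arrays) return;
    float s = 0.0f;
    for (int j = 0; j < lengths[a]; ++j)
        s += data[offsets[a] + j];
    sums[a] = s;
}

// Method 2: an array of device pointers, one per separately allocated sub-array.
__global__ void sum_ptrs(float *const *arrays, const int *lengths, int num_arrays, float *sums)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= num_arrays) return;
    float s = 0.0f;
    for (int j = 0; j < lengths[a]; ++j)
        s += arrays[a][j];
    sums[a] = s;
}

// Hypothetical helper: sums each sub-array on the GPU using both layouts.
cudaError_t sums_both_ways(const std::vector<std::vector<float>> &subs,
                           std::vector<float> &flat_sums, std::vector<float> &ptr_sums)
{
    const int k = static_cast<int>(subs.size());
    const int threads = 128, blocks = (k + threads - 1) / threads;
    flat_sums.resize(k);
    ptr_sums.resize(k);

    // Method 1, host side: concatenate and record offsets/lengths.
    std::vector<float> flat;
    std::vector<int> offsets, lengths;
    for (const auto &s : subs) {
        offsets.push_back(static_cast<int>(flat.size()));
        lengths.push_back(static_cast<int>(s.size()));
        flat.insert(flat.end(), s.begin(), s.end());
    }
    float *d_flat, *d_sums;
    int *d_off, *d_len;
    CHECK(cudaMalloc(&d_flat, flat.size() * sizeof(float)));
    CHECK(cudaMalloc(&d_off, k * sizeof(int)));
    CHECK(cudaMalloc(&d_len, k * sizeof(int)));
    CHECK(cudaMalloc(&d_sums, k * sizeof(float)));
    CHECK(cudaMemcpy(d_flat, flat.data(), flat.size() * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_off, offsets.data(), k * sizeof(int), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_len, lengths.data(), k * sizeof(int), cudaMemcpyHostToDevice));
    sum_flat<<<blocks, threads>>>(d_flat, d_off, d_len, k, d_sums);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(flat_sums.data(), d_sums, k * sizeof(float), cudaMemcpyDeviceToHost));

    // Method 2: allocate each sub-array separately, then upload the table of pointers.
    std::vector<float *> h_ptrs(k);
    for (int a = 0; a < k; ++a) {
        CHECK(cudaMalloc(&h_ptrs[a], subs[a].size() * sizeof(float)));
        CHECK(cudaMemcpy(h_ptrs[a], subs[a].data(), subs[a].size() * sizeof(float),
                         cudaMemcpyHostToDevice));
    }
    float **d_ptrs;
    CHECK(cudaMalloc(&d_ptrs, k * sizeof(float *)));
    CHECK(cudaMemcpy(d_ptrs, h_ptrs.data(), k * sizeof(float *), cudaMemcpyHostToDevice));
    sum_ptrs<<<blocks, threads>>>(d_ptrs, d_len, k, d_sums);  // reuses the lengths array
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(ptr_sums.data(), d_sums, k * sizeof(float), cudaMemcpyDeviceToHost));

    for (float *p : h_ptrs) cudaFree(p);
    cudaFree(d_ptrs); cudaFree(d_flat); cudaFree(d_off); cudaFree(d_len); cudaFree(d_sums);
    return cudaSuccess;
}
```

The flattened layout needs one allocation per kind of data and keeps everything contiguous; the pointer table lets sub-arrays be allocated, resized or reused independently, at the cost of one extra indirection per access. Either way the kernel signature stays short no matter how many sub-arrays there are.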