How to get the values of an array in a kernel function

hakuna · October 30, 2007, 5:39am

I want to read, on the host, the values of an array written in a kernel function, using a global pointer. So I first defined an array pointer in the kernel file:

__device__ unsigned int *result;

Then I initialize it in the main function:

CUDA_SAFE_CALL( cudaMalloc((void **)&result, NUM) );

I also defined a host array in the main function like this:

unsigned int *h_result;

h_result = (unsigned int *)malloc(NUM);

I write to the device array in the kernel function, for example:

result[threadIdx.x] = threadIdx.x + 100;  // just an illustrative operation

Then I copy the result array back using cudaMemcpy:

CUDA_SAFE_CALL( cudaMemcpy(h_result, result, NUM, cudaMemcpyDeviceToHost) );

But the program always fails with an error.

So can anyone tell me where I went wrong?

jordyvaneijk · October 30, 2007, 7:42am

Could you tell us what kind of error you get, and what you have already tried yourself? That way we won't suggest something you already tried. That would be very helpful.

hakuna · October 30, 2007, 8:05am

Yes, thank you for the reminder.

I tried to inspect the values of the array in EmuDebug mode. When the program reaches the operation on the global array, an error occurs that says: memory access conflict.

In the debug window of VS2005, I found that X_array[threadIdx.x] cannot be evaluated correctly.

Then I tried changing the declaration of the global array from __device__ unsigned int *X_array to unsigned int *X_array, but the compiler reports that X_array is not defined in the kernel function.

sphyraena · October 30, 2007, 12:04pm

What is NUM? If it is the number of integers in result, then you need to multiply that by sizeof(unsigned int).
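A minimal sketch of that sizing rule, assuming NUM is an element count. The value 256 and the helper name alloc_result_buffers are hypothetical; the point is only that cudaMalloc and malloc both take a size in bytes, so the element count has to be multiplied by sizeof(unsigned int) in both places.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical element count; the thread never gives NUM's actual value.
const size_t NUM = 256;

// Allocates matching device and host buffers that each hold NUM unsigned ints.
// cudaMalloc and malloc both expect a size in BYTES, not in elements.
cudaError_t alloc_result_buffers(unsigned int **d_result, unsigned int **h_result)
{
    const size_t bytes = NUM * sizeof(unsigned int);  // 1024 bytes when unsigned int is 4 bytes

    cudaError_t err = cudaMalloc((void **)d_result, bytes);
    if (err != cudaSuccess) return err;

    *h_result = (unsigned int *)malloc(bytes);
    if (*h_result == NULL) {
        cudaFree(*d_result);  // do not leak the device buffer on host failure
        return cudaErrorMemoryAllocation;
    }
    return cudaSuccess;
}
```

The same `bytes` value should then be reused in the later cudaMemcpy, so the three calls can never disagree about the size.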

hakuna · October 30, 2007, 12:59pm

Yes, you are right; I left it out of my post, but in my program the size is multiplied by sizeof(). I have now tried several approaches, but none of them works.

(1) Define an array pointer __device__ unsigned int *array in the kernel file, initialize it in the main function, then operate on the array in the kernel function. When running, the program throws an exception.

(2) Define an array __device__ unsigned int array[NUM] in the kernel file, operate on it in the kernel function, and then read it directly in the main function (a __device__ variable can be accessed from the host) like this: unsigned int x = array[i];

In EmuDebug mode, I found that the array works fine inside the kernel function, but the values read in the main function are all 0.

hakuna · October 30, 2007, 1:26pm

(3) Define __device__ unsigned int array[NUM] in the main file (outside the main function), then operate on the array in the kernel function; that works fine. When the kernel finishes, I copy the memory from device to host:
cudaMemcpy(host_pointer, array, NUM*sizeof(unsigned int), cudaMemcpyDeviceToHost);
but at run time an error occurs here, saying: invalid device pointer.

MisterAnderson42 · October 30, 2007, 1:45pm

unsigned int *d_result;  //< NOTE: NOT __device__

CUDA_SAFE_CALL( cudaMalloc((void **)&d_result, NUM*sizeof(unsigned int)) );

h_result = (unsigned int *)malloc(NUM*sizeof(unsigned int));

// do kernel calculation into d_result (passing the d_result pointer as a parameter)

CUDA_SAFE_CALL( cudaMemcpy(h_result, d_result, NUM*sizeof(unsigned int), cudaMemcpyDeviceToHost) );

__device__ variables seem completely useless to me; I have never used one. It is quite easy just to use cudaMalloc and pass pointers around. Perhaps cudaMemcpyFromSymbol (the reading counterpart of cudaMemcpyToSymbol) is capable of reading an array declared “__device__ unsigned int result[NUM]”, but I’m not sure it would work.
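A self-contained sketch of the pointer-passing pattern above, assuming a single block of NUM threads. NUM = 256, the kernel name fill_result and the helper run_pointer_version are hypothetical; the `threadIdx.x + 100` operation is the illustrative one from the original question. CUDA_SAFE_CALL is replaced here by explicit error checks so the sketch does not depend on the old SDK header.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

const unsigned int NUM = 256;  // hypothetical element count (one block of NUM threads)

// The device address arrives as an ordinary kernel parameter; no __device__ pointer is needed.
__global__ void fill_result(unsigned int *d_result)
{
    d_result[threadIdx.x] = threadIdx.x + 100;  // illustrative operation from the question
}

// Runs the kernel and copies the NUM results into h_result (caller-owned, NUM elements).
cudaError_t run_pointer_version(unsigned int *h_result)
{
    const size_t bytes = NUM * sizeof(unsigned int);
    unsigned int *d_result = NULL;  // plain host variable that holds a device address
    cudaError_t err = cudaMalloc((void **)&d_result, bytes);
    if (err != cudaSuccess) return err;

    fill_result<<<1, NUM>>>(d_result);
    err = cudaGetLastError();  // reports launch-configuration errors
    if (err == cudaSuccess)
        // cudaMemcpy on the default stream waits for the kernel to finish first
        err = cudaMemcpy(h_result, d_result, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_result);
    return err;
}
```

Because the kernel only sees a parameter, the same kernel can be launched on any buffer, which is the reuse argument made in the post.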

hakuna · October 30, 2007, 1:54pm


Yes, if I pass the array as a parameter, it is easy to do. But are there any other ways to get the values of the array from the host? I mean something like using a global array in ordinary programming to fetch some values from a subroutine.

By the way, in methods (1), (2) and (3) I used cudaGetSymbolAddress((void **)h_result, (unsigned int *)&array) to get the address of the array, but that also fails with the error: invalid device pointer.

sphyraena · October 30, 2007, 2:24pm

h_result is a host pointer, not a device pointer. Hence the error.

It sounds like you might be getting confused over host and device memory and what you can do with each type of memory. It may be helpful to review the CUDA programming guide again.
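A sketch of the distinction, assuming a current CUDA runtime where the symbol itself is passed to the symbol functions. The symbol name `array` comes from method (3); NUM = 256, the kernel fill_array and the helper copy_via_symbol_address are hypothetical. The address returned by cudaGetSymbolAddress goes into a separate device-pointer variable, and only that pointer is handed to cudaMemcpy. The array is declared in the same .cu file as the host code that refers to it, which matches hakuna's later step (1).

```cuda
#include <cstddef>
#include <cuda_runtime.h>

const unsigned int NUM = 256;  // hypothetical element count

// Statically allocated device array, as in methods (2) and (3).
__device__ unsigned int array[NUM];

__global__ void fill_array()
{
    array[threadIdx.x] = threadIdx.x + 100;  // illustrative operation from the question
}

// h_result must point to a host buffer of NUM elements.
cudaError_t copy_via_symbol_address(unsigned int *h_result)
{
    fill_array<<<1, NUM>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // cudaGetSymbolAddress produces a DEVICE address, so it is stored in a
    // device-pointer variable, never in the host buffer h_result.
    unsigned int *d_ptr = NULL;
    err = cudaGetSymbolAddress((void **)&d_ptr, array);
    if (err != cudaSuccess) return err;

    // d_ptr is a genuine device pointer, so an ordinary cudaMemcpy is valid here.
    // The host-side name `array` on its own is not, which is why passing it
    // straight to cudaMemcpy reports an invalid device pointer.
    return cudaMemcpy(h_result, d_ptr, NUM * sizeof(unsigned int), cudaMemcpyDeviceToHost);
}
```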

MisterAnderson42 · October 30, 2007, 2:34pm

I’m more of a C++ programmer, and in C++ global variables are evil. I find the design pattern of passing pointers around nice because then I can run the same kernel on many different datasets in a fully object-oriented pattern. Hence I have never found any use for __device__ variables and can’t really help you further. Additionally, there can be severe performance issues if you just store the pointer “__device__ unsigned int *result”: reads of that pointer will NOT be coalesced, and kernels will run slowly. __constant__ would be a better choice for storing pointers, while __device__ is fine if you are actually declaring the array “__device__ unsigned int result[NUM]”.

Anyway, I digress. The solution to your problem lies in the proper use of cudaMemcpyToSymbol and related functions. I’ve only used them once or twice, so I’m not the best person to explain their use. http://forums.nvidia.com/index.php?showtopic=49711 talks about them some.
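A sketch of the __constant__ suggestion above, assuming a current CUDA runtime: the array is still allocated with cudaMalloc, and only the pointer value is copied into a __constant__ variable with cudaMemcpyToSymbol. NUM = 256, c_result, fill_via_constant_pointer and run_constant_pointer_version are hypothetical names. This only shows the mechanics; it does not measure whether it is faster than a __device__ pointer.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

const unsigned int NUM = 256;  // hypothetical element count

// The pointer lives in constant memory; the array it points to is in global memory.
__constant__ unsigned int *c_result;

__global__ void fill_via_constant_pointer()
{
    // Every thread reads the same pointer value from constant memory,
    // then writes through it to its own element.
    c_result[threadIdx.x] = threadIdx.x + 100;
}

cudaError_t run_constant_pointer_version(unsigned int *h_result)
{
    const size_t bytes = NUM * sizeof(unsigned int);
    unsigned int *d_result = NULL;
    cudaError_t err = cudaMalloc((void **)&d_result, bytes);
    if (err != cudaSuccess) return err;

    // Copy the pointer VALUE (not the array contents) into the __constant__ symbol.
    err = cudaMemcpyToSymbol(c_result, &d_result, sizeof(d_result));
    if (err == cudaSuccess) {
        fill_via_constant_pointer<<<1, NUM>>>();
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_result, d_result, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_result);
    return err;
}
```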

hakuna · October 30, 2007, 2:39pm

Yes, but h_result is just a pointer, and at that point it is not initialized by malloc. Even if you are right, why does cudaMemcpy() also fail with invalid device pointer, and why does direct access to the __device__ array always give the value 0? I reviewed the programming guide but cannot find any useful information.

hakuna · October 30, 2007, 2:59pm


Thank you for your advice. What I want to do with the array is transfer values to the host so that I can print debug information when I want to see the value of a variable in the kernel function. Passing an array pointer as a parameter would not be very convenient for that.

hakuna · October 31, 2007, 1:59am

Thanks for all the replies. I have resolved this problem using the functions cudaGetSymbolAddress() and cudaMemcpyFromSymbol(); everything works fine.

(1) Declare your array as __device__ TYPE X_ARRAY[NUM] in the main file (where the main function is).
(2) Declare a host array pointer H_ARRAY and malloc NUM*sizeof(TYPE) bytes for it.
(3) Operate on X_ARRAY in your kernel function.
(4) After the kernel has run, you can call cudaGetSymbolAddress((void **)&ptr, X_ARRAY) to get the device address of the array; to print the values of X_ARRAY, copy them with cudaMemcpyFromSymbol((void *)H_ARRAY, X_ARRAY, NUM*sizeof(TYPE), 0, cudaMemcpyDeviceToHost);
(5) Print the values of the array through the pointer H_ARRAY.

Since we cannot watch values change inside the kernel in Debug or Release mode, this method lets you store the kernel values you want to see and inspect them on the host.
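The recipe above, sketched end to end under a few assumptions: TYPE is taken to be unsigned int, NUM is a hypothetical 256, the kernel name debug_kernel and the helper dump_x_array are hypothetical, and the calls use the syntax of a current CUDA runtime, which takes the symbol itself. Only cudaMemcpyFromSymbol is needed for the copy; the cudaGetSymbolAddress call in step (4) matters only if you also want a plain device pointer to X_ARRAY.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef unsigned int TYPE;     // TYPE from the recipe; unsigned int is an assumption
const unsigned int NUM = 256;  // hypothetical element count

// Step (1): the __device__ array is declared in the same file as the host code.
__device__ TYPE X_ARRAY[NUM];

// Step (3): the kernel writes the values we want to inspect later.
__global__ void debug_kernel()
{
    X_ARRAY[threadIdx.x] = threadIdx.x + 100;  // illustrative value to "watch"
}

// Steps (2), (4), (5): returns a malloc'ed copy of X_ARRAY (caller frees), or NULL on error.
TYPE *dump_x_array(void)
{
    TYPE *H_ARRAY = (TYPE *)malloc(NUM * sizeof(TYPE));  // step (2)
    if (H_ARRAY == NULL) return NULL;

    debug_kernel<<<1, NUM>>>();                           // step (3)
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)                               // step (4): copy straight from the symbol
        err = cudaMemcpyFromSymbol(H_ARRAY, X_ARRAY, NUM * sizeof(TYPE), 0,
                                   cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        free(H_ARRAY);
        return NULL;
    }

    for (unsigned int i = 0; i < 4; ++i)                  // step (5): print the first few values
        printf("X_ARRAY[%u] = %u\n", i, H_ARRAY[i]);
    return H_ARRAY;
}
```

cudaMemcpyFromSymbol is issued on the default stream after the kernel launch, so it sees the kernel's writes without an explicit synchronization call.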

Related topics

Topic	Category	Replies	Views	Activity
__device__ variables and arrays	CUDA Programming and Performance	8	15502	August 16, 2014
pointer in global device memory	CUDA Programming and Performance	9	11734	November 23, 2011
Another Device Memory Question	CUDA Programming and Performance	7	2386	February 9, 2010
device memory declared Globally not passed in	CUDA Programming and Performance	1	1321	March 31, 2011
How can Iget the pointer to the device memory var	CUDA Programming and Performance	9	4881	October 31, 2007
Question Regarding an Array of Pointers on the Device	CUDA Programming and Performance	1	8535	January 22, 2009
passing an array to a kenel ?	CUDA Programming and Performance	9	13615	June 10, 2009
global variables	CUDA Programming and Performance	5	5441	December 12, 2007
Strange memory gremlins Getting pwned by pointers	CUDA Programming and Performance	9	12269	July 1, 2009
Device to Host Symbol Memcpy	CUDA Programming and Performance	3	4542	February 22, 2010