How to get the values of an array from a kernel function

I want to get the values of an array in a kernel function using a global pointer from the host. So I first defined an array pointer in the kernel file:

__device__ unsigned int *result;

then, I initialize it in the main function:

CUDA_SAFE_CALL( cudaMalloc((void **)&result, NUM) );

also, I defined a host array in main function like this:

unsigned int *h_result;

h_result = (unsigned int *)malloc(NUM);

I operate the device array in the kernel function, for example:

result[threadIdx.x] = threadIdx.x + 100; // just an illustrative operation

then, I get the array of result using cudaMemcpy:

CUDA_SAFE_CALL( cudaMemcpy(h_result, result, NUM, cudaMemcpyDeviceToHost) );

but the program always fails with an error.

Can anyone tell me where I went wrong?

Maybe you can tell us what kind of error you get, and what you have already tried yourself, so we’re not suggesting something you already tried? That would be very helpful.

Yes, thank you for the reminder.

I tried to look at the values of the array, but in EmuDebug mode, when the program reaches the operation on the global array, an error occurs. It says: memory access conflict.

In the debug window of VS2005, I found that X_array[threadIdx.x] cannot be evaluated correctly.

Then I tried changing the type of the global array from __device__ unsigned int *X_array to unsigned int *X_array, but the compiler reports that X_array is not defined in the kernel function.

What is NUM? If it is the number of integers in result, then you need to multiply that by sizeof(unsigned int).

Yes, you are right, I forgot it here, but in my program the size is multiplied by sizeof(). I have now tried several approaches, but none of them was successful.

(1) Define an array pointer __device__ unsigned int *array in the kernel file, initialize it in the main function, then operate on the array in the kernel function. When run, the program throws an exception.

(2) Define an array __device__ unsigned int array[NUM] in the kernel file, operate on it in the kernel function, and then read it directly in the main function (device variable can be accessed from the host) like this: unsigned int x = array[i];

In EmuDebug mode, I found that the array works fine in the kernel function, but in the main function the values I get are all 0.

(3) Define __device__ unsigned int array[NUM] in the main file (outside the main function), then operate on the array in the kernel function, which works fine. When the kernel finishes running, I copy the memory from device to host:
cudaMemcpy(host_pointer, array, NUM*sizeof(unsigned int), cudaMemcpyDeviceToHost); but when running, an error occurs here. It says: invalid device pointer.

unsigned int *d_result;  //< NOTE: NOT __device__

CUDA_SAFE_CALL( cudaMalloc((void **)&d_result, NUM * sizeof(unsigned int)) );

h_result = (unsigned int *)malloc(NUM * sizeof(unsigned int));

// do kernel calculation into d_result (passing the d_result pointer as a parameter)

CUDA_SAFE_CALL( cudaMemcpy(h_result, d_result, NUM * sizeof(unsigned int), cudaMemcpyDeviceToHost) );
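Filled out into something compilable, the parameter-passing approach might look roughly like the sketch below. The names fillKernel and runFill and the value NUM = 64 are assumptions for illustration only, and plain cudaError_t return codes stand in for CUDA_SAFE_CALL.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

#define NUM 64  // assumed element count (elements, not bytes)

// Hypothetical kernel: each thread writes one element through the pointer parameter.
__global__ void fillKernel(unsigned int *result)
{
    result[threadIdx.x] = threadIdx.x + 100;
}

// Hypothetical helper: runs the kernel and copies NUM values into h_result.
// Returns cudaSuccess, or the first error encountered.
cudaError_t runFill(unsigned int *h_result)
{
    const size_t bytes = NUM * sizeof(unsigned int);
    unsigned int *d_result = NULL;  // ordinary host variable that holds a device address

    cudaError_t err = cudaMalloc((void **)&d_result, bytes);
    if (err != cudaSuccess) return err;

    fillKernel<<<1, NUM>>>(d_result);   // one block of NUM threads
    err = cudaGetLastError();           // reports launch failures
    if (err == cudaSuccess)
        err = cudaMemcpy(h_result, d_result, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_result);
    return err;
}
```

Calling runFill with a host buffer of NUM unsigned ints should leave the value i + 100 in element i, matching the kernel line from the original question.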

__device__ variables seem completely useless to me. I have never used one. It is quite easy just to use cudaMalloc and pass pointers around. Perhaps cudaMemcpyToSymbol is capable of reading an array declared “__device__ unsigned int[NUM]” but I’m not sure if it would work.

Yes, if I pass the array as a parameter, it’s easy to do that. But is there any other way to get the values of the array from the host? Something like using a global array in ordinary programming to fetch values from a subfunction.

By the way, in methods (1), (2), and (3), I also used cudaGetSymbolAddress((void **)h_result, (unsigned int )&array)) to get the address of the array, but it fails as well, with the error: invalid device pointer.

h_result is a host pointer, not a device pointer. Hence the error.
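To make the distinction concrete: cudaGetSymbolAddress writes a device address into a pointer variable that lives on the host, and that address, not the symbol itself and not a malloc'ed host buffer, is what cudaMemcpy expects as its device-side argument. A rough sketch, assuming a __device__ array named array with NUM = 32 elements; markKernel and readViaSymbolAddress are hypothetical names:

```cuda
#include <cuda_runtime.h>

#define NUM 32  // assumed element count

__device__ unsigned int array[NUM];  // statically allocated in device global memory

// Hypothetical kernel that writes to the device symbol directly.
__global__ void markKernel()
{
    array[threadIdx.x] = threadIdx.x * 2;
}

// Hypothetical helper: look up the symbol's device address, then copy from it.
cudaError_t readViaSymbolAddress(unsigned int *h_out)
{
    markKernel<<<1, NUM>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    void *d_ptr = NULL;                         // will receive a device address
    err = cudaGetSymbolAddress(&d_ptr, array);  // pass the symbol, not a host buffer
    if (err != cudaSuccess) return err;

    // d_ptr is now a valid device pointer, so a plain cudaMemcpy works.
    return cudaMemcpy(h_out, d_ptr, NUM * sizeof(unsigned int),
                      cudaMemcpyDeviceToHost);
}
```

Here h_out is only ever the destination of the copy; the first argument of cudaGetSymbolAddress is the address of a separate pointer variable.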

It sounds like you might be getting confused over host and device memory and what you can do with each type of memory. It may be helpful to review the CUDA programming guide again.

I’m more of a C++ programmer, where global variables are evil. I find the design pattern of passing pointers around nice because then I can run the same kernel on many different datasets in a fully object-oriented pattern. Hence I have never found any use for __device__ variables, and can’t really help you further. Additionally, I will say that there can be severe performance issues if you just store the pointer “__device__ unsigned int * result”. Reads will NOT be coalesced to that pointer and kernels will run slowly. __constant__ would be a better choice for storing pointers, while __device__ is fine if you are actually declaring the array “__device__ unsigned int result[NUM]”.

Anyways, I digress. The solution to your problem is in the proper use of cudaMemcpyToSymbol and related functions. I’ve only used them once or twice, so I’m not the best person to explain their use. talks about them some.

Yes, but h_result is just a pointer, and it’s not initialized by malloc. Even if you are right, why does cudaMemcpy() also fail with an invalid device pointer error, and why does direct access to the device array always return 0? I reviewed the programming guide, but could not find any useful information.

Thank you for your advice. What I want to do with the array is transfer values to the host, as a way to add debug print information when I want to see the value of a variable in the kernel function. Passing an array pointer as a parameter would not be very convenient for that.

Thanks for all the replies. I have resolved this problem using the functions cudaGetSymbolAddress() and cudaMemcpyFromSymbol(), and everything works fine.

(1) Declare your array __device__ TYPE X_ARRAY[NUM] in the main file (where the main function lives).
(2) Declare a host array pointer H_ARRAY and malloc NUM * sizeof(TYPE) bytes of memory for it.
(3) Operate on X_ARRAY in your kernel function.
(4) After the kernel has run, you can call cudaGetSymbolAddress((void **)&ptr, X_ARRAY) to get the address of the array; if you want to print the values of X_ARRAY, use cudaMemcpyFromSymbol((void *)H_ARRAY, X_ARRAY, NUM * sizeof(TYPE), 0, cudaMemcpyDeviceToHost);
(5) Print the values of the array using the pointer H_ARRAY.

Since we cannot watch values change inside the kernel in debug or release mode, this method can be used to store the kernel values you want to inspect.
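Put together, steps (1) to (5) might look roughly like this sketch. TYPE is taken to be unsigned int and NUM to be 16 purely for illustration, and debugKernel, fetchDebugValues, and printDebugValues are hypothetical names, not part of the original program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM 16  // assumed element count

// Step (1): device array declared at file scope, in the same file as the host code.
__device__ unsigned int X_ARRAY[NUM];

// Step (3): the kernel stores the values to be inspected later.
__global__ void debugKernel()
{
    X_ARRAY[threadIdx.x] = threadIdx.x + 100;
}

// Step (4): run the kernel, then copy the symbol's contents into host memory.
// H_ARRAY must point to at least NUM * sizeof(unsigned int) bytes (step (2)).
cudaError_t fetchDebugValues(unsigned int *H_ARRAY)
{
    debugKernel<<<1, NUM>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    return cudaMemcpyFromSymbol(H_ARRAY, X_ARRAY, NUM * sizeof(unsigned int),
                                0, cudaMemcpyDeviceToHost);
}

// Step (5): print the copied values on the host.
void printDebugValues(const unsigned int *H_ARRAY)
{
    for (int i = 0; i < NUM; ++i)
        printf("X_ARRAY[%d] = %u\n", i, H_ARRAY[i]);
}
```

cudaGetSymbolAddress is not needed when only a copy is wanted; cudaMemcpyFromSymbol takes the symbol directly.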