Question Regarding an Array of Pointers on the Device

lemonherb · January 21, 2009, 7:03am

Hi. I am trying to copy a large data structure from the host to the device. The structure is an array of pointers, each pointing to an array of floating-point data, and I want each thread block to process one of those arrays.

However, when I compile, I get the following message, which I think is related to the problem:

/tmp/tmpxft_00006618_00000000-7_SpMVBlock.cpp3.i(176): Advisory: Cannot tell what pointer points to, assuming global memory space

To test this, I have each thread load one value from global memory and write it into another array, which I then copy back to the host and check. The numbers I get look random, ranging from 0 to -137241480838288389375466143744.0000000000, and many of them are “nan”.

Is what I am trying to do even possible, and if so, what am I doing wrong? I would greatly appreciate any advice.

I’ve run the code in emulation mode, and it seems to work fine (almost). The remaining problem is that the emulator seems to run each thread to completion before starting the next one, while my code needs all threads to synchronize at certain points so that they can do coalesced loads into shared memory.
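For reference, the pattern described above (a coalesced load into shared memory followed by a barrier) looks roughly like the sketch below. It assumes one-dimensional blocks of a fixed size; `TILE`, `reverseTiles`, and `reverseTilesOnDevice` are hypothetical names, not taken from the poster's code. Each thread reads a value that a different thread loaded, so the result is only correct if every thread reaches `__syncthreads()` before any thread reads the tile.

```cuda
#include <cuda_runtime.h>

#define TILE 256  // hypothetical block size; must equal blockDim.x

// Each block stages TILE consecutive floats in shared memory, then every
// thread reads an element that a *different* thread loaded (reversed order).
__global__ void reverseTiles(const float* in, float* out)
{
    __shared__ float tile[TILE];
    int base = blockIdx.x * TILE;
    int tid = threadIdx.x;

    // Consecutive threads read consecutive addresses, so the load coalesces.
    tile[tid] = in[base + tid];
    // Barrier: no thread continues until the whole tile has been written.
    __syncthreads();
    // Safe only because of the barrier: this value came from another thread.
    out[base + tid] = tile[TILE - 1 - tid];
}

// Host wrapper. n must be a positive multiple of TILE.
// Returns 0 on success, -1 on any CUDA error or bad n.
int reverseTilesOnDevice(const float* in_host, float* out_host, int n)
{
    float* d_in = NULL;
    float* d_out = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    int ok = n > 0 && n % TILE == 0 &&
             cudaMalloc((void**)&d_in, bytes) == cudaSuccess &&
             cudaMalloc((void**)&d_out, bytes) == cudaSuccess &&
             cudaMemcpy(d_in, in_host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverseTiles<<<n / TILE, TILE>>>(d_in, d_out);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out_host, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);   // cudaFree(NULL) is a harmless no-op
    cudaFree(d_out);
    return ok ? 0 : -1;
}
```

Removing the `__syncthreads()` call makes the output depend on thread scheduling, which is exactly the kind of ordering an emulator that serializes threads can hide or distort.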

I’ve included the relevant code below.

Thanks.

P.S. I’m also getting a lot of incoherent loads in emulation mode. Does anybody have an idea why that might be?

============

Host:

    // Allocate memory to store pointers to arrays.
    cudaMalloc( (void**) &Ad_values, (numRows*sizeof(DTYPE*)) );
    cudaMalloc( (void**) &Ad_colidx, (numRows*sizeof(int*)) );

    // Allocate memory for each array.
    for(i=0;i<numRows;i++) {
            // compute how much data to allocate
            tmp1 = 0;
            tmp2 = (i*maxNumBlocks);
            for(j=0;j<A_num_blk_per_row[i];j++) {
                    tmp1 = tmp1 + A_max_rowsize_per_row[tmp2];
                    tmp2++;
            }
            // allocate memory on the device.
            cudaMalloc( (void**) &(Ad_values[i]), (tmp1*BLOCK_SIZE_Y*sizeof(DTYPE)) );
            cudaMalloc( (void**) &(Ad_colidx[i]), (tmp1*BLOCK_SIZE_Y*sizeof(int)) );
    }

    // transfer the data from the host to the device.
    for(i=0;i<numRows;i++) {
            // compute how much data to send.
            tmp1 = 0;
            tmp2 = (i*maxNumBlocks);
            for(j=0;j<A_num_blk_per_row[i];j++) {
                    tmp1 = tmp1 + A_max_rowsize_per_row[tmp2];
                    tmp2++;
            }
            cudaMemcpy( Ad_values[i], A_values_reformat[i], (tmp1*BLOCK_SIZE_Y*sizeof(DTYPE)), cudaMemcpyHostToDevice);
            cudaMemcpy( Ad_colidx[i], A_colidx_reformat[i], (tmp1*BLOCK_SIZE_Y*sizeof(int)), cudaMemcpyHostToDevice);
    }

SpMVBlock_kernel<<<dimGrid, dimBlock>>>(Ad_values, Ad_colidx, Xd, Yd, m);
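The size computation above is repeated verbatim in the allocation loop and the transfer loop. A small helper can compute it once per row. This is a sketch that assumes `A_num_blk_per_row` and `A_max_rowsize_per_row` are `int` arrays laid out as in the loops above (`maxNumBlocks` entries reserved per row); `rowElemCount` is a hypothetical name. The result still has to be multiplied by `BLOCK_SIZE_Y` and the element size to get a byte count.

```cuda
// Hypothetical helper: sum of A_max_rowsize_per_row over the
// A_num_blk_per_row[i] blocks that belong to row i. Assumes int arrays
// with maxNumBlocks entries reserved per row, as in the loops above.
int rowElemCount(const int* A_num_blk_per_row,
                 const int* A_max_rowsize_per_row,
                 int maxNumBlocks, int i)
{
    const int* sizes = A_max_rowsize_per_row + i * maxNumBlocks;  // first block of row i
    int count = 0;
    for (int j = 0; j < A_num_blk_per_row[i]; ++j)
        count += sizes[j];
    return count;
}
```

With it, `tmp1 = rowElemCount(A_num_blk_per_row, A_max_rowsize_per_row, maxNumBlocks, i);` could replace both inner loops. This only removes the duplication; it does not affect the pointer problem discussed in the reply.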

Kernel:

    float* baseAddr1;
    int* baseAddr2;

    bid = blockIdx.x;
    tid = threadIdx.x;

    baseAddr1 = Ad_values[bid];
    baseAddr2 = Ad_colidx[bid];

    Yd[(bid*BLOCK_SIZE_Y)+tid] = baseAddr1[tid];

============

gshi · January 22, 2009, 8:41pm

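You cannot dereference a CUDA device memory pointer on the host, and that is exactly what you do when you use an array of pointers that is itself allocated in GPU global memory to store the device pointers. Ad_values points to device memory, so cudaMalloc( (void**) &(Ad_values[i]), ... ) asks cudaMalloc to write the new pointer into device memory from the host, and Ad_values[i] in the cudaMemcpy calls reads a device address on the host. As a result, the pointer array on the device never receives valid addresses.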

I believe you should instead allocate the array of pointers in host memory, store each GPU pointer returned by cudaMalloc() in that host array, and then use cudaMemcpy() to copy the whole array of pointers to GPU global memory. The kernel can then read the pointers from there.
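A minimal sketch of that approach, assuming `float` data and, for simplicity, rows of equal length `rowLen` (the original rows vary in size, which only changes the per-row allocation and copy sizes). The names `copyRowsKernel` and `uploadRowsAndRun` are hypothetical. The key point is that `cudaMalloc` writes each device address into a host array `h_ptrs`, and only the finished array is copied to the device; the kernel then reads the pointers on the device, as `SpMVBlock_kernel` does with `Ad_values[bid]`. Because `rowLen` is used as the block size, it must not exceed the device's maximum threads per block.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Block b handles row b: thread t copies rows[b][t] into out.
// 'rows' is a device array of device pointers (one per row).
__global__ void copyRowsKernel(float** rows, float* out)
{
    int bid = blockIdx.x;
    int tid = threadIdx.x;
    const float* row = rows[bid];  // reading a device pointer on the device is fine
    out[bid * blockDim.x + tid] = row[tid];
}

// Uploads numRows host rows of rowLen floats each, builds the device-side
// pointer array through a host staging array, runs the kernel, and copies
// numRows*rowLen results into out_host. Returns 0 on success, -1 on error.
int uploadRowsAndRun(float* const* rows_host, int numRows, int rowLen,
                     float* out_host)
{
    size_t rowBytes = (size_t)rowLen * sizeof(float);
    // Host array that will hold the device addresses of the rows.
    float** h_ptrs = (float**)calloc(numRows, sizeof(float*));
    float** d_ptrs = NULL;  // device copy of h_ptrs
    float* d_out = NULL;
    int ok = (h_ptrs != NULL);

    // 1. Allocate and fill each row; cudaMalloc stores the address in host memory.
    for (int i = 0; ok && i < numRows; ++i) {
        ok = cudaMalloc((void**)&h_ptrs[i], rowBytes) == cudaSuccess &&
             cudaMemcpy(h_ptrs[i], rows_host[i], rowBytes,
                        cudaMemcpyHostToDevice) == cudaSuccess;
    }
    // 2. Copy the finished array of pointers to the device.
    ok = ok &&
         cudaMalloc((void**)&d_ptrs, numRows * sizeof(float*)) == cudaSuccess &&
         cudaMemcpy(d_ptrs, h_ptrs, numRows * sizeof(float*),
                    cudaMemcpyHostToDevice) == cudaSuccess &&
         cudaMalloc((void**)&d_out, numRows * rowBytes) == cudaSuccess;
    // 3. One block per row, one thread per element.
    if (ok) {
        copyRowsKernel<<<numRows, rowLen>>>(d_ptrs, d_out);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out_host, d_out, numRows * rowBytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // Free the rows through the host copies of their addresses.
    for (int i = 0; h_ptrs != NULL && i < numRows; ++i) cudaFree(h_ptrs[i]);
    cudaFree(d_ptrs);
    cudaFree(d_out);
    free(h_ptrs);
    return ok ? 0 : -1;
}
```

Keeping the host array `h_ptrs` around also makes cleanup simple: the per-row buffers are freed through the host copies of their addresses, without another cudaMemcpy to read them back out of the device array.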
