Complex data structure yields "Advisory" warnings: pre-determined structures in memory

I need each thread in my kernel to traverse a shared, complex data structure known at compile time. Two questions:

  1. What is the best memory or method for this?

  2. If the method I post below is okay, what can I do about the Advisory messages?

I’ve looked through several related posts. People suggest textures for lookup tables, but here the data is heterogeneous and not easily laid out on a texture. People have suggested allocating space on the card and copying host-to-device, but that seems tedious for complicated data structures, and I would like to compile to a cubin. Basically, I would rather write the data structure directly into the kernel.

Currently, I just auto-initialize the data structure right there in the function body. However, the pointer-to-pointer indexing raises an Advisory: the compiler assumes the pointers refer to global memory, but since the data actually lives in local memory (or somewhere else), invalid memory is accessed. Here is a related example with output (error checking omitted to simplify the presentation). On a 2x2 block, each thread reads the element of a pre-defined data structure at its own row and column.

#include <stdio.h>

__global__ void copy_gmemA(int* g_odata)
{
    /* reference data structure */
    int foo_A[] = {1, 2}; /* first row */
    int foo_B[] = {3, 4}; /* second row */
    int *const_vals[] = {foo_A, foo_B};

    int idx = threadIdx.x * blockDim.y + threadIdx.y;
    g_odata[idx] = const_vals[threadIdx.x][threadIdx.y];
}

int main(void)
{
    /* allocate space for kernel output on device and host */
    int len = 4;
    int *d_odata, *h_odata;
    cudaMalloc((void**)&d_odata, sizeof(int) * len);
    h_odata = (int *)malloc(sizeof(int) * len);

    /* launch kernel */
    dim3 grid(1, 1, 1);
    dim3 threads(2, 2, 1);
    copy_gmemA<<<grid, threads>>>(d_odata);

    /* pull memory back onto host and display */
    cudaMemcpy(h_odata, d_odata, sizeof(int) * len, cudaMemcpyDeviceToHost);
    for (int i = 0; i < len; i++)
        printf("%d ", h_odata[i]);
    printf("\n");

    return 0;
}

Which outputs:

bash$ nvcc test.cu && ./a.out
"/tmp/tmpxft_0000b623_00000000-5_test.i", line 11: Advisory: Cannot tell what pointer points to, assuming global memory space
0 0 0 0

Here’s another way of trying to lay it out in the kernel, but this yields both the Advisory and an assembler error when the compiler tries to emit instructions (code and output pasted below). I’m assuming that when it tries to emit ‘foo’, it doesn’t know the addresses of ‘foo_A’ and ‘foo_B’.

I’ve found several other related posts, but nothing involving a small pre-defined data structure linked up via pointers. I would rather avoid cudaMemcpyToSymbol() because the structure I intend to use this on is complicated and tightly linked:

Transfering a list of pointers to the device

Copying structures with pointers

Uniform lookup tables

Using cudaMemcpyToSymbol()

#include <stdio.h>

/* reference data structure */
__constant__ int foo_A[] = {1, 2};
__constant__ int foo_B[] = {3, 4};
__constant__ int *foo[] = {foo_A, foo_B};

__global__ void foo_kernel(int* g_odata)
{
    int idx = threadIdx.x * blockDim.y + threadIdx.y;
    g_odata[idx] = foo[threadIdx.x][threadIdx.y];
}

int main(void)
{
    /* allocate space for kernel output on device and host */
    int len = 4;
    int *d_odata, *h_odata;
    cudaMalloc((void**)&d_odata, sizeof(int) * len);
    h_odata = (int *)malloc(sizeof(int) * len);

    /* launch kernel */
    dim3 grid(1, 1, 1);
    dim3 threads(2, 2, 1);
    foo_kernel<<<grid, threads>>>(d_odata);

    /* pull memory back onto host and display */
    cudaMemcpy(h_odata, d_odata, sizeof(int) * len, cudaMemcpyDeviceToHost);
    for (int i = 0; i < len; i++)
        printf("%d ", h_odata[i]);
    printf("\n");

    return 0;
}

Output:

bash$ nvcc test.cu
"/tmp/tmpxft_0000bca5_00000000-5_test.i", line 11: Advisory: Cannot tell what pointer points to, assuming global memory space
### Assertion failure at line 906 of ../../be/cg/NVISA/cgemit_targ.cxx:
### Compiler Error in file /tmp/tmpxft_0000bca5_00000000-5_test.i during Assembly phase:
### NYI initv kind 1
nvopencc INTERNAL ERROR: /usr/local/cuda/bin/../open64/lib//be returned non-zero status 1

For algorithmic simplicity, I would much rather follow pointers through this data structure, but I can hack the structure up so that everything is laid out serially, using linear indices instead of pointers to stitch it all together.

There are only about three sub-structures in the data structure I intend to use, and instances of these structures could be packed end-to-end. I’m guessing this would lead to better alignment and hence faster fetching.

Question: is the best approach to lay it all out like this in constant memory?

Following the examples above, it might look something like the code below, where ‘foo_data’ contains both rows laid out one after the other and ‘foo_rows’ contains the linear offset where each row starts. This compiles without error and runs correctly.

#include <stdio.h>

/* reference data structure */
__constant__ int foo_data[] = {1, 2, 3, 4};
__constant__ int foo_rows[] = {0, 2}; /* indices into foo_data where each row starts */

__global__ void copy_gmemA(int* g_odata)
{
    int idx = threadIdx.x * blockDim.y + threadIdx.y;
    g_odata[idx] = foo_data[foo_rows[threadIdx.x] + threadIdx.y];
}

int main(void)
{
    /* allocate space for kernel output on device and host */
    int len = 4;
    int *d_odata, *h_odata;
    cudaMalloc((void**)&d_odata, sizeof(int) * len);
    h_odata = (int *)malloc(sizeof(int) * len);

    /* launch kernel */
    dim3 grid(1, 1, 1);
    dim3 threads(2, 2, 1);
    copy_gmemA<<<grid, threads>>>(d_odata);

    /* pull memory back onto host and display */
    cudaMemcpy(h_odata, d_odata, sizeof(int) * len, cudaMemcpyDeviceToHost);
    for (int i = 0; i < len; i++)
        printf("%d ", h_odata[i]);
    printf("\n");

    return 0;
}

I wanted to copy a structure containing a pointer from the host to the device’s global memory, and then from global memory to shared memory. When copying from global memory to shared memory, I get the Advisory warning.

My structure is the following:

typedef struct __align__(8) {
    float* elements;
} Matrix;

Also, I can’t run the program.
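For reference, the usual pattern for a struct like this (a sketch only, with made-up sizes and data, not a drop-in fix for your program) is to allocate the elements buffer on the device first, point the host-side copy of the struct at that device address, and only then copy the struct itself across:

```cuda
#include <stdio.h>

typedef struct __align__(8) {
    float* elements;
} Matrix;

int main(void)
{
    float h_vals[4] = {1.f, 2.f, 3.f, 4.f}; /* hypothetical payload */

    /* 1. allocate the elements array in device global memory and fill it */
    float* d_elements;
    cudaMalloc((void**)&d_elements, sizeof(h_vals));
    cudaMemcpy(d_elements, h_vals, sizeof(h_vals), cudaMemcpyHostToDevice);

    /* 2. build a host-side Matrix whose pointer holds the DEVICE address,
       then copy the struct itself over (or pass it by value as a kernel
       argument) */
    Matrix h_m;
    h_m.elements = d_elements;

    Matrix* d_m;
    cudaMalloc((void**)&d_m, sizeof(Matrix));
    cudaMemcpy(d_m, &h_m, sizeof(Matrix), cudaMemcpyHostToDevice);

    /* ... launch a kernel taking Matrix* ... */

    cudaFree(d_m);
    cudaFree(d_elements);
    return 0;
}
```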

I don’t think you’re doing anything wrong; the compiler is just easily confused. I hear CUDA 2.1 will be much better at figuring out convoluted pointers. (The compiler has to do something the C language doesn’t support: keep track of whether a pointer refers to local, global, shared, or constant memory when it is dereferenced. I guess just using a distinct address range for each space was too simple.)

If you play around with the syntax (not necessarily resort to indices) you might get it to work.

Or you can just do:
g_odata[idx] = const_vals[threadIdx.y*stride + threadIdx.x];
which is the most common way to access arrays.