"Warning: Cannot tell what pointer points to, assuming global memory space"

Hello All,

So, another thread on this forum got me to try to allocate structures containing pointers to structures on the device.

After much trial and error and Google searching, I think I have code that works and does what I want, but I keep getting this warning: “Warning: Cannot tell what pointer points to, assuming global memory space”.

It doesn’t break my code. When I was playing around last night I would get the same warning and my code would just segmentation fault, but not this time! I can actually assign values and have them copy back fine. But this warning bugs me. I want to get rid of it, but I’m not sure how.

Also, how am I supposed to deal with de-allocating everything I’ve allocated? I’ve tried before and it just doesn’t seem to work out right for me lol. I’ve put my best guess after the code below.

Here’s my code:

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

struct node {

   int level;

   struct node *data;
};

__global__ void memoryCheck(struct node *root) {

   root->data->level = 1234;
   root->level = 5678;

   return;
}

int main(void) {

/* Allocate host memory */

   struct node *root = (struct node*) malloc(sizeof(*root));
   struct node *data = (struct node*) malloc(sizeof(*data));

/* Allocate device memory */

   struct node *dev_root;
   struct node *dev_data;

   cudaMalloc(&dev_root, sizeof(*root));
   cudaMalloc(&dev_data, sizeof(*data));

/* Copy host data to the device, swapping in device addresses for embedded pointers */

   cudaMemcpy(dev_data, data, sizeof(struct node), cudaMemcpyHostToDevice);

   root->data = dev_data;   /* root must carry the device address of its child */

   cudaMemcpy(dev_root, root, sizeof(struct node), cudaMemcpyHostToDevice);

/* Launch kernel */

   memoryCheck<<<1, 1>>>(dev_root);

/* Copy the results back to the host */

   cudaMemcpy(root, dev_root, sizeof(struct node), cudaMemcpyDeviceToHost);

   cudaMemcpy(data, dev_data, sizeof(struct node), cudaMemcpyDeviceToHost);

   root->data = data;   /* restore the host pointer; the copy-back brought over dev_data */

/* Print results for visual confirmation */

   printf("%d\n", root->data->level);
   printf("%d\n", root->level);

   return 0;
}
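
For the de-allocation part, here is my best guess at the teardown, mirroring the allocations in reverse: cudaFree() for everything from cudaMalloc(), free() for everything from malloc(). These lines would go just before the return 0; in main, but I haven’t convinced myself this is all that’s needed:

   cudaFree(dev_data);
   cudaFree(dev_root);

   free(data);
   free(root);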

You should only see this warning when building for compute capability 1.x. Devices with compute capability 1.x do not have a generic device memory space. Each data object is part of a specific memory space on the device. But in C/C++ a “pointer is a pointer is a pointer”, that is, there is no notion of separate memory spaces. When using a single level of indirection, the compiler used for 1.x compilation can usually track what specific memory space a pointer points to. If the code or the data structures become more complicated, it becomes impossible for the compiler to track this. In that case it assumes the pointer in question points to global memory and emits a warning. If the compiler’s assumption is true, the resulting code will work just fine. If the assumption doesn’t hold (e.g. the pointer in question points to shared memory), the resulting machine code will be incorrect. It will not behave as intended, and it could “crash”.
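
To illustrate with a contrived sketch (not your code): on an sm_1x target, a kernel like the following can defeat the compiler’s pointer tracking, because at the point of the load the pointer p may refer to either shared or global memory. The compiler then assumes global memory and emits exactly this warning:

__global__ void pick(int *gdata, int *out, int use_shared) {

   __shared__ int sdata[32];

   sdata[threadIdx.x] = threadIdx.x;
   __syncthreads();

   /* after this assignment the memory space of p is unknown at compile time */
   int *p = use_shared ? sdata : gdata;

   out[threadIdx.x] = p[threadIdx.x];
}

If use_shared is nonzero at run time, the load would actually come from the wrong space, which is the “incorrect machine code” case described above.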

Support for a generic memory space on the device was introduced with compute capability 2.0. There are address-conversion instructions at the PTX level that can convert between the specific memory spaces and the generic space. With that, the problem behind the warning you are seeing went away, as the hardware now supports what C/C++ expect.
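
In practical terms, building for a compute capability 2.0 or later target should make the warning disappear. For example (file and binary names here are placeholders):

nvcc -arch=sm_20 -o app app.cu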

Interesting. I have a GTX 460, which is supposed to be compute capability 2.1, but I have to compile specifically with the architecture flag “-arch sm_21”; then the warning goes away and the code runs cleanly under cuda-memcheck. However, cuda-memcheck does not seem to track memory leaks, unless I’m just not running it with the right flags. All I type is “cuda-memcheck ./qtree”, where qtree.cu is the name of my file.

I’ve also noticed this with a different example that uses the atomicAdd() function, where again I have to compile specifically for the correct architecture.
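
For instance, this minimal kernel (my own illustration, not part of the tree code) refuses to compile for me unless I pass at least “-arch sm_11”, since integer atomics on global memory require compute capability 1.1:

__global__ void count(int *counter) {

   atomicAdd(counter, 1);
}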

Is this normal? And how do I actually enable the detection of leaked memory blocks? I get that the OS is supposed to reclaim memory after the program exits, but still…

As the documentation points out, nvcc defaults to an architecture target of sm_10. In your build system, you would want to specify all target architectures for which you wish to compile the code. With CUDA it is common to build an executable for multiple architectures (known as a “fat binary”), since the various GPU architectures are not binary compatible.
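
As a sketch (adjust the list to whatever GPUs you need to support), a fat binary covering both sm_1x and Fermi targets can be built with one -gencode option per target, e.g.:

nvcc -gencode arch=compute_13,code=sm_13 \
     -gencode arch=compute_20,code=sm_20 \
     -gencode arch=compute_20,code=sm_21 \
     -o qtree qtree.cu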

Various features of cuda-memcheck require hardware support that is only present on more recent architectures (i.e. higher compute capabilities). Where such a restriction exists, the documentation should point out the minimum compute capability at which that specific feature is supported.
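
Regarding the leak detection question: leak checking in cuda-memcheck has to be enabled explicitly, and if I remember the documentation correctly, the application also needs to call cudaDeviceReset() before exiting so that allocations still outstanding at that point can be reported:

cuda-memcheck --leak-check full ./qtree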