Cannot tell what pointer points to, assuming global memory space; ran out of registers

Hi all,

I face the same problem as described in this topic:

So, I have quite a big program: my kernel function calls a device function that calls 5 more device functions. Perhaps it sounds a bit strange to implement a program that big, but what I intend to do in the end is just a matrix-vector multiplication in which each thread multiplies one row of the matrix with the vector. The matrix is not stored beforehand, so I need the device functions to calculate the row I want to multiply with the vector.
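To make the setup concrete, here is a minimal sketch of a one-thread-per-row product where the matrix entries are computed on demand instead of being stored. The names `matrix_entry`, `matvec_on_the_fly` and `run_matvec` are hypothetical, and the placeholder entry function (2 on the diagonal, 0 elsewhere) only stands in for the real chain of device functions. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the device functions that compute a matrix
// entry on demand. Placeholder matrix: 2 on the diagonal, 0 elsewhere.
__device__ float matrix_entry(int row, int col)
{
    return (row == col) ? 2.0f : 0.0f;
}

// One thread per row: y[row] = sum over col of A(row, col) * x[col].
__global__ void matvec_on_the_fly(const float *x, float *y, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float acc = 0.0f;
    for (int col = 0; col < cols; ++col)
        acc += matrix_entry(row, col) * x[col];
    y[row] = acc;
}

// Host helper: copies x to the device, launches the kernel, copies y back.
std::vector<float> run_matvec(const std::vector<float> &x, int rows)
{
    int cols = (int)x.size();
    float *d_x = 0, *d_y = 0;
    cudaMalloc(&d_x, cols * sizeof(float));
    cudaMalloc(&d_y, rows * sizeof(float));
    cudaMemcpy(d_x, x.data(), cols * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (rows + threads - 1) / threads;
    matvec_on_the_fly<<<blocks, threads>>>(d_x, d_y, rows, cols);

    std::vector<float> y(rows);
    cudaMemcpy(y.data(), d_y, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

Because each thread keeps only a running sum, the register pressure in a kernel like this comes almost entirely from whatever the entry function needs once it is inlined.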

So, on the one hand I get the error:

Compiler Error in file forward_projection.cpp3.i during Register Allocation phase:

### ran out of registers in float

I followed the recommendations from the topic mentioned above. I guess the problem is the number of variables in my device functions, so I tried to declare them as shared or constant to keep the compiler from putting everything in registers, but that didn’t change anything. I also thought about declaring the functions as noninlined, but that doesn’t make sense either, as the compiler will still inline them because I pass pointer parameters (as mentioned in the CUDA programming guide).

The second set of messages I get is the following, and I am not sure whether it is directly related to the first problem:

forward_projection.cpp3.i(551): Advisory: Cannot tell what pointer points to, assuming global memory space

forward_projection.cpp3.i(552): Advisory: Cannot tell what pointer points to, assuming global memory space

forward_projection.cpp3.i(552): Advisory: Cannot tell what pointer points to, assuming global memory space

forward_projection.cpp3.i(556): Advisory: Cannot tell what pointer points to, assuming global memory space

forward_projection.cpp3.i(557): Advisory: Cannot tell what pointer points to, assuming global memory space

I checked the line numbers and they don’t make sense to me, as they refer to general functions of any type.

I declared the arrays and passed them as pointers to the parameters:

__shared__ float Z[8];

 get_z_coord(&Z[0]); // this is the device function which should calculate the values for Z[]

So is there a problem in the way I declare my arrays or the way I pass them to the function?

I hope one of you has an idea; I don’t know what else to try!

Thanks in advance!


I’ve never encountered the first error.

For the second, just do get_z_coord(Z) and see if that helps.

I tried your proposal, but it doesn’t change the compiler message. I searched for the variables which cause this problem and found out that it is an array of structs.

I define the struct in a header file (.cuh) and include it in the device file. The struct definition is as follows:

typedef struct __align__(8)
{
         int index;

         float w;
} weightedVoxel;

In the global function I call the device function and pass it a pointer to the array of structs:

__global__ void projection(){

 __shared__ weightedVoxel S[280];

   int cnt = dev_func(S); // device function which should fill the array of structs with values

}

Then I fill the array in the device function

#include "dev_func.cuh"

__device__ int dev_func(weightedVoxel S[]){

    ...

}

So, when I exclude the part of the device function where the array is filled, the code compiles. If I leave it in, the compiler gives the advisory mentioned above, “Cannot tell what pointer points to…”, which finally leads to the error “ran out of registers”.
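For reference, a stripped-down sketch of the pattern in question: a `__shared__` array of the struct is handed to a device function that fills it. Everything except `weightedVoxel` and the array size 280 is an assumption made for illustration (`fill_voxels`, `projection_sketch`, `run_projection_sketch`, and the values written into the array), and it is not claimed that this reproduces the advisory.

```cuda
#include <cuda_runtime.h>

typedef struct __align__(8)
{
    int   index;
    float w;
} weightedVoxel;

#define N_VOXELS 280

// Hypothetical fill routine: the threads of the block cooperatively write
// placeholder values into the array they are handed.
__device__ int fill_voxels(weightedVoxel *S, int n)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        S[i].index = i;
        S[i].w     = 0.5f * i;
    }
    return n;
}

__global__ void projection_sketch(float *out)
{
    __shared__ weightedVoxel S[N_VOXELS];

    int cnt = fill_voxels(S, N_VOXELS);
    __syncthreads();              // make all writes to S visible to the block

    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < cnt; ++i)
            sum += S[i].w;
        *out = sum;               // sum of 0.5 * i for i = 0..279
    }
}

// Host helper: launches a single block and returns the summed weights.
float run_projection_sketch()
{
    float *d_out = 0;
    float  h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    projection_sketch<<<1, 64>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

The pointer parameter carries no memory-space information in its type, so after inlining the compiler has to work out on its own that `S` refers to shared memory.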

I will continue searching for a solution tomorrow, but perhaps some of you have an idea what is going wrong there!

“Cannot tell what pointer points to” has been dramatically improved for 2.1. Hopefully it will solve the majority of your problems. For the time being… I think I’d need to see a lot more code before I could offer any meaningful advice.

That is a dangerous thing to say on the forum, now I have to ask: when is it coming? ;)

See, I could make some claim about when 2.1 is coming, and then my head would wind up mounted to the wall next to some of the CUDA driver guys…

hehehe, better don’t in that case :D

So, I investigated my problem a bit more. I wrote a second program to reproduce the same error, but there passing the pointer to the array of structs worked fine, so I went back to the more complex program to search for other causes. As I explained before, the program contains several device functions which are called several times from several other functions. When I reduce the number of function calls (in fact, removing one function which calls 3 subfunctions is enough), the program compiles and works fine. So there seems to be a relation between register usage and my problem. The simple version of the program uses this amount of memory:

ptxas info    : Used 6 registers, 1600+0 bytes lmem, 32+24 bytes smem, 24 bytes cmem[1]

The complex version without the last function call uses this amount of memory:

ptxas info    : Used 32 registers, 56+28 bytes lmem, 2984+2980 bytes smem, 24 bytes cmem[0], 288 bytes cmem[1]

Does this make sense? I don’t really see the relation between my error and the possibility that I am reaching the register limit.

In any case, these problems only occur when I compile in release mode for the GPU; in emulation mode the same version works fine. I guess the emulation compiler doesn’t check register usage at all.

By the way, another question, about compilation time: my compilation time in release mode is really high. For the complex program it can easily reach 10 minutes or more, whereas the emulation build needs just 10-15 seconds. Is this normal?

So, I will keep searching for solutions, and I am happy about any hint on what I could change!


It sounds like you’re using the maxrregcount flag and limiting regs to 32. If your code uses much more than 32 regs, it will cause the compiler to become very slow. Perhaps it’s also causing your error.

Re: “cannot tell what pointer points to.” I’ve encountered this error often. The code you posted doesn’t look like it would trigger it. Strange. Try moving “__shared__ weightedVoxel S[280];” to file scope? Perhaps __align__ is also the culprit? You don’t really need __align__.

I am not using maxrregcount. I tried it once and set the value to 128, the maximum value I read about, but it doesn’t change anything.

I used the __align__ specifier according to the programming guide to avoid multiple read instructions. I will check whether it makes a difference if I take it out.

However, I found a way to avoid the problems I had: I separated my kernel into 3 kernels and saved the results of the first two kernels in an array in global memory, which I then read in the 3rd kernel. At first I wanted to avoid this detour through global memory, as I need the results just once.
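A minimal sketch of that kind of split, assuming three hypothetical stage kernels (`stage1`, `stage2`, `stage3`) with placeholder arithmetic in place of the real computations. The first two write their intermediate results to global memory and the third reads them. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Stage 1 (hypothetical): writes its intermediate result to global memory.
__global__ void stage1(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = (float)i;
}

// Stage 2 (hypothetical): writes a second intermediate result.
__global__ void stage2(float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = 2.0f;
}

// Stage 3: reads both intermediate arrays and combines them.
__global__ void stage3(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i];
}

// Host helper: runs the three stages back to back and returns the result.
std::vector<float> run_three_stages(int n)
{
    size_t bytes = n * sizeof(float);
    float *d_a = 0, *d_b = 0, *d_out = 0;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_out, bytes);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;

    // Kernels launched into the same stream run in launch order, so stage3
    // sees the completed output of stage1 and stage2.
    stage1<<<blocks, threads>>>(d_a, n);
    stage2<<<blocks, threads>>>(d_b, n);
    stage3<<<blocks, threads>>>(d_a, d_b, d_out, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

Each kernel is compiled and register-allocated on its own, so no single kernel has to keep the live values of all stages at once; the price is the extra round trip through global memory.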

By the way, I haven’t changed anything in my pointers or in the device functions, so somehow the problem came from the size of the program and the number of variables used.

What I am still wondering about is the compile time. Compiling the 3 kernels with all the device functions now takes up to 20 minutes. I think that’s a lot, and I don’t specify any extra parameters!

Thanks for the support!