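typedef struct state {
    int val;
    struct state *ptr;   /* points at another entry of states[] */
} state_t;

__device__ state_t states[2] = {
    {0, &states[1]},     /* entry 0 points at entry 1 */
    {0, &states[0]}      /* entry 1 points at entry 0 */
};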
Having this in my code makes be.exe overflow its stack. I worked around it by replacing the pointers with int offsets, but now be.exe runs out of memory and fails with the error “### Out of memory in Allocate_Block”. I don’t know whether that is related to this struct or simply to the kernel being complex. Either way, I don’t have a workaround for it.
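The int-offset workaround might look roughly like the sketch below. The poster's actual code isn't shown. The field name next, the DEV macro and the next_state() helper are hypothetical. DEV expands to __device__ under nvcc and to nothing otherwise, so the same logic can also be checked as ordinary C++ without a GPU.

```cpp
// DEV expands to __device__ under nvcc and to nothing otherwise,
// so the same code can also be compiled and checked as plain C++.
#ifdef __CUDACC__
#define DEV __device__
#else
#define DEV
#endif

typedef struct state {
    int val;
    int next;  // index into states[] instead of a struct state * pointer
} state_t;

// Same two-entry cycle as the pointer version: 0 -> 1 -> 0.
DEV state_t states[2] = {
    {0, 1},
    {0, 0}
};

// Hypothetical helper: turn entry i's stored index back into a pointer.
DEV inline state_t *next_state(int i)
{
    return &states[states[i].next];
}
```

Device code would then call next_state(i) (or write &states[states[i].next]) wherever it previously dereferenced ptr, so no pointer appears in the static initializer.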
be.exe starts eating up gigs of memory with a single call to a one-line function. Inline that function and it eats almost no memory at all. If that inline function calls another simple one-line function, inline or not, it eats gigs of memory and dies again.
The problem is that these simple one-line functions write a single zero to a local array declared in the topmost function. The address of the array is passed down the call chain, and something in the compiler can’t handle it.
Switch to malloc instead of a local array and all is well.
I suspect the array (pretty big: 66x66 ints, i.e. 17,424 bytes per thread) is getting optimized into shared memory or registers and the whole toolchain can’t deal with it.
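A possible starting point for a repro case is a cut-down kernel with the shape described above: a 66x66 local int array whose address is passed through two one-line __device__ helpers, next to the same code using device-side malloc. This is only a sketch of the described pattern and is not confirmed to trigger the be.exe failure. The kernel and helper names, GRID_DIM, and the DEV/KERNEL macros (which let it also compile as plain C++ without nvcc) are all hypothetical.

```cpp
#include <stdlib.h>  // malloc/free for the host build; device code has them built in

// DEV/KERNEL expand to the CUDA qualifiers under nvcc and to nothing
// otherwise, so the same logic can also be compiled as plain C++.
#ifdef __CUDACC__
#define DEV __device__
#define KERNEL __global__
#else
#define DEV
#define KERNEL
#endif

#define GRID_DIM 66  // 66x66 ints, as in the report (17,424 bytes per thread)

// Innermost one-line helper: writes a single zero.
DEV void clear_cell(int (*a)[GRID_DIM], int r, int c) { a[r][c] = 0; }

// One-line helper that only forwards the array pointer one level deeper.
DEV void clear_via(int (*a)[GRID_DIM], int r, int c) { clear_cell(a, r, c); }

// Guess at the reported problem pattern (not a confirmed trigger):
// a local array whose address is passed two calls deep.
KERNEL void local_array_kernel(int *out)
{
    int grid[GRID_DIM][GRID_DIM];
    for (int r = 0; r < GRID_DIM; ++r)
        for (int c = 0; c < GRID_DIM; ++c)
            grid[r][c] = 1;
    clear_via(grid, 0, 0);
    out[0] = grid[0][0] + grid[0][1];  // keep the result live
}

// Reported workaround: the same array allocated with malloc instead.
KERNEL void heap_array_kernel(int *out)
{
    int (*grid)[GRID_DIM] = (int (*)[GRID_DIM])malloc(GRID_DIM * GRID_DIM * sizeof(int));
    if (grid == NULL) {  // heap exhausted
        out[0] = -1;
        return;
    }
    for (int r = 0; r < GRID_DIM; ++r)
        for (int c = 0; c < GRID_DIM; ++c)
            grid[r][c] = 1;
    clear_via(grid, 0, 0);
    out[0] = grid[0][0] + grid[0][1];  // keep the result live
    free(grid);
}
```

Under nvcc each kernel would be launched as usual, e.g. local_array_kernel<<<1, 1>>>(d_out). Device-side malloc needs compute capability 2.x, and its heap defaults to 8 MB, which at 17,424 bytes per thread runs out after a few hundred threads unless it is enlarged with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...).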
Nvidia, this is a pretty glaring bug for a release candidate, please fix!
Please file a bug with a repro case so our compiler team can have a look at this. Please also note whether this is a regression versus the r3.2 toolchain (i.e. code compiles fine with the r3.2 compiler, but encounters problems when the r4.0 compiler is used). Thank you for your help.