SEG FAULT on any function call for large data sets

I have a CFD code which runs fine for grids of up to 32x32x32 cells. I am trying to analyze the efficiency of the CUDA code for larger grid sizes. If I bump it up to 40x32x32 cells, the code segfaults at any function call, including calls to functions like sin(). cuda-memcheck shows no errors, and all arrays are defined the same way for the small and large grids. Simple example:


int Ni = 32;
int Nj = 32;
int Nk = 32;
int NC = Ni*Nj*Nk;
double rho[NC];
...
^^ works fine



int Ni = 40;
int Nj = 32;
int Nk = 32;
int NC = Ni*Nj*Nk;
double rho[NC];
...
^^ segfaults at the first function call


I have left out the rest of the code because the only change is the size of the grid, which makes me think it is a memory access issue. A backtrace from cuda-gdb shows a failure at the first function call, and if I eliminate that call (it can be replaced with an int), the segfault happens at the next function call.

Why would an increased data set size cause the code to fail?
Thanks.

By the way, I am using CUDA 4.0.17 on a Tesla 2070 running on Ubuntu 10.04.