CURAND acting strangely

If I run this code (very similar to the sample in the CURAND documentation):

__global__ void test_kernel ( curandState *globalRand, float *randoms, int N_V )
{
	// global index
	int id = blockIdx.x * blockDim.x + threadIdx.x;

	if ( id >= N_V )
		return;

	// I intend to launch the kernel multiple times, so I preserve the state
	curandState local_state = globalRand[ id ];

	randoms[ id ] = curand_uniform ( &local_state );

	// with this line commented out everything works fine
	globalRand[ id ] = local_state;
}
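
The kernel above assumes that every entry of globalRand was seeded before the first launch; that setup is not shown in this post. A minimal sketch of what it might look like, following the usual CURAND device API pattern, is below. The names setup_kernel and create_states, and the seed parameter, are assumptions for illustration and not the code actually used here.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical setup kernel (not part of the original post):
// seeds one generator state per thread.
__global__ void setup_kernel(curandState *globalRand, int N_V, unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= N_V)
        return;

    // Same seed for all threads, a distinct sequence number per thread, no offset.
    curand_init(seed, id, 0, &globalRand[id]);
}

// Hypothetical host helper: allocates N_V states and seeds them once.
// Returns NULL if the allocation or the setup launch fails.
curandState *create_states(int N_V, int block_size, unsigned long long seed)
{
    curandState *states = NULL;
    if (cudaMalloc((void **)&states, N_V * sizeof(curandState)) != cudaSuccess)
        return NULL;

    int block_num = (N_V + block_size - 1) / block_size;  // round up
    setup_kernel<<<block_num, block_size>>>(states, N_V, seed);

    // Catches launch-time failures only (bad configuration, too many resources).
    if (cudaGetLastError() != cudaSuccess) {
        cudaFree(states);
        return NULL;
    }
    return states;
}
```

If the states were never seeded, or the setup launch itself failed, test_kernel would be reading uninitialized memory from globalRand.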

// BLOCK_SIZE is 1024, running on a GTX 570

block_num = N_V / BLOCK_SIZE + 1;

test_kernel<<<block_num, BLOCK_SIZE>>>( globalRand, d_randomi, N_V );
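
The launch above is not followed by any error check. One possibility worth ruling out with 1024 threads per block is a launch that fails without any visible message (for example, because the kernel needs more registers per thread than the device can supply at that block size), in which case the output buffer is never written. A minimal sketch of such a check follows; check_cuda and check_last_launch are hypothetical helper names, not part of the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original code): reports a CUDA error, if any.
// Returns true when err is cudaSuccess.
static bool check_cuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Call right after a kernel launch such as test_kernel<<<...>>>( ... ).
static bool check_last_launch()
{
    // Launch-time errors, e.g. too many resources requested for the block size.
    if (!check_cuda(cudaGetLastError(), "kernel launch"))
        return false;
    // Errors raised while the kernel ran. On toolkits older than CUDA 4.0
    // the equivalent call is cudaThreadSynchronize().
    return check_cuda(cudaDeviceSynchronize(), "kernel execution");
}
```

Calling check_last_launch() immediately after the launch, and before copying the results back to the host, would show whether the kernel ran at all.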

But whether I run the kernel only once or several times, for large N_V (about 3000 and higher) the output I get (when I printf the randoms on the host after the kernel has finished) decreases to 0 and then turns negative.

I would appreciate any help.
I'm using 64-bit Windows 7, VS Pro, CUDA 3.2, and a GTX 570.

Sorry, I accidentally posted this in the wrong topic.