If I run this code (pretty similar to the sample in the cuRAND documentation):
__global__ void test_kernel ( curandState *globalRand, float *randoms, int N_V )
{
//global index
int id = blockIdx.x * blockDim.x + threadIdx.x;
if ( id >= N_V )
return;
curandState local_state = globalRand[ id ]; // I intend to launch the kernel multiple times, so I preserve the state
randoms[ id ] = curand_uniform ( &local_state );
globalRand[ id ] = local_state; // with this line commented out, everything works fine
}
// BLOCK_SIZE is 1024, running on a GTX 570
block_num = N_V / BLOCK_SIZE + 1;
test_kernel<<<block_num, BLOCK_SIZE>>>( globalRand, d_randomi, N_V );
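The setup of globalRand is not shown above. Uninitialized states or a launch that silently fails could both leave garbage in the output, so here is a minimal sketch of what the setup might look like if it follows the documentation sample, with the launch checked for errors. The names setup_kernel and failed, the seed 1234, N_V = 4000 and the block size of 256 are assumptions for illustration, not taken from the actual program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define BLOCK_SIZE 256  // assumption: smaller than the 1024 above, to leave register headroom

// Hypothetical setup kernel, modeled on the documentation sample:
// initializes one state per thread before test_kernel ever reads it.
__global__ void setup_kernel(curandState *globalRand, unsigned long long seed, int N_V)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= N_V)
        return;
    // Same seed for all threads, a different subsequence per thread, no offset.
    curand_init(seed, id, 0, &globalRand[id]);
}

// Hypothetical helper: print a CUDA error and tell the caller whether to stop.
static bool failed(cudaError_t err, const char *what)
{
    if (err == cudaSuccess)
        return false;
    printf("%s: %s\n", what, cudaGetErrorString(err));
    return true;
}

int main()
{
    const int N_V = 4000;  // assumed size, above the ~3000 where the problem appears
    const int block_num = (N_V + BLOCK_SIZE - 1) / BLOCK_SIZE;  // round up

    // One curandState for every thread index the kernels can touch.
    curandState *globalRand = NULL;
    if (failed(cudaMalloc((void **)&globalRand, N_V * sizeof(curandState)), "cudaMalloc"))
        return 1;

    setup_kernel<<<block_num, BLOCK_SIZE>>>(globalRand, 1234ULL, N_V);
    // A launch that fails (for example for lack of resources) is only visible here.
    if (failed(cudaGetLastError(), "setup_kernel launch"))
        return 1;
    // cudaThreadSynchronize is the CUDA 3.2 name; later toolkits use cudaDeviceSynchronize.
    if (failed(cudaThreadSynchronize(), "setup_kernel execution"))
        return 1;

    cudaFree(globalRand);
    return 0;
}
```

The same cudaGetLastError check placed right after the test_kernel launch would show whether that kernel actually ran before the results are copied back to the host.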
But for large N_V (about 3000 and higher), whether I run the kernel only ONCE or several times, the output (when I printf the randoms on the host after the kernel has finished) decreases to 0 and then becomes negative, even though curand_uniform should only return values in (0, 1].
I would appreciate any help.
I am using 64-bit Windows 7, VS Pro, CUDA 3.2, and a GTX 570.
Sorry, I accidentally posted this in the wrong topic.