curand_uniform() problem: can't find out why the kernel crashes with curand_uniform()

Hello, dear sirs,

I am a student from Russia, working on my course work (a Monte Carlo method for solving sparse SLAEs on the GPU), and I can't find out why my kernel crashes when I use the curand_uniform() function from the CURAND library. Maybe someone can help me understand my mistakes. I am really frustrated, because for a whole week I haven't been able to understand what I am doing wrong. Below I post the kernel code and the host code. Deeply sorry for my English!

Kernel launch code:

devP is a pointer to a one-dimensional array of floats; devV, devColInd and devOffsets are pointers to one-dimensional arrays of ints. These arrays (devP, devColInd, devOffsets) describe a sparse matrix in compressed row (CSR) format, and devV holds additional data for this matrix.
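Every curEq the kernel reaches is used both as a row index into the offsets array and as an index into B, so a quick host-side check of these arrays before copying them to the device can rule out one class of out-of-bounds reads. Below is a minimal sketch in C++. It assumes that the host copies are held in std::vector, that offsets has n + 1 entries for n equations, and that each row's p values are transition probabilities summing to at most 1, with the remainder being the chance that the particle reaches the border. checkCsr is a hypothetical name, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side sanity check for the CSR arrays passed to testKernel.
// n is the number of equations (rows); offsets must hold n + 1 entries.
bool checkCsr(const std::vector<float>& p, const std::vector<int>& v,
              const std::vector<int>& colInd, const std::vector<int>& offsets,
              int n)
{
    if (n < 0 || offsets.size() != static_cast<std::size_t>(n) + 1) {
        return false;
    }
    if (offsets[0] != 0) {
        return false;  // standard CSR convention: the first row starts at 0
    }
    for (int i = 0; i < n; ++i) {
        if (offsets[i] > offsets[i + 1]) {
            return false;  // row lengths must not be negative
        }
    }
    // p, v and colInd are indexed in parallel by offsets[row] + j in the kernel
    const std::size_t nnz = static_cast<std::size_t>(offsets[n]);
    if (p.size() != nnz || v.size() != nnz || colInd.size() != nnz) {
        return false;
    }
    for (int i = 0; i < n; ++i) {
        float rowSum = 0.0f;
        for (int k = offsets[i]; k < offsets[i + 1]; ++k) {
            // colInd[k] becomes the next curEq, which is then used in
            // offsets[curEq + 1] and B[curEq], so it must be a valid row
            if (colInd[k] < 0 || colInd[k] >= n) {
                return false;
            }
            if (p[k] < 0.0f) {
                return false;
            }
            rowSum += p[k];
        }
        // assumed: row probabilities may not exceed 1 (small rounding tolerance)
        if (rowSum > 1.0f + 1e-5f) {
            return false;
        }
    }
    return true;
}
```

Because the kernel hard-codes root = 1, n must also be at least 2, and B needs at least n entries.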

testKernel<<<1, 1>>>(devP, devV, devColInd, devOffsets, B, time(NULL), stohNum, devRes);
cudaError_t error = cudaGetLastError();
if(error != cudaSuccess)
{
	printf("Cuda error: %s\n", cudaGetErrorString(error));
	printf("Press enter to quit with -1 code\n");
}

When I check the kernel execution with cudaGetLastError(), I get “unknown error”.

Kernel definition code:

__global__ void testKernel(float *p, int *v, int *colInd, int *offsets, float *B, unsigned long seed, int numStoh, float *res)
{
	int gid = threadIdx.x;
	int counter = 0;
	float xx = 0.0, x = 0.0;
	float rNum = 0.0;
	float accSum = 0.0;
	int vv = 1;
	int curEq = 0;// kp = 0;
	int exit = 0;
	int isFound = 0;
	int root = 1;
	curandState localState;
	curand_init(seed, gid, 0, &localState);
	while(counter < numStoh)
	{
		curEq = root;
		xx = B[root];
		vv = 1;
		exit = 0;
		while(!exit)//create Markov's chain
		{
			//kp = k; //save previous state
			rNum = curand_uniform(&localState);//((float)rand()/((float)RAND_MAX));
			accSum = 0.0;
			isFound = 0;
			for(int j = 0; j < offsets[curEq+1] - offsets[curEq]; j++)
			{
				if(((rNum > accSum) && (rNum < accSum + p[offsets[curEq] + j])) || (rNum == accSum))
				{
					vv = vv*v[offsets[curEq] + j];
					curEq = colInd[offsets[curEq] +j];
					xx = xx + vv*B[curEq];
					isFound = 1;
				}
				accSum += p[offsets[curEq] + j];
			}
			if(isFound != 1)//particle reached the border
			{
				exit = 1;
			}
		}
		x += xx;
	}
	res[0] = x;
}
If, for example, I comment out the line with curand_uniform() and replace it with rNum = 0.99, the kernel finishes execution with an OK status. Maybe there is a problem with the arrays, but I have checked them a thousand times, so I have no idea what's wrong…

Your use of the curand library looks fine to me, but I suspect an error elsewhere in your code. I actually faced the very same problem with mine: a call to curand_uniform() triggered a crash that was really caused by an earlier out-of-bounds error latent in the code…
I encourage you to run your program under cuda-memcheck to see whether it spots anything of this kind.
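One candidate for such a latent out-of-bounds access is visible in the posted kernel. Inside the for loop, curEq is reassigned as soon as a matching entry is found, but the loop keeps running. From that point on, the loop bound and the accSum += p[offsets[curEq] + j] line index the new row with the old j. That can read past the end of the new row (and past the end of p when the new row is the last one), and more than one transition can fire in a single step. Note also that, as posted, counter is never incremented, so while(counter < numStoh) cannot terminate unless that line was lost when pasting.

A CPU mirror of the walk is one way to test this off the GPU. If it uses bounds-checked std::vector::at() accesses, a bad index throws std::out_of_range instead of silently reading garbage, as it would on the device. The sketch below is hypothetical: nextState, walk and the maxSteps cap are names introduced here, not part of the original code. It reads the current row's bounds once, before curEq can change, and stops at the first match. It also tests rNum against (accSum, accSum + p], because curand_uniform() returns values in (0, 1].

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical host mirror of one transition of testKernel's Markov chain.
// The bounds of row curEq are read once, before curEq can change, and the
// scan stops at the first match. Returns the next row index (and the CSR
// position of the chosen entry in `entry`), or -1 when rNum lies beyond the
// row's accumulated probability ("particle reached the border").
// .at() throws std::out_of_range on a bad index instead of reading garbage.
int nextState(const std::vector<float>& p, const std::vector<int>& colInd,
              const std::vector<int>& offsets, int curEq, float rNum,
              int& entry)
{
    const int rowStart = offsets.at(curEq);
    const int rowEnd = offsets.at(curEq + 1);
    float accSum = 0.0f;
    for (int k = rowStart; k < rowEnd; ++k) {
        // rNum is assumed to lie in (0, 1], like curand_uniform()
        if (rNum > accSum && rNum <= accSum + p.at(k)) {
            entry = k;
            return colInd.at(k);
        }
        accSum += p.at(k);
    }
    return -1;
}

// Hypothetical CPU version of one chain (one pass of the kernel's outer loop).
// maxSteps is an assumed safety cap that the original kernel does not have;
// hitting it means the chain did not reach the border.
float walk(const std::vector<float>& p, const std::vector<int>& v,
           const std::vector<int>& colInd, const std::vector<int>& offsets,
           const std::vector<float>& B, int root,
           const std::function<float()>& uniform, int maxSteps)
{
    int curEq = root;
    int vv = 1;
    float xx = B.at(root);
    for (int step = 0; step < maxSteps; ++step) {
        int entry = -1;
        const int next = nextState(p, colInd, offsets, curEq, uniform(), entry);
        if (next < 0) {
            return xx;  // particle reached the border
        }
        vv *= v.at(entry);
        curEq = next;
        xx += vv * B.at(curEq);
    }
    throw std::runtime_error("chain did not reach the border within maxSteps");
}
```

The uniform source is passed in as a callable, so a fixed sequence can be replayed when a chain misbehaves. Separately, cudaGetLastError() called right after an asynchronous kernel launch only reports launch errors. Faults that happen while the kernel runs surface at a later synchronizing call such as cudaDeviceSynchronize(). An OK status from a check placed right after the launch therefore does not by itself prove that the kernel ran to completion.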

Many thanks for your advice, I'll try it!