Having trouble with curand.

Hello, part of a big project I have involves creating random initial configurations (ICs) to be executed by some cellular automata later on. I can create these normally on the host, but when I move to the GPU and start using curand, I run into a very strange problem.
This is the CA structure I’m using (it’s very, very simple, just a 1D CA):

#define MAXLATSIZE 200
#define MAXEXECTIME 400

typedef struct ca{
  char IC[MAXLATSIZE];
  char lattices[MAXEXECTIME][MAXLATSIZE];
}CA;

And I’m creating the ICs using this kernel:

__device__ int dBinomial(int n, float p, curandState *s){
  int x = 0;
  int i;
  for(i=0;i<n;i++){
    if(curand_uniform(s) < p)
      x++;
  }
  return x;
}

__global__ void createRandomIC(CA *ics, const int max){
  int tid = threadIdx.x + blockIdx.x*blockDim.x;
  int i;
  while(tid < max){
    curandState s;
    curand_init(tid,0,0,&s);
    for(i=0;i<MAXLATSIZE;i++)
      ics[tid].IC[i] = (dBinomial(2,0.5,&s) == 0 ? '0':'1');
    tid += blockDim.x*gridDim.x;
  }
}

And here’s how I’m calling it:

#define CUDA_CALL(x) do { if((x) != cudaSuccess) { \
	    printf("Error at %s:%d (%s)\n",__FILE__,__LINE__,cudaGetErrorString(cudaGetLastError())); \
	    return EXIT_FAILURE; }} while(0)
.
.
.
  CA *dCAs;
  int amountCAs=10000;
  size_t casMem = sizeof(CA)*amountCAs;
  CUDA_CALL( cudaMalloc( (void**)&dCAs,casMem) );
  createRandomIC<<<128,128>>>(dCAs,amountCAs);
  CUDA_CALL(cudaDeviceSynchronize());

While it is able to create some random ICs, I can only create up to 9986 of them; after that, every IC ends up filled with 1s. Here’s a sample (0 is shown as ‘ ’ and 1 as ‘*’):
9986: * **** * * **** * * ** *** * * * ** * ****** ****** * * * * * * ** * * ** ** * * * * * **** * *** ** * * * * **** **** * *** * ****** * *** **
9987:********************************************************************************************************************************************************************************************************
9988:********************************************************************************************************************************************************************************************************
And I’m pretty sure that it IS passing the max parameter correctly (ie: 10000).
If someone wants to take a look at the full source code, it can be seen here: http://trac.geekvault.org/cuda/browser/trunk/cu1dCA/src
Thanks for the help.

OK, just found the bug; I think I need to sleep more.
The problem wasn’t in the random part at all, but in where I was storing the output. I simply needed to put it under lattices[0] instead of IC.

Thanks anyway!

Note that the use of “char” for small integers may be risky. Depending on the compiler and platform, “char” can mean either “signed char” or “unsigned char”, and this can lead to hard-to-track bugs. It is usually best to be explicit about signedness when using “char” to store small integers. “char” for the storage of ASCII text is fine, of course.

Thanks njuffa, I’ll keep that in mind (in this case, I think it really is better to use unsigned char then).