Hello. Part of a larger project of mine is generating random initial configurations (ICs) to be run by a cellular automaton later on. I can generate them without trouble on the host, but when I move to the GPU and start using cuRAND, I run into a very strange problem.
This is the CA structure I’m using (it’s very, very simple, just a 1D CA):
#define MAXLATSIZE 200
#define MAXEXECTIME 400
typedef struct ca{
    char IC[MAXLATSIZE];
    char lattices[MAXEXECTIME][MAXLATSIZE];
}CA;
And I’m creating the ICs with this kernel:
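#include <curand_kernel.h>

// Draws n uniform samples and counts how many fall below p (one Binomial(n, p) sample).
__device__ int dBinomial(int n, float p, curandState *s){
    int x = 0;
    int i;
    for(i=0;i<n;i++){
        if(curand_uniform(s) < p)
            x++;
    }
    return x;
}

// Fills the IC of every CA with random '0'/'1' cells, using a grid-stride loop over the CAs.
__global__ void createRandomIC(CA *ics, const int max){
    int tid = threadIdx.x + blockIdx.x*blockDim.x;
    int i;
    while(tid < max){
        curandState s;
        curand_init(tid,0,0,&s);   // seed = tid, sequence = 0, offset = 0
        for(i=0;i<MAXLATSIZE;i++)
            ics[tid].IC[i] = (dBinomial(2,0.5f,&s) == 0 ? '0':'1');
        tid += blockDim.x*gridDim.x;
    }
}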
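To spell out what I expect from that kernel: dBinomial(2, 0.5) returns 0 only when both draws are at or above 0.5, so each cell should be '0' about a quarter of the time and '1' otherwise. Below is a host-only sketch of the same logic, with rand() standing in for curand_uniform. The names hUniform, hBinomial and hFractionZero are made up for this illustration and are not part of the project.

```c
#include <assert.h>
#include <stdlib.h>

/* Host stand-in for curand_uniform: uniform double in [0, 1).
   (Assumption: curand_uniform itself returns floats in (0, 1];
   the difference does not matter for this check.) */
double hUniform(void) {
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Same counting loop as dBinomial, but drawing from rand(). */
int hBinomial(int n, float p) {
    int x = 0;
    for (int i = 0; i < n; i++) {
        if (hUniform() < p)
            x++;
    }
    return x;
}

/* Fraction of cells that would become '0' over `samples` draws. */
double hFractionZero(int samples, unsigned seed) {
    assert(samples > 0);
    srand(seed);
    int zeros = 0;
    for (int i = 0; i < samples; i++) {
        if (hBinomial(2, 0.5f) == 0)
            zeros++;
    }
    return (double)zeros / samples;
}
```

With a large sample count, hFractionZero should land close to 0.25, which is nowhere near the all-'1' rows I get on the GPU.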
And here’s how I’m calling it:
#define CUDA_CALL(x) do { if((x) != cudaSuccess) { \
    printf("Error at %s:%d (%s)\n",__FILE__,__LINE__,cudaGetErrorString(cudaGetLastError())); \
    return EXIT_FAILURE; }} while(0)
.
.
.
CA *dCAs;
int amountCAs=10000;
size_t casMem = sizeof(CA)*amountCAs;
CUDA_CALL( cudaMalloc( (void**)&dCAs,casMem) );
createRandomIC<<<128,128>>>(dCAs,amountCAs);
CUDA_CALL(cudaDeviceSynchronize());
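For reference, here is a quick host-only sketch of how large that allocation is. Each CA holds 200 + 400*200 = 80,200 bytes, so 10000 of them come to 802,000,000 bytes (roughly 765 MiB). ca_bytes is a helper name invented for this illustration; it evaluates the same expression as casMem, with an overflow guard added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXLATSIZE 200
#define MAXEXECTIME 400

typedef struct ca {
    char IC[MAXLATSIZE];
    char lattices[MAXEXECTIME][MAXLATSIZE];
} CA;

/* Bytes needed for `count` CA structs (same expression as casMem). */
size_t ca_bytes(size_t count) {
    /* Guard against size_t overflow in the multiplication. */
    assert(count <= SIZE_MAX / sizeof(CA));
    return sizeof(CA) * count;
}
```

Calling ca_bytes(10000) gives the number of bytes requested from cudaMalloc in the snippet above.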
While it does create some random ICs, they are only random up to IC 9986; every IC after that ends up filled entirely with 1s. Here’s a sample (0 is shown as ’ ’ and 1 as ‘*’):
9986: * **** * * **** * * ** *** * * * ** * ****** ****** * * * * * * ** * * ** ** * * * * * **** * *** ** * * * * **** **** * *** * ****** * *** **
9987:********************************************************************************************************************************************************************************************************
9988:********************************************************************************************************************************************************************************************************
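The dump above maps '0' to a space and '1' to '*'. Here is a host-side sketch of that rendering, plus a check for the degenerate all-'1' case, assuming the ICs have already been copied back to the host. renderIC and isAllOnes are illustrative names, not the ones used in the project source.

```c
#include <assert.h>
#include <stddef.h>

#define MAXLATSIZE 200

/* Render one IC into `out` (must hold MAXLATSIZE + 1 bytes):
   '1' becomes '*', every other cell becomes ' '. */
void renderIC(const char *ic, char *out) {
    assert(ic != NULL && out != NULL);
    for (size_t i = 0; i < MAXLATSIZE; i++)
        out[i] = (ic[i] == '1') ? '*' : ' ';
    out[MAXLATSIZE] = '\0';
}

/* Returns 1 if every cell of the IC is '1', 0 otherwise. */
int isAllOnes(const char *ic) {
    assert(ic != NULL);
    for (size_t i = 0; i < MAXLATSIZE; i++) {
        if (ic[i] != '1')
            return 0;
    }
    return 1;
}
```

Running isAllOnes over every IC in index order is a simple way to find the first index at which the output degenerates.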
And I’m pretty sure that it IS passing the max parameter correctly (i.e. 10000).
If someone wants to take a look at the full source code, it can be seen here http://trac.geekvault.org/cuda/browser/trunk/cu1dCA/src.
Thanks for the help.