Hello.
I’m a CUDA newbie so I apologize if this is a silly question.
I’m trying to generate, say, 1000 random strings using the letters of the alphabet. Each “word” has to be 10 characters long.
I don’t want to deal with pitch and 2D arrays since these are nothing more than 1D arrays on the device. So here is the pseudo code for what I was thinking I’d do:
char word_array[ 1000 * 10 ]; // each consecutive run of 10 chars holds 1 word
char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; // my alphabet for picking random letters
// initialize word_array to be all '\0'
// initialize random number generator on GPU
// allocate GPU memory for word_array
// allocate GPU memory for alphabet array
// copy word_array to device
// copy alphabet to device
// execute kernel
// copy word_array back to host
// print word_array contents (each 10 characters and then a new line)
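The host-side steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `fillWords` is a hypothetical placeholder kernel (it writes one repeated letter per word instead of random letters, and the random number generator step is left out), `CHECK` is a hypothetical error-checking macro, and the block size of 128 is an arbitrary choice. The point is the flat layout: thread `w` owns the 10 characters starting at `w * WORD_LEN`.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define NUM_WORDS 1000  // word count from the post
#define WORD_LEN  10    // characters per word from the post
#define ALPHA_LEN 26    // letters in the alphabet string

// Hypothetical helper: abort with a message if a CUDA call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "CUDA error: %s (line %d)\n",         \
                    cudaGetErrorString(err_), __LINE__);          \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Placeholder kernel (not the random generator): thread w owns word w and
// fills its 10-char slot with one repeated letter, so the layout is easy
// to verify by eye.
__global__ void fillWords(const char *alphabet, char *words, int numWords)
{
    int w = blockIdx.x * blockDim.x + threadIdx.x;
    if (w >= numWords) return;          // surplus threads in the last block
    for (int i = 0; i < WORD_LEN; ++i)
        words[w * WORD_LEN + i] = alphabet[w % ALPHA_LEN];
}

int main(void)
{
    static char word_array[NUM_WORDS * WORD_LEN];  // static storage: all '\0'
    const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const size_t wordBytes = sizeof(word_array);

    // Allocate GPU memory for the words and the alphabet.
    char *dev_words = NULL;
    char *dev_alphabet = NULL;
    CHECK(cudaMalloc((void **)&dev_words, wordBytes));
    CHECK(cudaMalloc((void **)&dev_alphabet, ALPHA_LEN));

    // Copy both arrays to the device.
    CHECK(cudaMemcpy(dev_words, word_array, wordBytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dev_alphabet, alphabet, ALPHA_LEN, cudaMemcpyHostToDevice));

    // Launch enough blocks to give every word its own thread.
    const int threads = 128;
    const int blocks = (NUM_WORDS + threads - 1) / threads;
    fillWords<<<blocks, threads>>>(dev_alphabet, dev_words, NUM_WORDS);
    CHECK(cudaGetLastError());

    // Copy the result back (this call waits for the kernel to finish).
    CHECK(cudaMemcpy(word_array, dev_words, wordBytes, cudaMemcpyDeviceToHost));

    // The words are not '\0'-terminated, so print exactly WORD_LEN chars each.
    for (int w = 0; w < NUM_WORDS; ++w)
        printf("%.*s\n", WORD_LEN, &word_array[w * WORD_LEN]);

    CHECK(cudaFree(dev_words));
    CHECK(cudaFree(dev_alphabet));
    return 0;
}
```

Because each word occupies exactly 10 chars with no terminator, the print loop uses a precision (`%.*s`) rather than relying on `'\0'`.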
My goal is to have each thread of the GPU generate its own random word and store it in one of the “word banks” in the word_array.
I’ve got the code to generate 1 random word on 1 thread, but I can’t scale this up. I’m guessing I’ve screwed up the thread indexing somehow. Can anyone point me in the right direction?
Here is my kernel that generates the random word:
// The main kernel
__global__ void genRandomPep(curandState *dev_state, char *dev_alphabet, char *dev_words) {
    int tid = threadIdx.x + (blockIdx.x * blockDim.x); // this is wrong for what I'm trying to do I think
    int i,j;
    curandState localState = dev_state[tid];
    for(i = tid; i < 11; i++) { //iterate over a block of memory covering 10 char's
        j = curand( &localState ) % N; // get random number between 0 and 25
        dev_words[i] = dev_alphabet[j];
    }
    __syncthreads(); // do I need this?
}
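For comparison, here is one possible way the kernel could be rearranged so that each thread fills its own 10-character slot. It is a sketch under assumptions, not a definitive fix: `N` is assumed to be 26 (the post does not show its definition), the extra `numWords` parameter, the `setupStates` kernel, the fixed seed 1234 and the block size of 128 are all choices made for this example. The `tid` computation in the posted kernel is already the usual global thread index; what stands out is the loop, which starts at `tid` and stops at 11, so thread 0 writes indices 0 through 10, thread 1 writes 1 through 10, and threads 11 and above write nothing. In the sketch the loop always runs from 0 to 9 and `tid` only selects the slot.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define NUM_WORDS 1000  // word count from the post
#define WORD_LEN  10    // characters per word from the post
#define N         26    // assumed alphabet size; not defined in the post

// Give every thread its own generator state (same seed, distinct sequence).
__global__ void setupStates(curandState *dev_state, unsigned long long seed,
                            int numWords)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < numWords)
        curand_init(seed, tid, 0, &dev_state[tid]);
}

// One thread per word: tid picks the slot, i walks the 10 chars inside it.
__global__ void genRandomPep(curandState *dev_state, const char *dev_alphabet,
                             char *dev_words, int numWords)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= numWords) return;             // surplus threads in the last block

    curandState localState = dev_state[tid];
    char *word = dev_words + tid * WORD_LEN; // start of this thread's slot
    for (int i = 0; i < WORD_LEN; i++) {
        unsigned int j = curand(&localState) % N;  // index in 0..25
        word[i] = dev_alphabet[j];
    }
    dev_state[tid] = localState;             // save state for later launches
}

// Hypothetical helper: stop on the first CUDA error.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    static char word_array[NUM_WORDS * WORD_LEN];
    const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const int threads = 128;
    const int blocks = (NUM_WORDS + threads - 1) / threads;

    curandState *dev_state = NULL;
    char *dev_alphabet = NULL;
    char *dev_words = NULL;
    check(cudaMalloc((void **)&dev_state, NUM_WORDS * sizeof(curandState)), "malloc states");
    check(cudaMalloc((void **)&dev_alphabet, N), "malloc alphabet");
    check(cudaMalloc((void **)&dev_words, sizeof(word_array)), "malloc words");
    check(cudaMemcpy(dev_alphabet, alphabet, N, cudaMemcpyHostToDevice), "copy alphabet");

    setupStates<<<blocks, threads>>>(dev_state, 1234ULL, NUM_WORDS);
    check(cudaGetLastError(), "setupStates launch");
    genRandomPep<<<blocks, threads>>>(dev_state, dev_alphabet, dev_words, NUM_WORDS);
    check(cudaGetLastError(), "genRandomPep launch");

    check(cudaMemcpy(word_array, dev_words, sizeof(word_array),
                     cudaMemcpyDeviceToHost), "copy words back");

    for (int w = 0; w < NUM_WORDS; w++)
        printf("%.*s\n", WORD_LEN, &word_array[w * WORD_LEN]);

    cudaFree(dev_state);
    cudaFree(dev_alphabet);
    cudaFree(dev_words);
    return 0;
}
```

In this arrangement no thread reads or writes another thread's data, so `__syncthreads()` is not needed; it only matters when threads in a block exchange data, for example through shared memory.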
I’m working on RHEL 5.0 (64-bit) with a GeForce GTS 250 (1GB) if that matters.
Thanks in advance for any and all help.