String manipulation and Block size

Hi.

I've written some code to generate the permutations of strings, given a set of strings (where each string is a sequence of characters from the English alphabet).

The basic strategy I follow is to assign one block per string with the number of threads in each block equal to the string length of that particular string.

I just wanted to know, as a ballpark figure, what is the maximum number of blocks that can be launched in CUDA?
I'd like some theoretical grounding before I experiment with data…

In short, the number of blocks in my program is linearly dependent on the number of strings in the set of strings provided.
Would the program be able to manage an exponential number of strings?

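Hi, I believe the max grid dimension, i.e. the number of blocks per grid dimension, is 65535 (so a 2D grid can hold up to 65535 x 65535 blocks), and the max number of threads in a block is 512 on compute capability 1.x devices. You could check this for your card with cudaGetDeviceProperties (the maxGridSize and maxThreadsPerBlock fields), but I think this is it.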
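If the number of strings ever outgrows the per-dimension limit, one common workaround is to spread the blocks over a 2D grid. Here is a rough host-side sketch in plain C++. `Grid2D` and `grid_for` are made-up names for illustration (`Grid2D` just stands in for `dim3`), and the default limit of 65535 is an assumption you should replace with whatever your device reports.

```cpp
#include <cassert>

// Hypothetical stand-in for the x and y extents of CUDA's dim3.
struct Grid2D {
    unsigned x;
    unsigned y;
};

// Picks a 2D grid holding at least num_blocks blocks while keeping each
// dimension at or below max_dim. The default of 65535 is an assumption;
// query the real limit on your device.
Grid2D grid_for(unsigned long long num_blocks, unsigned max_dim = 65535u) {
    assert(num_blocks > 0);
    assert(num_blocks <= 1ull * max_dim * max_dim);
    Grid2D g;
    if (num_blocks <= max_dim) {
        // Everything fits in a single row of blocks.
        g.x = static_cast<unsigned>(num_blocks);
        g.y = 1;
    } else {
        // Rows needed, then the narrowest width that still covers num_blocks.
        g.y = static_cast<unsigned>((num_blocks + max_dim - 1) / max_dim);
        g.x = static_cast<unsigned>((num_blocks + g.y - 1) / g.y);
    }
    return g;
}

// Inside a kernel launched with this grid, each block would recover its
// string index as
//     blockIdx.y * gridDim.x + blockIdx.x
// and return early when that index is >= num_blocks, since the grid may
// be rounded up past the number of strings.
```

For example, 100000 strings would give a 50000 x 2 grid. This only raises the ceiling; an exponential number of strings will still run out of blocks (and memory) very quickly.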

I have another question. I'm also working with strings… I have an array of character strings which I want to send to the device. Here's what I've tried:

I have an array of character strings defined as:

char *a[3];

a[0]="foo1";

a[1]="foo1";

a[2]="foo2";

I need to copy this to the device. Can somebody tell me how?

Here's what I tried so far:

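char *dev_array[3]; // host-side array that holds the device pointers

// 5 bytes: the 4 characters of "foo1" plus the terminating '\0'
cudaMalloc((void**)&dev_array[0],5*sizeof(char));

cudaMemcpy(dev_array[0],a[0],5*sizeof(char),cudaMemcpyHostToDevice);

//Subsequently do this for a[1], a[2]… and so on.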

This works, but as you can see, I've had to explicitly send each character string one at a time. If I have char *a[1000], this is obviously impractical. Also, calling the kernel with that many pointers is impossible. Is there a way to do this?

ananth Sadanand:

Firstly, be careful: char *a[3] is an array of 3 pointers to string literals, not an array of 3 chars, so copying a itself would only copy host addresses. Also remember that a char array with no terminating null byte ('\0') is a vector of chars, NOT a string. That being said, if you don't really need a set of strings (a set of char arrays is fine) and each array is the same length (3 chars in what follows), why not do:

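#define NUM_ARRAYS 1000

char* dev_array = NULL;

// pass the ADDRESS of the pointer so cudaMalloc can fill it in
cudaMalloc((void**)&dev_array,3*sizeof(char)*NUM_ARRAYS);

// host buffer: array i occupies a[3*i] through a[3*i+2]
char a[3*NUM_ARRAYS];

cudaMemcpy(dev_array,a,3*sizeof(char)*NUM_ARRAYS,cudaMemcpyHostToDevice);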

And then access each individual array with the knowledge that dev_array[3*i] is the start of the i’th array.

Alternatively, for true strings, you could make each array 4 chars long (so the 3s above become 4s) and set every fourth char to '\0'.
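The fixed-length layout described above can be sketched on the host roughly like this. It is only an illustration in plain C++: `pack_fixed` and `stride` are made-up names, and it assumes every string is shorter than `stride` so that each record keeps a terminating '\0'. The CUDA calls are shown as comments so the snippet compiles without the toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: packs every string into one contiguous, zero-filled
// buffer in which record i occupies bytes [i*stride, (i+1)*stride).
std::vector<char> pack_fixed(const std::vector<std::string>& strings,
                             std::size_t stride) {
    std::vector<char> flat(strings.size() * stride, '\0');
    for (std::size_t i = 0; i < strings.size(); ++i) {
        // Leave at least one byte per record for the terminating '\0'.
        assert(strings[i].size() < stride);
        std::memcpy(flat.data() + i * stride,
                    strings[i].data(), strings[i].size());
    }
    return flat;
}

// The whole set then goes to the device in one allocation and one copy:
//     char* dev_array = NULL;
//     cudaMalloc((void**)&dev_array, flat.size());
//     cudaMemcpy(dev_array, flat.data(), flat.size(), cudaMemcpyHostToDevice);
// and the kernel takes dev_array plus stride, finding string i at
//     dev_array + i * stride
```

With the strings from the question and a stride of 5, the buffer is 15 bytes long and "foo2" starts at byte 10. If the strings differ a lot in length, padding everything to the longest one wastes memory, so the stride is a trade-off rather than a free choice.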