Hello all,
I have recently started developing with CUDA and would appreciate some advice on the following
problem:
STEP 1:
In host memory I have the following array
(the element values themselves are unimportant):
const int a[] =
{ 'h','e','l','l','o','\0',
'h','o','w','\0',
'A','R','E','\0',
'Y','O','U','\0',
};
The array above holds four strings (hello, how, ARE, YOU), each terminated by '\0'.
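To make the layout concrete, here is a minimal host-side sketch (plain C++, no CUDA calls) of how I think I could record where each string starts. The names build_offsets and offsets are my own placeholders, not part of any API; the assumption is that such a table of start indices could be copied to the device together with the array.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the start index of every '\0'-terminated
// string stored back to back in data[0..n).
std::vector<int> build_offsets(const int* data, std::size_t n) {
    std::vector<int> offsets;
    bool at_start = true;  // true when the next element begins a new string
    for (std::size_t i = 0; i < n; ++i) {
        if (at_start) {
            offsets.push_back(static_cast<int>(i));
            at_start = false;
        }
        if (data[i] == '\0') {
            at_start = true;  // the element after a terminator starts a string
        }
    }
    return offsets;
}
```

For the array above this gives the start indices 0, 6, 10 and 14, one per string.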
STEP 2:
The array is copied to CUDA device memory.
STEP 3:
The array is processed as follows:
Thread 1:
functionA(['h','e','l','l','o'])
Thread 2:
functionA(['h','o','w'])
Thread 3:
functionA(['A','R','E'])
…and so on.
So in general each CUDA thread runs functionA on one chunk of the array (one whole string at a time), rather than on a single array element.
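Here is a CPU-only sketch of the mapping I have in mind, written in plain C++ so that it runs without a GPU. It assumes I also pass a table offsets holding the start index of each string (0, 6, 10, 14 for the array above). The body of functionA is a made-up stand-in that just counts elements, and thread_body and run_all are hypothetical names of mine. In a real kernel I would expect tid to be computed as blockIdx.x * blockDim.x + threadIdx.x instead of coming from a loop.

```cpp
// Hypothetical stand-in for functionA: counts the elements of one string,
// stopping at the '\0' terminator.
int functionA(const int* str) {
    int len = 0;
    while (str[len] != '\0') {
        ++len;
    }
    return len;
}

// Work done by one logical thread: thread `tid` handles string `tid`.
void thread_body(int tid, const int* data, const int* offsets,
                 int num_strings, int* results) {
    if (tid >= num_strings) {
        return;  // surplus threads have nothing to do
    }
    results[tid] = functionA(data + offsets[tid]);
}

// CPU stand-in for a kernel launch: run the body once per thread index.
void run_all(const int* data, const int* offsets,
             int num_strings, int* results) {
    for (int tid = 0; tid < num_strings; ++tid) {
        thread_body(tid, data, offsets, num_strings, results);
    }
}
```

With this layout the number of threads only has to be at least the number of strings, and how those threads are grouped into blocks is exactly the part I am unsure about.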
My QUESTION is: how should I invoke the CUDA kernel? Should I use a separate block for every input that is processed by functionA?
Any help/ideas will be really appreciated.
Thank you all in advance.