Correct CUDA kernel invocation

Hello all,

I have recently started developing in CUDA and would appreciate some advice on the following.



In host memory I have the following array (the actual elements are unimportant):

const char a[] = {
    'h','e','l','l','o','\0',
    'h','o','w','\0',
    'A','R','E','\0',
    'Y','O','U','\0'
};


The array above contains four strings (hello, how, ARE, YOU), each terminated by '\0'.
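For a thread to find "its" string, something has to record where each string starts and how long it is. A minimal host-side sketch, assuming the chunks are exactly the '\0'-terminated strings in the flat buffer; `findStrings` is a hypothetical helper name, not part of any CUDA API:

```cuda
#include <cstddef>
#include <vector>

// Hypothetical helper: scans a flat buffer of '\0'-terminated strings and
// records where each string starts and how many characters it has
// (not counting the terminator). Characters after the last '\0' are ignored.
void findStrings(const char *buf, size_t bufLen,
                 std::vector<int> &offsets, std::vector<int> &lengths)
{
    offsets.clear();
    lengths.clear();
    size_t start = 0;
    for (size_t i = 0; i < bufLen; ++i) {
        if (buf[i] == '\0') {
            offsets.push_back(static_cast<int>(start));
            lengths.push_back(static_cast<int>(i - start));
            start = i + 1;  // the next string begins right after the terminator
        }
    }
}
```

For the array above this yields offsets 0, 6, 10, 14 and lengths 5, 3, 3, 3. These two small arrays can be copied to the device alongside the data so every thread can look up its own slice.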


The array is copied to device (GPU) memory.
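The copy step can be sketched with the CUDA runtime calls `cudaMalloc` and `cudaMemcpy`. The `CUDA_CHECK` macro and `copyToDevice` function are hypothetical helpers, and aborting on error is just one simple error-handling choice:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: allocates n bytes of device memory and copies the
// host buffer into it. The caller is responsible for calling cudaFree.
char *copyToDevice(const char *hostBuf, size_t n)
{
    char *devBuf = nullptr;
    CUDA_CHECK(cudaMalloc(&devBuf, n));
    CUDA_CHECK(cudaMemcpy(devBuf, hostBuf, n, cudaMemcpyHostToDevice));
    return devBuf;
}
```

For example, `char *dA = copyToDevice(a, sizeof a);` copies the whole flat buffer, terminators included, in a single transfer.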


The array is then processed as follows:

Thread 1:

functionA(['h','e','l','l','o'])

Since functionA processes only part of the array, each thread needs to operate on a chunk of array elements rather than on a single element at a time.

Thread 2:

functionA(['h','o','w'])

Thread 3:

functionA(['A','R','E'])


…and so on.

In general, each CUDA thread runs functionA, which operates on one small chunk of the array.
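The one-thread-per-chunk model described above could look like the following kernel sketch. The real functionA is not shown, so the `functionA` here is a hypothetical stand-in that counts uppercase letters; `processChunks` is also a hypothetical name, and the per-chunk offsets and lengths are assumed to be already in device memory:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for functionA: counts the characters of one chunk
// that lie in 'A'..'Z'. Replace with the real per-chunk processing.
__device__ int functionA(const char *chunk, int len)
{
    int count = 0;
    for (int k = 0; k < len; ++k)
        if (chunk[k] >= 'A' && chunk[k] <= 'Z') ++count;
    return count;
}

// One thread per chunk: thread i looks up where chunk i starts and how long
// it is, then calls functionA on just that slice of the flat buffer.
__global__ void processChunks(const char *buf, const int *offsets,
                              const int *lengths, int numChunks, int *results)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i >= numChunks) return;  // surplus threads in the last block do nothing
    results[i] = functionA(buf + offsets[i], lengths[i]);
}
```

With four chunks, a launch such as `processChunks<<<1, 32>>>(...)` runs 32 threads, of which 28 return immediately at the bounds check; results would be 0, 0, 3, 3 for hello, how, ARE, YOU.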

My QUESTION is: how should I launch the CUDA kernel? Should I use a separate block for every input processed by functionA?
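One common pattern (a sketch, not the only option) is a 1-D grid with one thread per chunk rather than one block per input. Threads are grouped into blocks of a fixed size, and the number of blocks comes from ceiling division. A block with only one active thread leaves most of each 32-thread warp idle. `blocksFor` and `launchConfig` are hypothetical names, and the block size is an assumption to tune:

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Smallest block count such that blocks * threadsPerBlock >= numChunks
// (integer ceiling division). The kernel must still skip indices >= numChunks,
// because the last block may be only partly used.
unsigned int blocksFor(unsigned int numChunks, unsigned int threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return (numChunks + threadsPerBlock - 1) / threadsPerBlock;
}

// Fills in the <<<grid, block>>> configuration for a 1-D launch with
// one thread per chunk.
void launchConfig(unsigned int numChunks, unsigned int threadsPerBlock,
                  dim3 &grid, dim3 &block)
{
    block = dim3(threadsPerBlock);
    grid = dim3(blocksFor(numChunks, threadsPerBlock));
}
```

For example, 1000 chunks with 256 threads per block give 4 blocks (1024 threads), so the 24 surplus threads must exit early via an index check. A launch would then look like `myKernel<<<grid, block>>>(...)`, where `myKernel` is a placeholder. Note that `numChunks == 0` gives a zero-sized grid, which is not a valid launch configuration, so check for it before launching.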

Any help/ideas will be really appreciated.

Thank you all in advance.