Using thread indices in a device function


I have a device function like this:

__device__ void device_fun(float* Array1, int* Array2, int size)
{
    for (int i = 0; i < 700; i++)
    {
        // some manipulation here
    }
}


I also have a kernel that calls it:

__global__ void global_fun()
{
    // body here

    // calling the device function
    device_fun(Array1, Array2, size);
}

// launched from the host with 2 blocks of 256 threads
global_fun<<<2, 256>>>();

My question is:

  • How should threadIdx.x and/or threadIdx.y, blockIdx.x, etc. be used inside a device function?
    I know that it will depend on how the problem is organized.

    I only want to know whether there is any rule or trick for deciding how to handle loops like the one above.
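To make the question concrete, here is the kind of pattern I have in mind: a grid-stride loop, where each thread starts at its own global index and steps by the total number of threads. This is only a minimal sketch, not a definitive answer. It assumes the arrays are passed to the kernel as parameters, that each loop iteration is independent of the others, and that the "manipulation" is a made-up placeholder. The host helper run_example is hypothetical and exists only to check the result.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Placeholder "manipulation" (an assumption, not the real computation):
// Array1[i] = 2 * Array1[i] + Array2[i]
__device__ void device_fun(float* Array1, const int* Array2, int size)
{
    // threadIdx, blockIdx, blockDim and gridDim are built-in variables that
    // can be read in __device__ functions just as in __global__ kernels.
    int first  = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    int stride = gridDim.x * blockDim.x;                 // total number of threads launched

    // Grid-stride loop: this thread handles first, first + stride, ...
    for (int i = first; i < size; i += stride)
    {
        Array1[i] = Array1[i] * 2.0f + static_cast<float>(Array2[i]);
    }
}

__global__ void global_fun(float* Array1, const int* Array2, int size)
{
    device_fun(Array1, Array2, size);
}

// Hypothetical host helper: runs the kernel on 700 elements and
// returns true if every element matches the expected value.
bool run_example()
{
    const int size = 700;
    std::vector<float> h1(size);
    std::vector<int>   h2(size);
    for (int i = 0; i < size; i++) { h1[i] = static_cast<float>(i); h2[i] = i % 7; }

    float* d1 = nullptr;
    int*   d2 = nullptr;
    if (cudaMalloc(&d1, size * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d2, size * sizeof(int)) != cudaSuccess) { cudaFree(d1); return false; }
    cudaMemcpy(d1, h1.data(), size * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d2, h2.data(), size * sizeof(int), cudaMemcpyHostToDevice);

    global_fun<<<2, 256>>>(d1, d2, size);          // same launch shape as in the question
    cudaError_t err = cudaDeviceSynchronize();     // wait for the kernel and collect errors

    std::vector<float> out(size);
    cudaMemcpy(out.data(), d1, size * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d1);
    cudaFree(d2);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < size; i++)
    {
        if (out[i] != 2.0f * static_cast<float>(i) + static_cast<float>(i % 7)) return false;
    }
    return true;
}
```

With the <<<2, 256>>> launch there are 512 threads for 700 elements, so threads 0 to 187 each process two elements and the rest process one. Calling run_example() from main would exercise the whole thing. If the iterations depend on each other, this split would not apply as written.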

Thanks.