Kernel call in Kernel

Hi,

Imagine you have two large arrays of roughly 10,000 elements each.

I have to do something like this:

__global__ void kernel(array a, array b)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < a.length)
    {
        if (a.values[tid] ... meets condition)
        {
            for (int i = 0; i < b.length; i++)
            {
                ...
            }
        }
    }
}

I can’t launch a two-dimensional grid with 100,000,000 threads (10,000 × 10,000); the system would freeze.

Ideally I would launch a second kernel from inside the kernel instead of running the inner for-loop, but that’s not possible currently.

What would be your suggestion to solve this?

Could you do the first step on the CPU? You can evaluate the condition on the host while the GPU is computing, and then launch the GPU kernel once for each element that meets it.
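A minimal sketch of that idea, under some assumptions: the condition (`meetsCondition`), the inner-loop body (`b[j] += ai`), and the names `innerKernel` and `dB` are all made up for illustration, since the original post elides them. Kernel launches return to the host without waiting for the kernel to finish, so the host loop can keep testing elements of `a` while earlier launches are still running.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Placeholder for the elided inner-loop body: one thread per element of b.
__global__ void innerKernel(float ai, float *b, int nb)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < nb)
        b[j] += ai;  // hypothetical work, not from the original post
}

// Hypothetical condition, evaluated on the host.
static bool meetsCondition(float v) { return v > 0.5f; }

int main()
{
    const int n = 10000;
    std::vector<float> a(n), b(n, 0.0f);
    for (int i = 0; i < n; ++i)
        a[i] = (i % 4) * 0.25f;  // illustrative input data

    float *dB = nullptr;
    cudaMalloc((void **)&dB, n * sizeof(float));
    cudaMemcpy(dB, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int launches = 0;

    // The host tests the condition; each match triggers one launch over b.
    // Launches in the same stream run in order, so the updates to b do not race.
    for (int i = 0; i < n; ++i) {
        if (meetsCondition(a[i])) {
            innerKernel<<<blocks, threads>>>(a[i], dB, n);
            ++launches;
        }
    }

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(b.data(), dB, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("launches = %d, b[0] = %f\n", launches, b[0]);
    cudaFree(dB);
    return 0;
}
```

The catch is that `a` has to be readable on the host, and many small launches add per-launch overhead, so this probably pays off only when few elements meet the condition.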

Smart, and probably fast too. (I will try it.)

But there must be a way to compute both on the GPU :)

The arrays get transformed after each kernel call, so it would be nice if I did not need to copy them back to the host.
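One way this might be done entirely on the GPU is a two-kernel pass: the first kernel collects the indices of `a` that meet the condition into a device-side list, and the second kernel assigns one thread per element of `b` and loops over that list. This is only a sketch under assumptions: the condition (`> 0.5f`), the inner-loop body (a simple sum), and the names `selectKernel`, `applyKernel`, `dSelected`, and `dCount` are hypothetical, and it needs a GPU that supports integer `atomicAdd` on global memory. The match count stays in device memory, so neither array is copied back between the two kernels.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Pass 1: one thread per element of a; matching indices go into a compact list.
__global__ void selectKernel(const float *a, int na, int *selected, int *count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < na && a[i] > 0.5f) {         // hypothetical condition
        int slot = atomicAdd(count, 1);  // reserve a unique slot in the list
        selected[slot] = i;
    }
}

// Pass 2: one thread per element of b; each loops over the selected elements of a.
__global__ void applyKernel(const float *a, const int *selected,
                            const int *count, float *b, int nb)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= nb) return;
    int m = *count;                      // read the count directly on the device
    float acc = 0.0f;
    for (int s = 0; s < m; ++s)
        acc += a[selected[s]];           // hypothetical inner-loop body
    b[j] += acc;                         // each thread owns b[j], so no race
}

int main()
{
    const int n = 10000;
    std::vector<float> a(n), b(n, 0.0f);
    for (int i = 0; i < n; ++i)
        a[i] = (i % 4) * 0.25f;          // illustrative input data

    float *dA = nullptr, *dB = nullptr;
    int *dSelected = nullptr, *dCount = nullptr;
    cudaMalloc((void **)&dA, n * sizeof(float));
    cudaMalloc((void **)&dB, n * sizeof(float));
    cudaMalloc((void **)&dSelected, n * sizeof(int));
    cudaMalloc((void **)&dCount, sizeof(int));
    cudaMemcpy(dA, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    selectKernel<<<blocks, threads>>>(dA, n, dSelected, dCount);
    applyKernel<<<blocks, threads>>>(dA, dSelected, dCount, dB, n);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copied back here only to inspect the result; a real pipeline could
    // leave both arrays on the device for the next kernel call.
    cudaMemcpy(b.data(), dB, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("b[0] = %f\n", b[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dSelected); cudaFree(dCount);
    return 0;
}
```

Compared with the kernel in the question, every thread in the second pass does the same amount of work, whereas in the original only the threads whose element meets the condition enter the loop. Note that the order of entries in `dSelected` is not deterministic, which matters if the real inner-loop body depends on order.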