Hi,
Imagine you have two large arrays of roughly 10,000 elements each.
I have to do something like this:
__global__ void kernel(array a, array b)
{
    // one thread per element of a
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < a.length)
    {
        if (a.values[tid] ... meets condition)
        {
            // each qualifying thread walks all of b serially
            for (int i = 0; i < b.length; i++)
            {
                ...
            }
        }
    }
}
I can't launch a two-dimensional grid with 100,000,000 threads (10,000 × 10,000, one thread per pair of elements); the system would freeze.
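To put a number on that, here is a rough host-side sketch of the sizing. The helper names `blocksPerDim` and `totalThreads` and the 16x16 block shape are assumptions made up for illustration, not part of my actual code:

```cpp
// Host-side sizing only; compiles as plain C++ or as CUDA host code.
// blocksPerDim, totalThreads and the 16x16 block shape are illustrative assumptions.

// Blocks needed to cover n elements with `block` threads per dimension (ceiling division).
constexpr long long blocksPerDim(long long n, int block)
{
    return (n + block - 1) / block;
}

// Threads launched by a 2D grid covering aLen x bLen pairs with block x block thread blocks.
constexpr long long totalThreads(long long aLen, long long bLen, int block)
{
    return blocksPerDim(aLen, block) * block * blocksPerDim(bLen, block) * block;
}

// 10,000 elements per array, 16 threads per block dimension.
static_assert(blocksPerDim(10000, 16) == 625, "10000 / 16 divides evenly");
static_assert(totalThreads(10000, 10000, 16) == 100000000LL, "one thread per (a, b) pair");
```

So under those assumptions a full 2D launch would be a 625 x 625 grid of 256-thread blocks.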
Ideally I would launch a kernel from inside the kernel instead of running the inner for loop over b, but that's not currently possible.
What would be your suggestion to solve this?