I’ve been experimenting with different ways to implement my discrete convolution in CUDA, and whenever I use nested for loops inside a kernel, the program crashes. The simple kernel below reproduces the problem: it crashes whenever the second (inner) for loop is present.

```
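// Global index of this thread across all blocks
int idx = threadIdx.x + (blockIdx.x * blockDim.x);
if (idx < length)
{
    // Note: idx is not used below, so every thread with idx < length
    // runs the full length x length loop and updates the same elements.
    for (int j = 0; j < length; j++)
    {
        for (int k = 0; k < length; k++)
        {
            outputSignalArray[k] += 2;
        }
    }
}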
```

Are nested for loops not allowed in a kernel? If I am trying to use them, does that mean I am not using CUDA correctly to begin with?
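
For comparison, here is a minimal sketch of the more usual pattern for a 1D discrete convolution, where each thread computes exactly one output sample and needs only a single loop over the filter taps. Nested loops are legal in device code, so this is only one possible restructuring, not a statement about what caused the crash. The names `convolve1D`, `runConvolution`, `signal`, and `filter` are hypothetical and do not come from the code above, and error checking of the CUDA calls is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per output sample of a full convolution.
__global__ void convolve1D(const float* signal, int signalLength,
                           const float* filter, int filterLength,
                           float* output)
{
    int idx = threadIdx.x + (blockIdx.x * blockDim.x);
    int outputLength = signalLength + filterLength - 1;
    if (idx < outputLength)
    {
        float sum = 0.0f;
        // Single loop over the filter taps; idx selects the output sample.
        for (int k = 0; k < filterLength; k++)
        {
            int j = idx - k;  // signal sample paired with filter tap k
            if (j >= 0 && j < signalLength)
            {
                sum += signal[j] * filter[k];
            }
        }
        // Only this thread writes output[idx], so threads do not collide.
        output[idx] = sum;
    }
}

// Hypothetical host helper: copies the inputs to the GPU, launches the
// kernel, and copies the result back. Assumes both inputs are non-empty.
std::vector<float> runConvolution(const std::vector<float>& signal,
                                  const std::vector<float>& filter)
{
    int signalLength = static_cast<int>(signal.size());
    int filterLength = static_cast<int>(filter.size());
    int outputLength = signalLength + filterLength - 1;

    float *dSignal, *dFilter, *dOutput;
    cudaMalloc(&dSignal, signalLength * sizeof(float));
    cudaMalloc(&dFilter, filterLength * sizeof(float));
    cudaMalloc(&dOutput, outputLength * sizeof(float));
    cudaMemcpy(dSignal, signal.data(), signalLength * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dFilter, filter.data(), filterLength * sizeof(float),
               cudaMemcpyHostToDevice);

    // Enough blocks to cover every output sample.
    int threadsPerBlock = 256;
    int blocks = (outputLength + threadsPerBlock - 1) / threadsPerBlock;
    convolve1D<<<blocks, threadsPerBlock>>>(dSignal, signalLength,
                                            dFilter, filterLength, dOutput);

    // cudaMemcpy waits for the kernel to finish before copying.
    std::vector<float> output(outputLength);
    cudaMemcpy(output.data(), dOutput, outputLength * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(dSignal);
    cudaFree(dFilter);
    cudaFree(dOutput);
    return output;
}
```

The difference from the kernel in the question is that `idx` decides which output element a thread owns, so no two threads write to the same location and the work per thread grows with the filter length rather than with `length * length`.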