Nested for loops crash my kernel

I’ve been trying different ways to compute my discrete convolution, and whenever I use nested for loops the kernel crashes. The simple kernel below crashes whenever the second for loop is present.

int idx = threadIdx.x + (blockIdx.x * blockDim.x);
if(idx < length)
{
	for(int j = 0; j < length; j++)
	{
		for(int k = 0; k < length; k++)
		{
			outputSignalArray[k] += 2;
		}
	}
}

Are nested for loops not allowed? If I am trying to use them, does that mean I am not using CUDA correctly to begin with?

Nested for loops are certainly allowed. Can you post the full .cu file which produces the behaviour you’re seeing? Saying that it ‘crashes’ doesn’t help much.
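
For reference, here is a minimal sketch of the kind of complete file that would help. It wraps the loop from the question in a kernel and checks every CUDA call, so a failure is reported as an error string instead of just a "crash". The kernel name nestedLoops, the CUDA_CHECK macro, the float element type, the array size and the launch configuration are all assumptions, since none of them appear in the original post.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): abort with a readable
// message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Kernel name, signature and element type are assumptions; the body is the
// loop from the question, unchanged.
__global__ void nestedLoops(float *outputSignalArray, int length)
{
    int idx = threadIdx.x + (blockIdx.x * blockDim.x);
    if (idx < length)
    {
        for (int j = 0; j < length; j++)
        {
            for (int k = 0; k < length; k++)
            {
                outputSignalArray[k] += 2;
            }
        }
    }
}

int main()
{
    const int length = 1024;          // assumed signal length
    const int threadsPerBlock = 256;  // assumed launch configuration
    const int blocks = (length + threadsPerBlock - 1) / threadsPerBlock;

    float *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, length * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_out, 0, length * sizeof(float)));

    nestedLoops<<<blocks, threadsPerBlock>>>(d_out, length);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    // Copy one element back just to confirm the kernel completed.
    float first = 0.0f;
    CUDA_CHECK(cudaMemcpy(&first, d_out, sizeof(float), cudaMemcpyDeviceToHost));
    printf("outputSignalArray[0] = %f\n", first);

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Two things to watch for when running something like this. As written, every thread updates every element of outputSignalArray without synchronization, so the threads race and the final values are not well defined. The total work also grows with the cube of length, so on a GPU that also drives a display a large length may run long enough to hit the driver's watchdog timeout, which could be what looks like a crash. The error string printed by the check should help tell these cases apart.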
