Error during kernel execution: LU substitution with one thread

Hello, I've got some trouble with the following kernel:

__global__ void LUSubstitutionKernel (vbreal** A, int n, const int* indices, const vbreal* b, vbreal* x, int iFirstVertex)
{
	int idx = blockDim.x * blockIdx.x + threadIdx.x;
	if (idx == 0)
	{
		for (int i = 0; i < n; i++) // Forward Substitution
		{
			vbreal sum = b[iFirstVertex * 3 + indices[i]];
			x[iFirstVertex * 3 + i] = b[iFirstVertex * 3 + indices[i]];
			for (int j = 0; j < i; j++)
			{
				sum -= A[i][j] * x[iFirstVertex * 3 + j];
			}
			x[iFirstVertex * 3 + i] = sum;
		}
		for (int i = n - 1; i >= 0; i--) // Backsubstitution
		{
			vbreal sum = x[iFirstVertex * 3 + i];
			for (int j = i + 1; j < n; j++)
			{
				sum -= A[i][j] * x[iFirstVertex * 3 + j];
			}
			x[iFirstVertex * 3 + i] = sum / A[i][i];
		}
	}
}
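For reference, the two loops implement the usual permuted triangular solves. Assuming A holds a packed LU factorization in which the unit diagonal of L is not stored (the forward loop never divides by A[i][i], which suggests this), and writing $p_i$ for indices[i] while dropping the 3·iFirstVertex offset, the kernel computes the intermediate vector $y$ (stored in x) and then the solution $x$:

$$
y_i = b_{p_i} - \sum_{j=0}^{i-1} A_{ij}\, y_j, \qquad i = 0, \dots, n-1
$$

$$
x_i = \frac{1}{A_{ii}} \Big( y_i - \sum_{j=i+1}^{n-1} A_{ij}\, x_j \Big), \qquad i = n-1, \dots, 0
$$

Each step depends on entries computed in earlier steps, so both sweeps are sequential, which is why a single thread does all the work here.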

The error seems to appear whenever I assign any value taken from “A[a][b]” to “x[iFirstVertex * 3 + i]” (this includes x[…] = sum;, where sum contains values read from A).

After the kernel runs, the error check reports “unknown error”:

LUSubstitutionKernel<<< 1, 1 >>> (dev_A0, iNumVertices[iNumLevels-1]*3, dev_indices, dev_b, dev_u, iFirstVertex[iNumLevels-1]);
err = cudaThreadSynchronize();
printf("%s \n", cudaGetErrorString(err));
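Checking the launch and the execution separately helps narrow down where a failure like this comes from. Below is a minimal sketch using a hypothetical CUDA_CHECK macro (the macro name is made up; cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are standard CUDA runtime calls). Note that cudaThreadSynchronize is deprecated; cudaDeviceSynchronize is its replacement. Running the program under compute-sanitizer (cuda-memcheck on older toolkits) should additionally report which memory access faulted.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print file/line and the CUDA error string,
// then abort, if the wrapped runtime call did not return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Intended usage after the launch from the post:
//   LUSubstitutionKernel<<<1, 1>>>(dev_A0, ...);
//   CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
//   CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
```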

“A”, “indices”, “b” and “x” are all in device memory, and I can assure you that all of them are declared properly. Could this be a bug in the driver, or is something wrong with my code?

What kind of bug might this be, and how can I avoid it?
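The post doesn't show how dev_A0 is built, so the following is only an assumption about a frequent cause of this symptom with a vbreal** parameter: the table of row pointers must itself live in device memory, and every entry in it must be a device pointer. Passing a host array, or a device array that holds host pointers, makes the kernel dereference an invalid address as soon as it reads A[i][j]. A minimal sketch of a correct "deep copy", assuming vbreal is double and using a hypothetical helper named uploadRowPointerMatrix:

```cuda
#include <vector>
#include <cuda_runtime.h>

typedef double vbreal; // assumption: the real definition of vbreal is not shown

// Hypothetical helper: uploads an n x n host matrix row by row and builds a
// device-resident table of device row pointers, so a kernel can use A[i][j].
// devRows receives the row buffers so the caller can cudaFree them later
// (entries that were never allocated stay nullptr; cudaFree(nullptr) is safe).
cudaError_t uploadRowPointerMatrix(const std::vector<std::vector<vbreal>>& hostRows,
                                   std::vector<vbreal*>& devRows, vbreal*** devA)
{
    const size_t n = hostRows.size();
    devRows.assign(n, nullptr);
    for (size_t i = 0; i < n; ++i) {
        // Each row gets its own device allocation, filled from the host row.
        cudaError_t err = cudaMalloc(&devRows[i], n * sizeof(vbreal));
        if (err != cudaSuccess) return err;
        err = cudaMemcpy(devRows[i], hostRows[i].data(), n * sizeof(vbreal),
                         cudaMemcpyHostToDevice);
        if (err != cudaSuccess) return err;
    }
    // The pointer table must also be copied to the device; its entries are
    // the device addresses collected in devRows above.
    cudaError_t err = cudaMalloc(devA, n * sizeof(vbreal*));
    if (err != cudaSuccess) return err;
    return cudaMemcpy(*devA, devRows.data(), n * sizeof(vbreal*),
                      cudaMemcpyHostToDevice);
}
```

The resulting *devA can be passed as the vbreal** A argument. This takes n + 1 allocations and copies, which is one reason a single flat array is usually simpler.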

Additional information that may be useful:

If I comment out the lines

x[iFirstVertex * 3 + i] = sum;

and

x[iFirstVertex * 3 + i] = sum / A[i][i];

the code runs without errors, but of course it no longer does its job.

In case anybody is interested:
I solved the problem by using a one-dimensional array for “A” instead of a two-dimensional one.

That is, “vbreal* A” instead of “vbreal** A”. That seems to work fine.
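A minimal sketch of that flat-array variant, assuming row-major storage with element (i, j) at A[i * n + j] and assuming vbreal is double. The loops are moved into a hypothetical __host__ __device__ helper, luSubstituteFlat, so the same code can be checked on the CPU; the kernel name LUSubstitutionKernelFlat is likewise made up. The flat layout needs only one cudaMemcpy of n * n values, with no table of device pointers that could be wrong. It would also fit the earlier observation that commenting out the stores hid the error: without any writes to x, the compiler is free to drop the loads from A as dead code, so a bad pointer is never dereferenced.

```cuda
#include <cuda_runtime.h>

typedef double vbreal; // assumption: the real definition of vbreal is not shown

// Solves L*U*x = P*b for one n x n system stored row-major in a flat array.
// L is unit lower triangular (diagonal not stored), U is upper triangular,
// indices[] is the row permutation, and offset shifts into b and x.
__host__ __device__ void luSubstituteFlat(const vbreal* A, int n, const int* indices,
                                          const vbreal* b, vbreal* x, int offset)
{
    for (int i = 0; i < n; ++i) {              // forward substitution: L*y = P*b
        vbreal sum = b[offset + indices[i]];
        for (int j = 0; j < i; ++j)
            sum -= A[i * n + j] * x[offset + j];
        x[offset + i] = sum;
    }
    for (int i = n - 1; i >= 0; --i) {         // back substitution: U*x = y
        vbreal sum = x[offset + i];
        for (int j = i + 1; j < n; ++j)
            sum -= A[i * n + j] * x[offset + j];
        x[offset + i] = sum / A[i * n + i];
    }
}

// Same single-thread structure as the original kernel, but A is a flat array.
__global__ void LUSubstitutionKernelFlat(const vbreal* A, int n, const int* indices,
                                         const vbreal* b, vbreal* x, int iFirstVertex)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx == 0)
        luSubstituteFlat(A, n, indices, b, x, iFirstVertex * 3);
}
```

On the host, the matrix would then be uploaded with a single cudaMemcpy(dev_A0, hostA, n * n * sizeof(vbreal), cudaMemcpyHostToDevice), and the launch stays <<<1, 1>>> as in the post.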