Hello, I’m having trouble with the following kernel:

```
__global__ void LUSubstitutionKernel (vbreal** A, int n, const int* indices, const vbreal* b, vbreal* x, int iFirstVertex)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx == 0)
    {
        for (int i = 0; i < n; i++) // Forward Substitution
        {
            vbreal sum = b[iFirstVertex * 3 + indices[i]];
            x[iFirstVertex * 3 + i] = b[iFirstVertex * 3 + indices[i]];
            for (int j = 0; j < i; j++)
            {
                sum -= A[i][j] * x[iFirstVertex * 3 + j];
            }
            x[iFirstVertex * 3 + i] = sum;
        }
        for (int i = n - 1; i >= 0; i--) // Backsubstitution
        {
            vbreal sum = x[iFirstVertex * 3 + i];
            for (int j = i + 1; j < n; j++)
            {
                sum -= A[i][j] * x[iFirstVertex * 3 + j];
            }
            x[iFirstVertex * 3 + i] = sum / A[i][i];
        }
    }
}
```

The error seems to occur whenever I assign a value that depends on any “A[a][b]” to “x[iFirstVertex * 3 + i]” (this includes x[…] = sum; where sum contains values read from A).

After the kernel has run, the reported error string is “unknown error”:

```
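LUSubstitutionKernel<<< 1, 1 >>> (dev_A0, iNumVertices[iNumLevels-1]*3, dev_indices, dev_b, dev_u, iFirstVertex[iNumLevels-1]);
// Wait for the kernel to finish; cudaDeviceSynchronize() replaces the deprecated cudaThreadSynchronize().
err = cudaDeviceSynchronize();
printf("%s \n", cudaGetErrorString(err));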
```
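
To narrow down where the error comes from, the launch and the execution can be checked separately. The following is only a minimal sketch of that pattern, not the code from my project: `checkCuda` and `DummyKernel` are hypothetical names, and `DummyKernel` merely stands in for `LUSubstitutionKernel`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (illustration only): print a labelled CUDA error.
static bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess)
    {
        printf("%s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder kernel standing in for LUSubstitutionKernel.
__global__ void DummyKernel(float* out)
{
    out[0] = 1.0f;
}

int main()
{
    float* dev_out = NULL;
    if (!checkCuda(cudaMalloc((void**)&dev_out, sizeof(float)), "cudaMalloc"))
        return 1;

    DummyKernel<<< 1, 1 >>>(dev_out);

    // Problems with the launch itself (e.g. an invalid configuration) are reported here.
    bool ok = checkCuda(cudaGetLastError(), "launch");

    // Problems raised while the kernel runs (e.g. an invalid memory access) are reported here.
    ok = checkCuda(cudaDeviceSynchronize(), "execution") && ok;

    float host_out = 0.0f;
    ok = checkCuda(cudaMemcpy(&host_out, dev_out, sizeof(float), cudaMemcpyDeviceToHost), "cudaMemcpy") && ok;

    cudaFree(dev_out);
    return ok ? 0 : 1;
}
```

If only the "execution" check fails, the launch configuration is fine and the problem is more likely inside the kernel, for example a read or write through an invalid address.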

“A”, “indices”, “b” and “x” are all in device memory, and I am confident that all of them are declared properly. Could this be a bug in the driver, or is something wrong with my code?
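
For a `vbreal**` parameter, "in device memory" has to hold on two levels: each row must be a device allocation, and the table of row pointers must itself live in device memory. The sketch below shows that pattern as a point of comparison. It rests on assumptions: `vbreal` is taken to be `float`, and `ReadEntryKernel`, `host_rows`, `host_row_ptrs` and `dev_A` are hypothetical names, not the ones from my project.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef float vbreal; // assumption: vbreal is a floating-point typedef

// Reads one entry through the pointer table, the same way A[i][j] is read in the kernel.
__global__ void ReadEntryKernel(vbreal** A, int i, int j, vbreal* out)
{
    out[0] = A[i][j];
}

int main()
{
    const int n = 3;
    vbreal host_rows[n][n] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };

    // Host-side table holding the DEVICE addresses of the rows.
    vbreal* host_row_ptrs[n];
    for (int i = 0; i < n; i++)
    {
        cudaMalloc((void**)&host_row_ptrs[i], n * sizeof(vbreal));
        cudaMemcpy(host_row_ptrs[i], host_rows[i], n * sizeof(vbreal), cudaMemcpyHostToDevice);
    }

    // The table itself must be copied to device memory before a kernel may index it.
    vbreal** dev_A = NULL;
    cudaMalloc((void**)&dev_A, n * sizeof(vbreal*));
    cudaMemcpy(dev_A, host_row_ptrs, n * sizeof(vbreal*), cudaMemcpyHostToDevice);

    vbreal* dev_out = NULL;
    cudaMalloc((void**)&dev_out, sizeof(vbreal));

    ReadEntryKernel<<< 1, 1 >>>(dev_A, 1, 2, dev_out);
    cudaError_t err = cudaDeviceSynchronize();

    vbreal result = 0;
    cudaMemcpy(&result, dev_out, sizeof(vbreal), cudaMemcpyDeviceToHost);
    printf("%s, A[1][2] = %f\n", cudaGetErrorString(err), result);

    for (int i = 0; i < n; i++)
        cudaFree(host_row_ptrs[i]);
    cudaFree(dev_A);
    cudaFree(dev_out);
    return 0;
}
```

Passing a host-side pointer table (here `host_row_ptrs`) directly to the kernel, or filling the device table with host addresses, would make `A[i][j]` dereference an address the GPU cannot access. Storing the matrix as one flat array indexed with `i * n + j` avoids the second level of pointers altogether.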

What kind of bug might it be and how can I avoid it?

Additional information that may be useful:

If I comment out the lines

“x[iFirstVertex * 3 + i] = sum;”

and

“x[iFirstVertex * 3 + i] = sum / A[i][i];”

the kernel runs without errors, but it then no longer does its job.