Hi,

I’m using a GTX 280 (compute capability 1.3) on 64-bit Windows Vista.

I have a strange error that I cannot understand. I’d appreciate any comments.

```
// in host function
// ... allocated about 150 MB global memory
float *check_array; // for debug
int size_check = 500 * sizeof(float);
cudaMalloc((void**)&check_array, size_check);
dim3 dimBlock(256,1);
myKernel<<<1, dimBlock>>>(check_array, param); // param is a parameter object.
// check the result passed back through "check_array"
```

So far, this is fine.

However, my problem occurs at:

```
__global__ void myKernel(float *check_array, Parameter *param) { // param holds pointers into global memory
    int na = 100;
    int nd = 80;
    int i, j;
    int pi = 1;
    int n;
    float dis1;
    float dis2;
    float minDis;
    float mcovar[3][3];
    float v[3][3];
    float x[3], y[3], z[3];
    // The above are the only local declarations I have.
    // ... some computations using "param" and above variables
    /********** Error location **********/
    if (threadIdx.x == 0) {
        check_array[0] = mcovar[0][0];
        check_array[1] = mcovar[0][1];
        check_array[2] = mcovar[0][2];
        check_array[3] = mcovar[1][0];
        check_array[4] = mcovar[1][1];
        check_array[5] = mcovar[1][2];
        check_array[6] = mcovar[2][0];
        check_array[7] = mcovar[2][1];
        check_array[8] = mcovar[2][2];
    }
    __syncthreads();
}
```

There is no “return” statement in the computation part, so I believe every thread reaches “__syncthreads()”.

If I COMMENT OUT the assignments to “check_array”, it works fine.

It also works fine if I assign other values (e.g., 0 or 1) to “check_array”.

But with the code above (assigning from mcovar), the kernel does NOT seem to run. By this I mean that there is actually another debug assignment to check_array elsewhere in the kernel, and in this case the array does not contain that debug value.
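One way to tell whether a kernel fails at launch or during execution is to check the CUDA runtime's error codes on the host. The following is a minimal sketch, not the actual code from this post: `dummyKernel` and `checkCuda` are hypothetical names, and it assumes a toolkit that provides `cudaDeviceSynchronize()` (older toolkits use `cudaThreadSynchronize()` instead).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the real kernel would go here.
__global__ void dummyKernel(float *out) {
    if (threadIdx.x == 0) {
        out[0] = 1.0f;
    }
}

// Returns true if err is cudaSuccess; otherwise prints which step failed and why.
static bool checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    float *d_out = NULL;
    if (!checkCuda(cudaMalloc((void**)&d_out, 500 * sizeof(float)), "cudaMalloc")) {
        return 1;
    }

    dummyKernel<<<1, 256>>>(d_out);

    // Launch-time errors (e.g., an invalid configuration or too many resources requested)
    // are reported by cudaGetLastError() right after the launch.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");

    // Errors during execution (e.g., an invalid memory access or a timeout)
    // are reported once the host synchronizes with the device.
    ok = checkCuda(cudaDeviceSynchronize(), "kernel execution") && ok;

    float h_out = 0.0f;
    ok = checkCuda(cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost),
                   "cudaMemcpy") && ok;
    if (ok) {
        printf("out[0] = %f\n", h_out);
    }

    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

If a check like this reports a failure only when the `mcovar` values are written out, one possible explanation (an assumption, not something confirmed here) is that the compiler may remove computations whose results are never stored to global memory, so the variants that "work" might simply skip the code that actually fails.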

Does anybody know what the problem is?

Please help me… I have spent about 2 days on this problem.

Regards,