Thread ID as loop condition: using the thread ID as the upper bound of a for loop causes a crash

Hi Guys,

I have been working on a matrix-vector multiplication routine. Due to the properties of the matrix I can split it into its diagonal and one off-diagonal part (all entries in a given column are the same apart from the diagonal). These are stored as 1-D arrays in constant memory. I then multiply these by a vector stored in shared memory to get the full matrix-vector product. I initially wrote the following code to do this.

[codebox]static __device__ void MatrixVecMult(float *z, float *x) {
    int tid = threadIdx.x;
    int i;

    x[tid] = L_d[tid] * z[tid];
    for (i = 0; i < tid; i++)
        x[tid] += L[i] * z[i];
}[/codebox]

where z and x are declared in shared memory and are of size blockDim.x, and L and L_d are the off-diagonal values and the diagonal respectively, sized the same way. This code does not work for block sizes over 32 threads, which suggests to me something to do with warps etc., but I am not sure. The following code performs exactly the same operations, yet it works correctly and is stable.

[codebox]static __device__ void MatrixVecMult(float *z, float *x) {
    int tid = threadIdx.x;
    int i;

    x[tid] = L_d[tid] * z[tid];
    for (i = 0; i < blockDim.x; i++) {
        if (i == tid)
            break;
        x[tid] += L[i] * z[i];
    }
}[/codebox]

As you can see, the algorithms are the same. Is there a technical issue I am missing here, something to do with warps and memory access? Is it a bad idea to use the thread ID as a loop condition? Any help would be much appreciated, as I am quite at a loss as to why one version works and the other does not.

Further info: running on Mac OS X, GeForce GT 120, compute capability 1.1, with fully updated CUDA drivers, toolkit, etc.

Thanks in advance

Wrong forum perhaps. Reposted in the CUDA programming and Development. Apologies
