Problem in Kernel

Hi,
I am new to CUDA. I am passing an integer matrix to a kernel, and I need all the threads to update the same matrix depending on their thread ID. After control returns to the host code, I copy the data from the device matrix to the host matrix using cudaMemcpy with cudaMemcpyDeviceToHost. But when I print the contents of the matrix, the output is all 0s. I don't know what the problem is, and I am stuck at this point.

Here is the kernel code:
int LEN = 6;
int MAXLEN = 100;

__global__ void needlemanKernel(char* sequenceMatrix, int* device_protein_length, int* traceMatrix, int* scoreD)
{
    const int bDim = blockDim.x;
    const int bid = blockIdx.x;
    const int tx = threadIdx.x;

    int current_Index = bid * bDim + tx;
    int i;
    if (tx < 32) {
        const int protein_len = device_protein_length[current_Index];
        for (i = 0; i <= MAXLEN; i++)
            scoreD[tx*LEN+i] = -8*i;
        for (i = 0; i <= LEN; i++)
            scoreD[tx*LEN+MAXLEN+i] = -8*i;
    }
}

After control returns from the kernel, I copy the contents of scoreD to the host score matrix using:
cudaMemcpy(score, scoreD, sizeof(int) * (MAX * (LEN+1)) * (MAXLEN+1), cudaMemcpyDeviceToHost);

But when I print the values copied back from scoreD, the whole matrix contains only 0s. I would be very thankful if anyone could help me.
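For reference, a minimal sketch of the flow described above (launch a kernel that fills per-thread score matrices, then copy the result back), with error checks after each step so that a failed launch or copy is reported instead of silently leaving zeros on the host. Everything here is an assumption rather than part of the original code: LEN and MAXLEN are written as compile-time constants so that device code can see them, NUM_SEQ is a hypothetical stand-in for MAX, each matrix is assumed to be row-major with a row stride of MAXLEN + 1, and the names initScoreKernel and initScoresOnDevice are illustrative.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed sizes, written as macros so both host and device code can use them.
#define LEN 6
#define MAXLEN 100
#define NUM_SEQ 32  // hypothetical: number of score matrices, one per thread

// Each thread owns one (LEN + 1) x (MAXLEN + 1) row-major matrix inside scoreD
// and initialises its first row and first column with gap penalties of -8.
__global__ void initScoreKernel(int *scoreD)
{
    const int rowStride = MAXLEN + 1;
    const int matSize = (LEN + 1) * rowStride;
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= NUM_SEQ) return;

    int *m = scoreD + idx * matSize;   // start of this thread's matrix
    for (int j = 0; j <= MAXLEN; j++)
        m[j] = -8 * j;                 // first row
    for (int i = 0; i <= LEN; i++)
        m[i * rowStride] = -8 * i;     // first column
}

// Allocates the device buffer, runs the kernel, and copies the result into
// `score`, which must hold NUM_SEQ * (LEN + 1) * (MAXLEN + 1) ints.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t initScoresOnDevice(int *score)
{
    assert(score != NULL);
    const size_t bytes = sizeof(int) * NUM_SEQ * (LEN + 1) * (MAXLEN + 1);
    int *scoreD = NULL;

    cudaError_t err = cudaMalloc((void **)&scoreD, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Zero the buffer so cells the kernel does not touch have a defined value.
    err = cudaMemset(scoreD, 0, bytes);

    if (err == cudaSuccess) {
        initScoreKernel<<<1, NUM_SEQ>>>(scoreD);
        err = cudaGetLastError();          // reports launch failures
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // reports failures during execution
    if (err == cudaSuccess)
        err = cudaMemcpy(score, scoreD, bytes, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(scoreD);
    return err;
}
```

Calling initScoresOnDevice(score) from main with a suitably sized host buffer and checking the returned cudaError_t should show whether the launch, the execution, or the copy failed. If an error is reported, the zeros on the host likely come from a kernel that never ran to completion rather than from the copy itself.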

Thanks,
Sandy