Problem with function (memory movements?)

Hi guys,

I am trying to implement a function that takes an int matrix as input, transposes it, and stores the result in another matrix, converting every element to double.

The matrices are in column-major format.

The kernel code is below:

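__global__ void transposeIntToDouble(const int *A, double *B, int rows, int cols)
{
	// One thread per output element.
	int index = blockDim.x * blockIdx.x + threadIdx.x;
	if (index < rows * cols) {
		// A is rows x cols, B is cols x rows, both column-major.
		B[index] = (double)A[(index * rows) % (rows * cols) + (index / cols)];
	}
}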
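For checking the index arithmetic on the host, a minimal CPU sketch of the same mapping could look like this. The names `sourceIndex` and `transposeIntToDoubleHost` are made up for illustration. The formula is copied from the kernel, and both matrices are assumed to be column-major.

```cpp
#include <vector>

// Hypothetical helper: the element of A that the kernel reads for a given
// output index, using the same formula as the kernel.
int sourceIndex(int index, int rows, int cols) {
    return (index * rows) % (rows * cols) + (index / cols);
}

// Hypothetical CPU reference: A is rows x cols (column-major); the result is
// the cols x rows transpose (column-major), converted to double.
std::vector<double> transposeIntToDoubleHost(const std::vector<int>& A,
                                             int rows, int cols) {
    std::vector<double> B(A.size());
    for (int index = 0; index < rows * cols; ++index) {
        B[index] = static_cast<double>(A[sourceIndex(index, rows, cols)]);
    }
    return B;
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, this gives {1, 3, 5, 2, 4, 6}, which is the transpose in column-major order, so the index formula appears to hold at least for small cases.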

The function call in the main:

	double *myMat;
	int *myMat1;

	cutilSafeCall( cudaMalloc( (void**) &myMat, lineNrm*rowSizerm*sizeof(double)));
	cutilSafeCall( cudaMemset(myMat, 0, lineNrm*rowSizerm*sizeof(double)));

	cutilSafeCall( cudaMalloc( (void**) &myMat1, lineNrm*rowSizerm*sizeof(int)));
	// cudaMemset writes the value 1 into every byte of the buffer
	cutilSafeCall( cudaMemset(myMat1, 1, lineNrm*rowSizerm*sizeof(int)));

	transposeIntToDouble<<< dimGrid, dimBlock >>>(myMat1, myMat, lineNrm, rowSizerm);

	// copy the result back to the host and print it
	double *RA_h;
	RA_h=(double *)malloc(lineNrm*rowSizerm*sizeof(double));
	cutilSafeCall( cudaMemcpy( RA_h, myMat, lineNrm*rowSizerm*sizeof(double),
								cudaMemcpyDeviceToHost) );

	for(int i=0; i< lineNrm*rowSizerm; i++){
		cout<< RA_h[i] << endl;
	}

I do not know why, but instead of printing an array of 1.0 values, this prints an array of 5.26354e-315. (Forgive the silly initialization; it is only there to test without spending time putting correct data into the matrix.) This suggests that something is probably going wrong with the data types?
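One detail of the initialization may be worth checking: `cudaMemset` sets every byte of the buffer to the given value, as `memset` does on the host, rather than setting every `int` element. A minimal host-side sketch, using `std::memset` as a stand-in and a hypothetical helper name `fillBytes`:

```cpp
#include <cstring>
#include <vector>

// Hypothetical helper: fill an int buffer byte by byte, the way a memset-style
// call does. std::memset stands in here for cudaMemset.
std::vector<int> fillBytes(std::size_t count, int byteValue) {
    std::vector<int> data(count);
    std::memset(data.data(), byteValue, count * sizeof(int));
    return data;
}
```

Assuming a 4-byte `int`, `fillBytes(n, 1)` produces elements equal to 0x01010101 (16843009), so even a working kernel would be expected to output 16843009.0 rather than 1.0 for this test data.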

The same problem arises even if I put, for example, B[index]=2.0; in the kernel function.
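Since a constant store shows the same symptom, it may help to look at the raw bits of the printed value. The sketch below (the helper name `floatBitsAsDouble` is hypothetical, and IEEE-754 with a 4-byte float is assumed) places the 32-bit pattern of a float into the low half of a 64-bit word and reinterprets it as a double. This is roughly what shows up if a 4-byte float ends up where an 8-byte double was expected.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the 32 bits of a float as the low word of
// a double whose high word is zero.
double floatBitsAsDouble(float f) {
    std::uint32_t bits32;
    std::memcpy(&bits32, &f, sizeof bits32);
    std::uint64_t bits64 = bits32;  // high 32 bits stay zero
    double d;
    std::memcpy(&d, &bits64, sizeof d);
    return d;
}
```

`floatBitsAsDouble(1.0f)` is about 5.26354e-315, the value reported above. That would be consistent with single-precision values being written into the double buffer. One possible cause, which is an assumption and not confirmed by anything in the post, is building for a GPU target without double-precision support (compute capability below 1.3), where nvcc demotes double to float.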

Any help will be appreciated. Thanks a lot.