Array not getting changed by kernel.

Hello guys,

In what seems to be a simple application of CUDA, I calculate the elements of an array on a GPU. The code runs, but the answer it returns is all wrong. Specifically, it is a full-wave reflectometry code (the physics is irrelevant to the parallel-processing question), and in one function, which is supposed to compute the electric field with a 1-D FDTD method, all of the values of the array come back as zero.

Below is a code that shows the basic structure of mine: it allocates memory, copies the variables over to the device, runs the kernel, copies the result from the device back to the host, and then prints the returned magnetic field array on the host. This isn't the actual calculation. For demonstration purposes, the device code just sets every element of the array to 1, but when I copy the array back to the host and print it, I get zeros.

[codebox]#define nGrid 21203
#define for1ThCt 21202
#define SIZE_DOUBLE 8
#define SIZE_IDL 4
#define nBytewx 1
#define nByteBxyz nGrid
#define nByteExyz nGrid

extern "C" {

__global__ void for1(double *ExyzG, double *BxyzG, double *wxG); // kernel prototype

void fdtd1d_yee_mult_lnx(int argc, void *argv)
{
	/* ... */

	cudaSetDevice(0);

	double *ExyzG, *BxyzG, *wxG;

	/* ... */

	// allocate device copies of the field arrays and wx
	cudaMalloc((void**)&ExyzG, sizeof(double)*nByteExyz);
	cudaMalloc((void**)&BxyzG, sizeof(double)*nByteBxyz);
	cudaMalloc((void**)&wxG, sizeof(double)*nBytewx);

	// copy the host data to the device
	cudaMemcpy(ExyzG, Exyz, sizeof(double)*nByteExyz, cudaMemcpyHostToDevice);
	cudaMemcpy(BxyzG, Bxyz, sizeof(double)*nByteBxyz, cudaMemcpyHostToDevice);
	cudaMemcpy(wxG, &wx, sizeof(double)*nBytewx, cudaMemcpyHostToDevice);

	// for1ThCt threads in total, rounding the number of blocks up
	dim3 dimBlock(BLOCK_SIZE);
	dim3 dimGrid((for1ThCt/dimBlock.x) + (!(for1ThCt%dimBlock.x)?0:1));

	for1<<<dimGrid, dimBlock>>>(ExyzG, BxyzG, wxG);

	// copy the result back to the host
	cudaMemcpy(Bxyz, BxyzG, sizeof(double)*nByteBxyz, cudaMemcpyDeviceToHost);

	for(i=0;i<N;i++)
	{
		printf("Bxyz[%d] = % f  ", i, Bxyz[i]);
	}

	/* ... */

} //end fdtd1d_yee_mult_lnx

__global__ void for1(double *ExyzG, double *BxyzG, double *wxG) {
	int i = blockIdx.x*blockDim.x + threadIdx.x;

	// demo only: set all three components of point i to 1
	BxyzG[3*i]   = 1;
	BxyzG[1+3*i] = 1;
	BxyzG[2+3*i] = 1;
} //end for1

} //end extern "C"
[/codebox]

When I run the code, the print statement shows all zeros for Bxyz. Any help would be appreciated; thank you ahead of time.
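For reference, here is a small host-side C check (plain C, so it runs without a GPU) of the launch arithmetic in the code above. It assumes BLOCK_SIZE is 256, since its value isn't shown in the post, and that every launched thread executes all three writes in for1, because the kernel has no bounds guard on i. The helper names are hypothetical, made up just for this check.

```c
#include <assert.h>
#include <stddef.h>

/* Constants copied from the post. BLOCK_SIZE is not shown there,
   so 256 is an assumed value for this check only. */
#define nGrid      21203
#define for1ThCt   21202
#define BLOCK_SIZE 256

/* Rounding-up division, equivalent to the dimGrid expression in the post. */
static size_t ceil_div(size_t n, size_t d)
{
    assert(d > 0); /* block size must be non-zero */
    return n / d + (n % d ? 1 : 0);
}

/* Threads actually launched: always a whole number of blocks. */
static size_t threads_launched(size_t nThreads, size_t blockSize)
{
    return ceil_div(nThreads, blockSize) * blockSize;
}

/* Highest BxyzG index written by for1 if every launched thread executes
   BxyzG[3*i], BxyzG[1+3*i] and BxyzG[2+3*i] (there is no bounds guard). */
static size_t max_index_written(size_t launched)
{
    assert(launched > 0);
    return 3 * (launched - 1) + 2;
}

/* Doubles BxyzG would need to hold three components for each of nThreads points. */
static size_t elements_needed(size_t nThreads)
{
    return 3 * nThreads;
}
```

Under that assumption the grid has 83 blocks (21248 threads), so the highest index written is 63743, while BxyzG only holds nGrid = 21203 doubles; even with exactly for1ThCt threads, three components per thread would need 63606 elements. If that is what's happening, the kernel launch may fail on the out-of-bounds writes, and since none of the CUDA return values are checked, the following cudaMemcpy would likely just return an error without copying, leaving Bxyz unchanged. Checking cudaGetLastError() after the launch and the return value of cudaMemcpy should confirm or rule this out.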