64-bit integer inconsistency with CUDA 5.0

Hey all

I have a kernel that accepts integer and unsigned integer arrays. The kernel and its wrapper function are both written in a ‘.cu’ file.
The wrapper function's prototype is declared in a separate header so that it can be called from a ‘.cpp’ file.
I initialized a random integer array in a ‘.cpp’ file, uploaded it to device memory, and launched the kernel.

// kernel.cu
__global__ void myKernel(int * arr){
...
}

cudaError_t wrapperFunc(int * arr, dim3 gridSize, dim3 blockSize, cudaStream_t stream){
   myKernel<<<gridSize,blockSize,0,stream>>>(arr);
   return cudaGetLastError();
}

// kernel.h
cudaError_t wrapperFunc(int * arr, dim3 gridSize, dim3 blockSize, cudaStream_t stream = 0);

// main.cpp
   int *arr = new int[SOME_SIZE],*garr;
   randFill(arr,SOME_SIZE);
   CUDA_SAFE(cudaMalloc(&garr,sizeof(int)*SOME_SIZE));
   CUDA_SAFE(cudaMemcpy(garr,arr,sizeof(int)*SOME_SIZE,cudaMemcpyHostToDevice));
   wrapperFunc(garr,gridSize,blockSize); // pass the device pointer, not the host one

Don’t mind the pitch allocation and size issues for now; they’re not the point. When I compile in the 32-bit configuration (Win32) in VS2010, everything works. However, when I switch to the x64 configuration, the integer array allocated in the ‘.cpp’ file no longer matches what the kernel sees: it looks as though 64-bit integers were allocated. For instance, if I verify (in the .cpp scope) that arr[0] is 1 and arr[1] is 2, debugging with Nsight shows that the kernel reads arr[0] = 1, arr[1] = 0 and arr[2] = 2. The kernel's results reflect the same misreading, of course. If necessary I’ll upload the entire code, but I think the question is fairly general.
Any tips regarding this matter?
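The pattern described above (1, 0, 2 where 1, 2 was expected) is exactly what you get when a buffer holds 64-bit values but is read as 32-bit ints on a little-endian machine such as x86/x64: each 8-byte value splits into its low half followed by a zero high half. A minimal host-only sketch of that reinterpretation, using a hypothetical helper `viewAsInt32` that is not part of the original code:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: take a buffer of 64-bit values and view its raw bytes
// as 32-bit ints, the way a kernel taking `int *` would read them.
std::vector<std::int32_t> viewAsInt32(const std::vector<std::int64_t>& src) {
    // Each 8-byte element occupies two 4-byte slots in the 32-bit view.
    std::vector<std::int32_t> dst(src.size() * 2);
    std::memcpy(dst.data(), src.data(), src.size() * sizeof(std::int64_t));
    return dst;
}
```

On a little-endian machine, `viewAsInt32({1, 2})` yields `{1, 0, 2, 0}`, matching what Nsight showed in the kernel.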

Thanks,

Gadi

Don't use the int type.

And what should I use then?

Use __int32 instead.
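Following that suggestion, one way to pin the element size on both the .cpp and .cu sides is a shared fixed-width alias checked at compile time. This is a sketch, assuming the alias lives in the shared header (kernel.h) and that the kernel and wrapper are declared with it instead of `int`. The names `elem_t`, `makeHostArray`, and `byteCount` are hypothetical. It uses the standard `std::int32_t` from `<cstdint>` rather than MSVC's `__int32`: both are 32 bits on Windows, but only the former is portable.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical shared element type for host and device code.
// std::int32_t is the portable equivalent of MSVC's __int32.
using elem_t = std::int32_t;
static_assert(sizeof(elem_t) == 4, "elem_t must be exactly 32 bits");

// Hypothetical stand-in for randFill: fills 1, 2, 3, ... so the kernel-side
// view can be checked element by element in the debugger.
std::vector<elem_t> makeHostArray(std::size_t n) {
    std::vector<elem_t> arr(n);
    for (std::size_t i = 0; i < n; ++i) arr[i] = static_cast<elem_t>(i + 1);
    return arr;
}

// Byte count for cudaMalloc/cudaMemcpy, derived from the shared element type
// rather than from a separately written sizeof(int).
std::size_t byteCount(std::size_t n) { return n * sizeof(elem_t); }
```

On Windows x64 `int` is already 32 bits, so the `static_assert` mainly documents and enforces the 4-byte assumption in every file that includes the header.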