problem with cudaMemcpy

Hi all,

I am suddenly having a strange problem in my CUDA program: the data does not seem to be transferred to the GPU by cudaMemcpy, yet it returns the value 0.
Here is the code:
C:
[indent]size = 46214;
int *T = (int *) malloc(3 * size * 2 * 2 * sizeof(int));
int block_size, grid_size;
block_size = 512;
grid_size = (int)((size - 1) / block_size) + 1;
cudaError_t memA, memB;

memA = cudaMalloc((void **)&T1_d, (3 * size * 2 * BUCKETSIZE) * sizeof(int));
memB = cudaMemcpy(T1_d, T, (3 * size * 2 * BUCKETSIZE) * sizeof(int), cudaMemcpyHostToDevice);
int i = 0; // for loop here
tryf<<<grid_size, block_size>>>(T1_d, R_d, i, size);[/indent]

Kernel code:
[indent]__global__ void tryf(int *Z, int *fin, int h, int M){
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < M) {
        int c = 0;
        int b1;
        for (b1 = 0; b1 < 2; b1++) {
            int pos1 = ((((h * 46214) + i) * 2) + b1) * 2 + 1;
            fin[i] = Z[pos1];
        }
    }
}
[/indent]

The values of memA and memB are 0.
Also, if it helps, when I ran the deviceQuery example it gave the following output:

[indent]CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA

Device 0: “GeForce 8800 GT”
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536674304 bytes
Number of multiprocessors: 0
Number of cores: 0
Total amount of constant memory: 1 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.07 GHz
Concurrent copy and execution: Yes

Test PASSED

Press ENTER to exit…
[/indent]

Thanks in advance. :rolleyes:

I have a small question: what is BUCKETSIZE?

memA, memB = 0 means that your cudaMalloc and cudaMemcpy completed successfully (0 is cudaSuccess).
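Note that the kernel launch itself does not return an error code, so even if the copy succeeded you may be missing a later failure. A minimal sketch of how you could check each step (reusing your variable names T1_d, R_d, BUCKETSIZE, grid_size, block_size, and assuming stdio.h and the CUDA runtime header are included):

[indent]// Sketch only: check every runtime call against cudaSuccess and print the error string.
cudaError_t err;

err = cudaMalloc((void **)&T1_d, (3 * size * 2 * BUCKETSIZE) * sizeof(int));
if (err != cudaSuccess) printf("cudaMalloc: %s\n", cudaGetErrorString(err));

err = cudaMemcpy(T1_d, T, (3 * size * 2 * BUCKETSIZE) * sizeof(int), cudaMemcpyHostToDevice);
if (err != cudaSuccess) printf("cudaMemcpy: %s\n", cudaGetErrorString(err));

tryf<<<grid_size, block_size>>>(T1_d, R_d, i, size);
err = cudaGetLastError();          // catches launch-configuration errors
if (err != cudaSuccess) printf("launch: %s\n", cudaGetErrorString(err));

err = cudaThreadSynchronize();     // kernel execution errors surface here
if (err != cudaSuccess) printf("kernel: %s\n", cudaGetErrorString(err));[/indent]

If all of these print nothing, the copy really did happen and the problem is more likely in how the data is indexed afterwards.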

Did you pay attention to your variable "i"?

and this code