Array copy CUDA program: copying an array from host to GPU

This is a very simple program. I am a newbie in CUDA programming.

The program first copies the data from host to device and then from device to host. When I print the data copied from host to device, it is correct, but when I print the data copied from device to host, it is not correct.

I am giving the code here as well. Please tell me what I am doing wrong.

Total Files -> Makefile, ,

=================================== =>

[…]$ cat







void host_copy( int* h_inp, int* h_out )
{
    int* dev;
    int* inp;
    int* out;
    int i;
    int j=5;

    int size1=size*sizeof(int);
    printf("\n value of size1=%d \n",size1);

    // Allocating GPU Memory
    cudaMalloc((void**) &dev, size1);
    cudaMalloc((void**) &inp, size1);
    cudaMalloc((void**) &out, size1);

    // Copy the host input into the device buffer dev
    cudaMemcpy(dev, h_inp, size1, cudaMemcpyHostToDevice);

    // Launch the kernel on the device buffers inp and out
    simple_Copy<<<size1/8, 8, size1>>>(inp, out);

    // Copy the device buffer dev back into the host output
    cudaMemcpy(h_out, dev, size1, cudaMemcpyDeviceToHost);
}

int main(int argc, char** argv)
{
    CUT_DEVICE_INIT(argc, argv);

    int* h_inp=new int;
    int* h_out=new int;

    host_copy( h_inp, h_out );

    CUT_EXIT(argc, argv);
    return EXIT_SUCCESS;
}


Kernel code ( )=>

[…]$ cat


const int size=16;

__global__ void simple_Copy(int* inp, int* out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x ;
    out[index] = inp[index] ;
    return ;
}



[…]$ cat Makefile

EXECUTABLE := simpleCopy


#Rules and targets

include /home/chitranjan/NVIDIA_CUDA_SDK/common/


Please tell me what I am doing wrong and what I have to add.

First question: what exactly are you trying to do?

What I see:

  1. Generate data on the host and allocate buffers (seems correct).

  2. Copy data from host to dev (also OK).

  3. printf the dev buffer => this doesn’t work, because the dev pointer refers to device memory. If the printf shows anything, it should be garbage: you can’t access device memory directly from the host, so the only option is to copy the data back to the host first.

  4. Do some copying in your kernel? You copy an uninitialized buffer (inp) to another memory location (out), which is never used again. Your kernel should run just fine, but it effectively does nothing.

  5. Copy back exactly the data you copied to the device (dev) and print it, which should give you exactly the same data as in step 1.

So: step 3 doesn’t work and step 4 doesn’t make sense. Maybe you can explain a little further what you are trying to do.
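To make the points in this reply concrete, here is a minimal sketch of how the program could look if the goal is to send the array through the kernel and read the result back. It is only one possible arrangement and rests on several assumptions that are not in the original post: the host arrays are filled with the values 0 to 15, the kernel takes an extra element count `n` with a bounds guard, and a hypothetical `check` helper reports CUDA errors. The `CUT_DEVICE_INIT` and `CUT_EXIT` macros are left out so the sketch depends only on the CUDA runtime. Two details of the posted launch are also changed. The grid is computed from the element count rather than from the byte count: with 16 elements and a 4-byte int, `size1/8` starts 8 blocks of 8 threads, i.e. 64 threads for 16 elements. The third launch parameter, the dynamic shared memory size, is dropped because the kernel does not use shared memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

const int size = 16;  // element count, as in the original post

// Each thread copies one element; the guard keeps surplus threads in bounds.
// The parameter n is an addition for this sketch.
__global__ void simple_Copy(const int* inp, int* out, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < n) {
        out[index] = inp[index];
    }
}

// Hypothetical helper (not in the original post): abort on any CUDA error.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const size_t bytes = size * sizeof(int);

    // Host buffers: known input values, and a sentinel in the output.
    int h_inp[size];
    int h_out[size];
    for (int i = 0; i < size; ++i) {
        h_inp[i] = i;
        h_out[i] = -1;
    }

    // Device buffers: only the two that the kernel actually uses.
    int* inp = NULL;
    int* out = NULL;
    check(cudaMalloc((void**)&inp, bytes), "cudaMalloc(inp)");
    check(cudaMalloc((void**)&out, bytes), "cudaMalloc(out)");

    // Fill the buffer the kernel reads from.
    check(cudaMemcpy(inp, h_inp, bytes, cudaMemcpyHostToDevice), "copy to device");

    // Grid size is derived from the element count, rounded up.
    const int threads = 8;
    const int blocks = (size + threads - 1) / threads;
    simple_Copy<<<blocks, threads>>>(inp, out, size);
    check(cudaGetLastError(), "kernel launch");

    // Read back the buffer the kernel wrote; this copy waits for the kernel.
    check(cudaMemcpy(h_out, out, bytes, cudaMemcpyDeviceToHost), "copy to host");

    // Device data is printed only after it has been copied to the host.
    for (int i = 0; i < size; ++i) {
        printf("h_out[%d] = %d\n", i, h_out[i]);
    }

    cudaFree(inp);
    cudaFree(out);
    return EXIT_SUCCESS;
}
```

The key difference from the posted code is which buffers the two `cudaMemcpy` calls touch: the input goes into `inp`, which the kernel reads, and the result comes from `out`, which the kernel writes. If the copy back still returns the sentinel value, the error messages from `check` should indicate which call failed.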

Algorithm to copy elements of one array to another array

Let inputArray be an integer array of N elements and copyArray the array into which we want to copy all N elements of inputArray. The size of copyArray is >= the size of inputArray.

  • Using a for loop, we traverse inputArray from index 0 to N-1.
  • We copy the ith element (0 <= i <= N-1) of inputArray to the ith index of copyArray: copyArray[i] = inputArray[i];
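
The loop described above can be sketched as a small host-side function. The function name `copy_array` and the `std::size_t` count parameter are illustrative choices, not taken from the text. The code is plain C++ and should also compile unchanged as host code in a `.cu` file.

```cpp
#include <cstddef>

// Sequential copy: element i of inputArray goes to index i of copyArray.
// copyArray must have room for at least n elements.
void copy_array(const int* inputArray, int* copyArray, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        copyArray[i] = inputArray[i];
    }
}
```

Calling `copy_array(src, dst, 4)` on a four-element `src` and a larger `dst` overwrites only the first four entries of `dst`; the remaining entries keep their previous values. Each loop iteration here corresponds to the work a single thread does in a one-element-per-thread kernel.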