How to Copy an Array to GPU Memory

How can I copy an array that resides in host memory to constant or global memory on the device?

In other words, I have this code:

#define N 10

// constant array in device memory
__constant__ int B[N][N];

// kernel which uses B
__global__ void Test(…);

// the test program
int A[N][N];
// put some random values in the A matrix
// before I run the kernel, I have to copy those same values
// from A to B -- how can I do that?

The code given below should solve your problem. If anything is not clear, please reply.

The example given uses global memory.

Be cautious when using __constant__ memory. It has the following characteristics (see the NVIDIA CUDA Programming Guide):

  • It is limited to 64 KB.
  • __constant__ variables cannot be assigned from device code, only from the host through host runtime functions (e.g. cudaMemcpyToSymbol).
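As a sketch of that second point, here is one way the host could initialize the B[N][N] array from the question with cudaMemcpyToSymbol (the fill values are just illustrative):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

#define N 10

// constant array in device memory; writable only from the host
__constant__ int B[N][N];

int main(void)
{
    int A[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = i * N + j;   // some sample values

    // copy the whole host array into the __constant__ symbol
    cudaMemcpyToSymbol(B, A, sizeof(A));

    printf("copied %zu bytes to constant memory\n", sizeof(A));
    return 0;
}
```

Note that sizeof(A) here (400 bytes) is well under the 64 KB constant-memory limit.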


#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <cutil.h>

#define MAX_COL 2
#define MAX_ROW 8

#define BLOCK_DIM 2

__global__ void Sample_Kernel(unsigned int *d_a, unsigned int *d_b)
{
    int iy  = blockDim.y * blockIdx.y + threadIdx.y;
    int ix  = blockDim.x * blockIdx.x + threadIdx.x;
    int idx = iy * MAX_COL + ix;

    d_a[idx] = d_a[idx] + d_b[idx];
}

int main(int argc, char** argv)
{
    unsigned int *devPtr1;
    unsigned int *devPtr2;
    unsigned int data1[MAX_ROW][MAX_COL] = { {10,20}, {30,40}, {50,60}, {70,80},
                                             {90,100}, {200,300}, {400,500}, {600,700} };
    unsigned int data2[MAX_ROW][MAX_COL] = { {1,2}, {3,4}, {5,6}, {7,8},
                                             {9,10}, {11,12}, {13,14}, {15,16} };
    int size = MAX_COL * sizeof(unsigned int) * MAX_ROW;

    // allocate global memory on the device
    cudaMalloc((void**)&devPtr1, size);
    cudaMalloc((void**)&devPtr2, size);

    // copy the host arrays to device global memory
    cudaMemcpy(devPtr1, data1, size, cudaMemcpyHostToDevice);
    cudaMemcpy(devPtr2, data2, size, cudaMemcpyHostToDevice);

    unsigned int timer = 0;
    cutCreateTimer(&timer);
    cutStartTimer(timer);

    dim3 threads(BLOCK_DIM, BLOCK_DIM, 1);
    dim3 grid(MAX_COL / BLOCK_DIM, MAX_ROW / BLOCK_DIM, 1);

    Sample_Kernel<<<grid, threads>>>(devPtr1, devPtr2);
    cudaThreadSynchronize();

    cutStopTimer(timer);
    float naiveTime = cutGetTimerValue(timer);

    printf("\n\ntime taken:     %0.3f ms\n", naiveTime);

    // copy the results back to the host
    cudaMemcpy(data1, devPtr1, size, cudaMemcpyDeviceToHost);
    cudaMemcpy(data2, devPtr2, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < MAX_ROW; i++)
        for (int j = 0; j < MAX_COL; j++)
            printf("data1[%d][%d] => %d\n", i, j, data1[i][j]);

    cudaFree(devPtr1);
    cudaFree(devPtr2);

    return 0;
}

Thanks for your answer, I have understood it, but I have some questions about your code. First, you do a memory allocation, but how do you tell the compiler that you want to allocate constant or texture memory? Also, when we do an allocation, where will the memory be allocated by default (shared, constant, global, …)?

Another question: can I do a memory copy to a given variable, as in my example:

__constant__ int A[N][N];

Can I access that variable from the host code and modify it?

Re. textures -

Check out the example projects; you’ll see how texture memory is used. It’s the same idea, except you have an extra layer of code.

Default memory space allocs -

In host code, cudaMalloc always gives you global memory.

Without qualifiers, variables declared inside device kernels are thread-local: they live in registers, spilling to local memory only when they don’t fit (large arrays, for example). (And of course, you can’t call malloc inside device kernels.)
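As a quick reference, here is a sketch of which qualifier puts a variable in which memory space (the kernel name and values are illustrative only):

```cuda
#include <cuda_runtime.h>

__device__   int g_var;          // global memory: device-wide, persists across kernels
__constant__ int c_var;          // constant memory: read-only in kernels, set from host

__global__ void where_things_live(int *d_buf)  // d_buf: global memory from cudaMalloc
{
    int t = threadIdx.x;         // register (or local memory if registers spill)
    __shared__ int s[32];        // shared memory: one copy per thread block
    s[t % 32] = t;
    __syncthreads();
    d_buf[t] = s[t % 32] + g_var + c_var;
}
```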

Yes, you can.
You can access a __constant__ variable from the host and modify it.

Use cudaMemcpyToSymbol,

e.g.: cudaMemcpyToSymbol(d_temp, &temp, sizeof(float), 0);
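Going the other way, cudaMemcpyFromSymbol reads a __constant__ (or __device__) variable back to the host. A sketch using the A[N][N] array from the question (the values are just for illustration):

```cuda
#include <cuda_runtime.h>
#include <string.h>

#define N 10

__constant__ int A[N][N];   // the constant array from the question

int main(void)
{
    int host_in[N][N], host_out[N][N];
    memset(host_in, 0, sizeof(host_in));
    host_in[0][0] = 42;

    // host -> constant memory
    cudaMemcpyToSymbol(A, host_in, sizeof(host_in));

    // constant memory -> host, e.g. to verify the copy
    cudaMemcpyFromSymbol(host_out, A, sizeof(host_out));

    return 0;
}
```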

Textures use a different syntax:

texture<float, 1, cudaReadModeElementType> tex;
cudaBindTexture(NULL, tex, temp, sizeof(float));

Samples such as “SobelFilter” in the CUDA SDK will give you more insight into how to use textures.
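To show the “extra layer” end to end, here is a minimal sketch using the texture reference API from that SDK era (since deprecated and removed in modern CUDA); the names are illustrative:

```cuda
#include <cuda_runtime.h>

// texture reference bound to linear device memory
texture<float, 1, cudaReadModeElementType> tex;

__global__ void read_through_texture(float *d_out, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n)
        d_out[i] = tex1Dfetch(tex, i);   // fetch element i via the texture unit
}

int main(void)
{
    const int n = 16;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    // the extra layer: bind the texture reference to the device buffer
    cudaBindTexture(NULL, tex, d_in, n * sizeof(float));

    read_through_texture<<<1, n>>>(d_out, n);

    cudaUnbindTexture(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```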