How to Copy an Array to the GPU memory

Admirer4 · June 18, 2008, 10:39am

Hey,
How can I copy an array that resides in host memory to constant or global memory on the device?

In other words, I have this code:

#define N 10
// constant array in device memory
__constant__ __device__ int B[N][N];

// kernel which uses B
__global__ void Test(…)
…

// the test program
main(){
    int A[N][N];
    // put some random values in matrix A
    Build(A);
    // before I run the kernel, I have to copy the values in matrix A
    // to B. How can I do that?

Sibi_A · June 18, 2008, 11:37am

The code below should be simple enough to solve your problem. If anything is unclear, please reply. (The parts specific to your question are marked.)

The example uses global memory.

Be cautious when using __constant__ memory. It has the following characteristics (see the NVIDIA CUDA Programming Guide):

It is limited to 64 KB.
__constant__ variables cannot be assigned to from the device, only from the host through host runtime functions.

=============================================

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <cutil.h>

#define MAX_COL 2
#define MAX_ROW 8

#define BLOCK_DIM 2

__global__ void Sample_Kernel(unsigned int *d_a, unsigned int *d_b)
{
    // Flatten the 2D thread coordinates into a row-major index.
    int iy = blockDim.y * blockIdx.y + threadIdx.y;
    int ix = blockDim.x * blockIdx.x + threadIdx.x;
    int idx = iy * MAX_COL + ix;

    d_a[idx] = d_a[idx] + d_b[idx];
}

int main(int argc, char** argv)
{
    unsigned int *devPtr1;
    unsigned int *devPtr2;
    unsigned int data1[MAX_ROW][MAX_COL] = { 10,20, 30,40, 50,60, 70,80, 90,100, 200,300, 400,500, 600,700 };
    unsigned int data2[MAX_ROW][MAX_COL] = { 1,2, 3,4, 5,6, 7,8, 9,10, 11,12, 13,14, 15,16 };
    int size = MAX_COL * sizeof(unsigned int) * MAX_ROW;

    CUT_DEVICE_INIT();

    // ==== Relevant to the question: allocate global memory, copy host -> device ====
    cudaMalloc((void**)&devPtr1, size);
    cudaMalloc((void**)&devPtr2, size);

    cudaMemcpy(devPtr1, data1, size, cudaMemcpyHostToDevice);
    cudaMemcpy(devPtr2, data2, size, cudaMemcpyHostToDevice);
    // ==== End of relevant part ====

    unsigned int timer;
    cutCreateTimer(&timer);

    // One thread per array element: a 1 x 4 grid of 2 x 2 blocks.
    dim3 grid(MAX_COL/BLOCK_DIM, MAX_ROW/BLOCK_DIM, 1);
    dim3 threads(BLOCK_DIM, BLOCK_DIM, 1);

    cutStartTimer(timer);
    Sample_Kernel<<<grid, threads>>>(devPtr1, devPtr2);
    cudaThreadSynchronize();
    cutStopTimer(timer);
    float naiveTime = cutGetTimerValue(timer);

    printf("\n\ntime taken:     %0.3f ms\n", naiveTime);

    // ==== Relevant to the question: copy device -> host, then free ====
    cudaMemcpy(data1, devPtr1, size, cudaMemcpyDeviceToHost);
    cudaMemcpy(data2, devPtr2, size, cudaMemcpyDeviceToHost);

    cudaFree(devPtr1);
    cudaFree(devPtr2);
    // ==== End of relevant part ====

    printf("\n");

    for (int i = 0; i < MAX_ROW; i++)
    {
        for (int j = 0; j < MAX_COL; j++)
        {
            printf("data1[%d][%d] => %u\n", i, j, data1[i][j]);
        }

        printf("\n");
    }

    return 0;
}
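
For readers without cutil.h (it came with the SDK samples, not with the CUDA runtime itself), the same copy-in, compute, copy-out pattern can be sketched with runtime calls only. This is a minimal sketch, not the poster's code: the CHECK macro, AddKernel and add_on_device are hypothetical names introduced here, and CUDA events stand in for the cutil timer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define MAX_COL 2
#define MAX_ROW 8
#define BLOCK_DIM 2

// Hypothetical helper macro (not part of CUDA): stop on the first failed call.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",            \
                    __FILE__, __LINE__, cudaGetErrorString(err_));  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

__global__ void AddKernel(unsigned int *d_a, const unsigned int *d_b)
{
    int iy = blockDim.y * blockIdx.y + threadIdx.y;
    int ix = blockDim.x * blockIdx.x + threadIdx.x;
    if (ix < MAX_COL && iy < MAX_ROW) {      // guard against stray threads
        int idx = iy * MAX_COL + ix;
        d_a[idx] += d_b[idx];
    }
}

// Adds b into a on the device; both hold MAX_ROW * MAX_COL elements.
// Returns the kernel time in milliseconds.
float add_on_device(unsigned int *a, const unsigned int *b)
{
    size_t size = MAX_ROW * MAX_COL * sizeof(unsigned int);
    unsigned int *d_a = NULL, *d_b = NULL;

    // cudaMalloc hands out global memory; cudaMemcpy fills it from the host.
    CHECK(cudaMalloc((void**)&d_a, size));
    CHECK(cudaMalloc((void**)&d_b, size));
    CHECK(cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    dim3 grid(MAX_COL / BLOCK_DIM, MAX_ROW / BLOCK_DIM, 1);
    dim3 threads(BLOCK_DIM, BLOCK_DIM, 1);

    CHECK(cudaEventRecord(start));
    AddKernel<<<grid, threads>>>(d_a, d_b);
    CHECK(cudaGetLastError());               // catches launch failures
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));       // waits for the kernel to finish

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaMemcpy(a, d_a, size, cudaMemcpyDeviceToHost));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_a));
    CHECK(cudaFree(d_b));
    return ms;
}

int main()
{
    unsigned int data1[MAX_ROW][MAX_COL] = { {10,20}, {30,40}, {50,60}, {70,80},
                                             {90,100}, {200,300}, {400,500}, {600,700} };
    unsigned int data2[MAX_ROW][MAX_COL] = { {1,2}, {3,4}, {5,6}, {7,8},
                                             {9,10}, {11,12}, {13,14}, {15,16} };

    float ms = add_on_device(&data1[0][0], &data2[0][0]);
    printf("kernel time: %0.3f ms\n", ms);

    for (int i = 0; i < MAX_ROW; i++)
        for (int j = 0; j < MAX_COL; j++)
            printf("data1[%d][%d] => %u\n", i, j, data1[i][j]);
    return 0;
}
```

A 2D host array is contiguous, so passing &data1[0][0] with the total byte count is enough; no per-row copies are needed.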

Admirer4 · June 19, 2008, 5:26am

Thanks for your answer, I understand it, but I have some questions related to your code. First, you allocate memory with cudaMalloc, but how do you tell the compiler that you want to allocate constant or texture memory? Also, when we allocate, where is the memory placed by default (shared, constant, global, …)?

Another question: can I do a memory copy to a given variable, like the one in my example:

__constant__ __device__ int A[N][N];

Can I access that variable from host code and modify it?

kristleifur · June 19, 2008, 10:01am

Re. textures -

Check out the example projects; you’ll see how texture memory is used. It’s the same, except you have an extra layer of code.

Default memory space allocs -

In host code, cudaMalloc always gives you global memory.

Without qualifiers, variables declared inside device kernels are private to each thread: they live in registers, or in local memory when they don’t fit. (Of course, you don’t have any malloc calls in device kernels.)
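
To make the memory spaces concrete, here is a small sketch showing where each kind of declaration ends up. The names (g_offset, c_scale, Spaces, run_spaces) are made up for illustration and do not come from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 64

__device__   int g_offset = 1;   // global memory, declared at file scope
__constant__ int c_scale  = 2;   // constant memory, read-only inside kernels

__global__ void Spaces(int *data)          // data: global memory from cudaMalloc
{
    __shared__ int tile[THREADS];          // shared memory, one copy per block
    int v = data[threadIdx.x];             // no qualifier: private to this thread
    tile[threadIdx.x] = v * c_scale + g_offset;
    __syncthreads();                       // make all shared writes visible
    // Read another thread's slot to show the tile is shared within the block.
    data[threadIdx.x] = tile[THREADS - 1 - threadIdx.x];
}

// Returns the number of wrong results, or -1 if a CUDA call fails.
int run_spaces()
{
    int h[THREADS];
    for (int i = 0; i < THREADS; i++) h[i] = i;

    int *d = NULL;
    if (cudaMalloc((void**)&d, sizeof(h)) != cudaSuccess) return -1;
    if (cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return -1;
    }

    Spaces<<<1, THREADS>>>(d);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    int wrong = 0;
    for (int i = 0; i < THREADS; i++)
        if (h[i] != (THREADS - 1 - i) * 2 + 1) wrong++;
    return wrong;
}

int main()
{
    printf("wrong results: %d\n", run_spaces());
    return 0;
}
```

The only allocation call on the host side is cudaMalloc; the shared, constant and file-scope global variables get their space from their declarations.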

Sibi_A · June 20, 2008, 6:50am

Yes, you can access a __constant__ variable from the host and modify it.

Use cudaMemcpyToSymbol, e.g.:

cudaMemcpyToSymbol(d_temp, &temp, sizeof(float), 0);
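
Applied to the N x N array from the original question, this could look roughly like the sketch below. The kernel ReadB, the helper copy_to_constant_and_check, and the fill loop standing in for Build(A) are assumptions made for illustration; only B, A, N and cudaMemcpyToSymbol come from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10

// Constant-memory array on the device (N * N ints, far below the 64 KB limit).
__constant__ int B[N][N];

// Hypothetical kernel: each thread copies one element of B into global
// memory so the host can check that the transfer worked.
__global__ void ReadB(int *out)
{
    int row = threadIdx.y;
    int col = threadIdx.x;
    out[row * N + col] = B[row][col];
}

// Copies A into B, reads it back through the kernel, and returns the
// number of mismatches (or -1 if a CUDA call fails).
int copy_to_constant_and_check()
{
    int A[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = i * N + j;           // stand-in for Build(A)

    // Host -> constant memory. B is passed as the symbol itself;
    // offset and direction keep their defaults (0, host to device).
    if (cudaMemcpyToSymbol(B, A, sizeof(A)) != cudaSuccess) return -1;

    int *d_out = NULL;
    if (cudaMalloc((void**)&d_out, sizeof(A)) != cudaSuccess) return -1;

    dim3 threads(N, N, 1);                 // one thread per element
    ReadB<<<1, threads>>>(d_out);

    int result[N][N];
    cudaError_t err = cudaMemcpy(result, d_out, sizeof(result),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (result[i][j] != A[i][j]) mismatches++;
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", copy_to_constant_and_check());
    return 0;
}
```

No cudaMalloc is involved for B: its storage comes from the __constant__ declaration, and the copy is the only host-side step needed before launching the kernel.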

Textures use a different syntax, e.g.:

texture<float> tex;
cudaBindTexture(NULL, tex, temp, sizeof(float));

Samples such as “SobelFilter” in the CUDA SDK will give you more insight into how to use textures.
