As stated in the title, I have a question about variables (parameters) in global memory and kernels.
Say my kernel needs an input, a_gpu, whose value comes from the host. Can I do either of the following to achieve this?
1>>
CPU---->>>>
int a, *a_gpu;
...
a=0;
...
cudaMalloc((void**)&a_gpu, sizeof(int));
cudaMemcpy(a_gpu, &a, sizeof(int), cudaMemcpyHostToDevice);
...
Kernel<<< grid, threads >>> ( );
..
GPU---->>>>
__global__ void Kernel(){
...
//get a_gpu value directly from global memory
int tmp = a_gpu;
...
}
2>>
CPU---->>>>
int a, *a_gpu;
...
a=0;
...
//Do I still need to allocate memory and copy value for the GPU side?
cudaMalloc((void**)&a_gpu, sizeof(int));
cudaMemcpy(a_gpu, &a, sizeof(int), cudaMemcpyHostToDevice);
...
Kernel<<< grid, threads >>> (a_gpu);
...
If I have ten inputs that the kernel needs, do I have to allocate memory for all of them on the host side ahead of time? Please let me know whether I am doing the right thing and help me correct this code.
The CUDA runtime API will copy a_gpu to the device for you if you pass it to your kernel by value. In case #2, all you need on the CPU side is:
int a_gpu;
a_gpu=0;
Kernel<<< grid, threads >>> (a_gpu);
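For reference, the matching kernel side under pass-by-value could look something like this (a minimal sketch; the kernel and parameter names are just placeholders):

__global__ void Kernel(int a_gpu)
{
    // each thread receives its own copy of the argument; no cudaMalloc/cudaMemcpy needed for it
    int tmp = a_gpu;
    // ... use tmp ...
}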
You can do this with any parameters (including structs) that can be passed by value. To pass arrays to a kernel, you still need cudaMalloc and cudaMemcpy.
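As a rough illustration of the difference (all names here are made up for the example), a small struct can go straight into the launch, while an array has to be staged through device memory:

// small POD struct, passed to the kernel by value
struct Params { int n; float scale; };

__global__ void ScaleArray(Params p, float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.n)
        data[i] *= p.scale;
}

// host side
float h_data[256];
// ... fill h_data ...
float *d_data;
cudaMalloc((void**)&d_data, sizeof(h_data));
cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);
Params p = { 256, 2.0f };
ScaleArray<<< 1, 256 >>>(p, d_data);   // struct by value, array by device pointer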
Getting output back from a kernel also requires cudaMalloc/cudaMemcpy, even if you only want to return a simple type, like an int or float.
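For example, reading back a single int might look like this (again only a sketch with placeholder names):

__global__ void Compute(int *result)
{
    // let one thread write the value we want back on the host
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *result = 42;
}

// host side
int h_result;
int *d_result;
cudaMalloc((void**)&d_result, sizeof(int));
Compute<<< grid, threads >>>(d_result);
cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
cudaFree(d_result);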
However, I would not recommend using a plain __device__ variable for a kernel input. All threads are likely to read it simultaneously, the read will not be coalesced, and your kernel's performance will take a big hit. Use a __constant__ variable instead; constant memory is efficient when all threads read the same memory location at once. You can initialize a __constant__ variable from the host with cudaMemcpyToSymbol.
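A minimal sketch of that constant-memory approach (variable and kernel names are placeholders):

// GPU side
__constant__ int c_a;

__global__ void Kernel()
{
    // all threads read the same constant-memory location, which is broadcast efficiently
    int tmp = c_a;
    // ... use tmp ...
}

// host side
int a = 0;
cudaMemcpyToSymbol(c_a, &a, sizeof(int));
Kernel<<< grid, threads >>>();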