New to CUDA: beginner question

I’m new to CUDA. Can someone help me answer the following question? – Paul

CUDA program:

__global__ void increment_gpu(float *a, float b, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        a[idx] = a[idx] + b;
}

int main()
{
    …
    dim3 dimBlock(blocksize);
    dim3 dimGrid(ceil(N / (float)blocksize));
    increment_gpu<<<dimGrid, dimBlock>>>(a, b, N);
}

I have a question about CUDA parallel programming. In the example above, does “increment_gpu<<<dimGrid, dimBlock>>>(a, b, N);” execute on all N threads in parallel in terms of CPU execution time? And do I have to load the kernel function increment_gpu before running the main program, or does the CUDA compiler do the background work so that we only need to run main as a regular program?

Thank you.

It runs all N threads in parallel on the GPU. The function call is asynchronous, so the CPU continues executing after the kernel call.

Since you are using the CUDA runtime API, all of that loading is done for you in the background; it is very convenient.
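To make that concrete, here is a minimal sketch of what a complete runtime API program around that kernel could look like. The values of N, blocksize, and b, and the names h_a and d_a, are illustrative assumptions; error checking is omitted. You just compile it with nvcc and run it like any other program:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void increment_gpu(float *a, float b, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        a[idx] = a[idx] + b;
}

int main()
{
    const int N = 1024;          // assumed problem size
    const int blocksize = 256;   // assumed block size
    float b = 1.0f;

    // allocate and fill a host array
    float *h_a = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N; ++i) h_a[i] = (float)i;

    // allocate device memory and copy the input over
    float *d_a;
    cudaMalloc((void **)&d_a, N * sizeof(float));
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 dimBlock(blocksize);
    dim3 dimGrid((N + blocksize - 1) / blocksize);  // round up
    increment_gpu<<<dimGrid, dimBlock>>>(d_a, b, N);

    // copying back blocks until the kernel has finished
    cudaMemcpy(h_a, d_a, N * sizeof(float), cudaMemcpyDeviceToHost);

    printf("a[0] = %f\n", h_a[0]);
    cudaFree(d_a);
    free(h_a);
    return 0;
}

Note that there is no explicit loading step anywhere: the runtime API takes care of it when the kernel is first launched.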

When using the driver API, you have to load the compiled kernel code explicitly and do a lot more setup. There is no reason to use the driver API unless you know for certain that it is what you want and have a good reason for doing so.
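For comparison, here is a rough sketch of the extra setup the driver API requires. The module file name increment.ptx is a placeholder, and the argument setup and launch are elided; this is not a complete program:

#include <cuda.h>

int main()
{
    cuInit(0);

    CUdevice dev;
    cuDeviceGet(&dev, 0);

    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // the kernel must be compiled separately (e.g. to PTX)
    // and loaded by hand at run time
    CUmodule mod;
    cuModuleLoad(&mod, "increment.ptx");   // placeholder file name
    CUfunction func;
    cuModuleGetFunction(&func, mod, "increment_gpu");

    // ... set up kernel arguments and launch func ...

    cuCtxDestroy(ctx);
    return 0;
}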

Thanks for your response.

What do you mean by “after the kernel call”? Do you mean the kernel call has completed and returned from the GPU, or just that it has been submitted to the GPU? If the latter, and the CPU continues executing, when does the kernel call return, and how does the CPU get notified?

After the kernel call is submitted, the CPU continues executing.

In 99% of cases, you don’t need to be notified when the call completes. Just keep issuing kernel calls and/or device-to-device memcpy calls; they will queue up and execute in order. If you issue a host <-> device memory call, there is an implicit synchronization: the CPU stalls in that call until the GPU has completed all previously submitted tasks.
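A hypothetical sequence illustrating that behavior, reusing the names from the sketch above (d_tmp is an additional assumed device buffer of the same size):

// these calls return immediately; the GPU executes them in order
increment_gpu<<<dimGrid, dimBlock>>>(d_a, b, N);
cudaMemcpy(d_tmp, d_a, N * sizeof(float), cudaMemcpyDeviceToDevice);  // queued

// a host <-> device copy synchronizes implicitly: the CPU blocks here
// until the kernel and the device-to-device copy above have finished
cudaMemcpy(h_a, d_a, N * sizeof(float), cudaMemcpyDeviceToHost);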

If you need to synchronize explicitly (e.g. for benchmarking purposes), see cudaThreadSynchronize() or the event API in the programming guide.
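For example, a timing sketch using the event API, with the pointers and launch configuration assumed from the earlier sketch:

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
increment_gpu<<<dimGrid, dimBlock>>>(d_a, b, N);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);   // block the CPU until the kernel has finished

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);
printf("kernel time: %f ms\n", ms);

cudaEventDestroy(start);
cudaEventDestroy(stop);

// alternatively, a simple barrier that waits for all previously issued work:
cudaThreadSynchronize();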