Multi kernels

artemon · June 28, 2007, 8:03am

Hi,

How many kernels can I execute at the same time?

Simon_Green · June 28, 2007, 1:10pm

Only one kernel can execute on the machine at a time; see FAQ Q29:
http://forums.nvidia.com/index.php?showtopic=36286

MGN · June 29, 2007, 12:57am

Then what does “Asynchronous Launches” mean? In fact, kernels can’t run in parallel, and CPU usage during kernel execution is 100%.

artemon · June 30, 2007, 2:00pm

Why can’t anybody from the Nvidia developers reply to this simple question?

Simon_Green · July 1, 2007, 6:04pm

Sorry, I didn’t quite understand your question.

Yes, kernel launches are asynchronous in CUDA 1.0 - control is returned to the application as soon as the kernel is launched.

CPU usage should not be 100% during kernel execution.

As I mentioned earlier, multiple kernels cannot execute in parallel.

MGN · July 2, 2007, 2:09am

Thanks.

All my tests show 100% CPU usage. Can you give an example of a kernel (as a working program) that unloads the host? Maybe I am doing something wrong?

yk_cadcg · July 2, 2007, 6:21am

You’re not using CUDA 0.9 or later, are you? If you are, you should add cudaThreadSynchronize() before and after the kernel-launching line. Please refer to the SDK samples.

cudaMemcpy(d_buffinD, h_buffinH, DATASIZE, cudaMemcpyHostToDevice);
for (int i = 0; i < 100; i++)
    ProcessDeviceSuccess<<<BLOCK_N, THREAD_N, ELEMENT_N * ALIGN * THREAD_N>>>(d_buffoutD, d_buffinD);
cudaMemcpy(h_buffD, d_buffoutD, DATASIZE, cudaMemcpyDeviceToHost);

MGN · July 2, 2007, 7:36am

Thanks, yk_cadcg.

I already tried cudaThreadSynchronize(), but it has no effect on CPU occupancy. Can you modify my test (or provide a simple test of your own) so that it unloads the CPU?

Jeroen · July 2, 2007, 8:14am

Kernel launches are asynchronous, but memory copies are not. So in your example, you launch the kernel and then issue a memory copy, which blocks and causes the high CPU load you noticed. As mentioned in other topics, there is no way to check whether a launched kernel has finished. The only thing you can do to unload the CPU is put your process to sleep for a fixed duration, if you have a rough idea of the kernel's execution time…
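
A minimal sketch of the sleep-instead-of-spin idea above. It assumes a later CUDA runtime than the 1.0 release discussed in this thread, one that provides the event calls `cudaEventRecord` and `cudaEventQuery`, and a C++11 host compiler. The kernel `busyKernel` and the helper `launchAndPoll` are hypothetical names made up for illustration; they are not from the original posts.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <thread>

// Hypothetical long-running kernel: every thread iterates on its own element.
__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0000001f + 1.0f;
        data[i] = v;
    }
}

// Launches the kernel, then sleeps between completion checks instead of
// blocking in a copy or synchronize call.
// Returns the number of sleeps performed, or -1 on a CUDA error.
int launchAndPoll(int n, int iters, int sleepMs)
{
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t done;
    if (cudaEventCreate(&done) != cudaSuccess) { cudaFree(d); return -1; }

    busyKernel<<<(n + 255) / 256, 256>>>(d, n, iters);  // returns immediately
    cudaEventRecord(done, 0);                           // marks the end of the kernel

    int polls = 0;
    cudaError_t status;
    // cudaEventQuery does not block: it reports cudaErrorNotReady while the
    // kernel is still running, so the host thread can sleep in between.
    while ((status = cudaEventQuery(done)) == cudaErrorNotReady) {
        std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));
        ++polls;
    }

    cudaEventDestroy(done);
    cudaFree(d);
    return status == cudaSuccess ? polls : -1;
}
```

Calling `launchAndPoll(1 << 20, 100000, 5)` would check for completion roughly every 5 ms; the sleep interval trades CPU load against how quickly the host notices that the kernel has finished.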

yk_cadcg · July 2, 2007, 9:09am

Exactly.

To be clearer:

MyKernel<<<Dg, Db, Ns>>>(args1);

DoOtherThings(args2); // As long as args2 doesn't depend on the output of MyKernel, the asynchronous kernel launch feature in CUDA 0.9 lets the CPU run DoOtherThings immediately, without stalling to wait for MyKernel's output. As we know, CUDA 0.8 and older versions block the CPU until MyKernel ends.
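
The pattern above can be sketched as a complete function. This is only an illustration under assumptions: `scaleKernel`, `doOtherThings`, and `overlapExample` are hypothetical stand-ins for MyKernel, DoOtherThings, and the surrounding program, and the CPU work is chosen so that it does not depend on the kernel's output.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for MyKernel: scales every element by a factor.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical stand-in for DoOtherThings: CPU work that is independent
// of the kernel's output (sums 0 + 1 + ... + (n - 1)).
long long doOtherThings(int n)
{
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

// Returns true if both the GPU result and the CPU result are correct.
bool overlapExample(int n)
{
    std::vector<float> h(n, 1.0f);
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // The launch returns control to the host right away...
    scaleKernel<<<(n + 255) / 256, 256>>>(d, n, 2.0f);

    // ...so this CPU work can run while the GPU is busy.
    long long cpuSum = doOtherThings(n);

    // The blocking copy waits for the kernel to finish before it returns.
    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    bool ok = (cpuSum == (long long)n * (n - 1) / 2);
    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f) ok = false;
    return ok;
}
```

The overlap only helps while the host has useful independent work; once it reaches the blocking copy it waits for the kernel, which is the situation described earlier in the thread.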

MGN · July 2, 2007, 11:05am

Yes. But in my example, “TempFunc()” takes all the processor time (it processes only one variable in a register). I tried commenting out all the memory copy functions, but CPU usage is still 100%. Can you give a simple example that unloads the CPU? I have only heard that it is possible. I would very much like to see a program that does this. :) (Please attach the code.)

Many thanks.

nathanbp · July 2, 2007, 3:40pm

Your code calls many kernels one right after the other. I’m pretty sure that calling another kernel will block until the first one returns, apparently at 100% CPU usage.

weigo · July 2, 2007, 6:32pm

It seems asynchronous kernel launching is quite confusing. IMO the same result can be achieved more simply by doing a synchronous kernel launch from a separate host thread. The operating system won’t consume 100% CPU time during thread synchronization. Furthermore, you keep all the usual coding options, such as polling.
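
A rough sketch of the separate-host-thread approach, with several assumptions: a C++11 host compiler (for `std::async`), a CUDA runtime recent enough to share the device between host threads, and the `cudaDeviceScheduleBlockingSync` device flag, which comes from later toolkit releases than the ones discussed in this thread. The names `incrementKernel`, `gpuWorker`, and `runOnWorkerThread` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <future>

// Hypothetical kernel: adds 1 to every element.
__global__ void incrementKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Runs on the worker host thread: launch the kernel, then block until it
// finishes. Returns the last element of the result, or -1 on a CUDA error.
static int gpuWorker(int n)
{
    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d, 0, n * sizeof(int));

    incrementKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaError_t err = cudaDeviceSynchronize();  // worker waits here

    int last = -1;
    if (err == cudaSuccess)
        err = cudaMemcpy(&last, d + (n - 1), sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? last : -1;
}

int runOnWorkerThread(int n)
{
    // Assumption: ask the runtime to block on an OS primitive rather than
    // spin while waiting for the GPU. The return value is ignored because
    // some runtime versions reject the call once the device is active.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    std::future<int> result = std::async(std::launch::async, gpuWorker, n);

    // The main thread is free to do unrelated work at this point.

    return result.get();  // waits for the worker thread to finish
}
```

On some Linux toolchains the build may need `-Xcompiler -pthread` for `std::async`. Without the blocking-sync flag the worker thread itself may still spin inside the synchronize call, so the main thread staying idle does not by itself guarantee low total CPU usage.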

MGN · July 3, 2007, 3:09am

The kernel execution time is more than 200 ms. That is long enough to unload the CPU between kernel launches (if that is possible).

Who can give an exact answer to this question: is it really possible to unload the CPU while a kernel is running (not synchronized)? (Yes/No.) If yes, it would be interesting to see the code. I suppose the answer is no! But I can’t understand my colleagues on this forum: there is no exact answer here (only suggestions).
