A few questions

Hello,

A few weeks ago I started reading about and coding in CUDA.
A few issues bother me, since I couldn't find answers by reading.

  1. Can I run several kernels simultaneously?
  2. Since computation is done on the device (the graphics card), and the graphics card is also used for the display,
     what do we have to do so that the display won't jam during a calculation?
  3. How can I control which multiprocessor my kernel will run on?
  4. Can two threads read from the same variable/register/memory (global/shared) simultaneously?
  5. What is unrolling, and why do we need it?
  6. How can we solve bank conflicts?
  7. What exactly is a half-warp, and how do I control it?

Thanks
Miki

  2. Since computation is done on the device (the graphics card), and the graphics card is also used for the display,
     what do we have to do so that the display won't jam during a calculation?

I think the GPU itself can handle both the display and CUDA computation well.
But note that if your CUDA program uses a lot of memory, heavy display load can make the CUDA program fail, because the display has priority over CUDA.

Hi!

No. Only one kernel can execute at a time. On later cards you can do memory copies and kernel execution simultaneously.
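
Something like this (just a rough sketch I'm making up for illustration, with made-up names and sizes) shows the copy/kernel overlap with two streams - you need a card that supports concurrent copy and execution, and pinned host memory for the async copy:

#include <cuda_runtime.h>

__global__ void scale(float *d_a, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_a[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *h_b, *d_a, *d_b;
    cudaMallocHost(&h_b, n * sizeof(float));  // pinned host buffer, needed for async copies
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // The kernel in stream s0 and the copy in stream s1 can overlap on capable hardware.
    scale<<<(n + 255) / 256, 256, 0, s0>>>(d_a, n, 2.0f);
    cudaMemcpyAsync(d_b, h_b, n * sizeof(float), cudaMemcpyHostToDevice, s1);

    cudaDeviceSynchronize();  // wait for both streams to finish

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFreeHost(h_b);
    return 0;
}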

One trick you can do to get around the one-kernel-at-a-time limit is to split a kernel with an if/else so that threads (or blocks) above a certain index take a completely different execution path. You have to be careful not to get divergence in this switch, and register usage can be a problem if one path uses a lot more than the other (the lighter path may be unnecessarily slowed), but it's doable. A rough sketch follows below.
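
Here's the sketch (array names, tasks and the split point are all made up); splitting on the block index keeps whole blocks on one path, so warps don't diverge:

// One launch does two unrelated jobs: blocks before splitBlock work on a,
// the remaining blocks work on b.
__global__ void fusedKernel(float *a, int nA, float *b, int nB, int splitBlock)
{
    if (blockIdx.x < splitBlock) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < nA)
            a[i] += 1.0f;   // "first" task
    } else {
        int i = (blockIdx.x - splitBlock) * blockDim.x + threadIdx.x;
        if (i < nB)
            b[i] *= 2.0f;   // "second" task
    }
}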

If you accidentally write out of bounds, bad things may happen to the display. If you change the memory the display requires while a CUDA program is running (by doing a mode switch, for example), you can cause problems for CUDA. You might also see slowdowns if you're using fancy GPU-accelerated desktop effects.
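
For the out-of-bounds point, the usual guard looks like this (names made up) - the grid is normally rounded up, so the last block has spare threads that must not write:

__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // without this check the spare threads write past the end of the buffer
        data[i] += 1.0f;
}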

You can’t. All the threads in one block are guaranteed to run on the same multiprocessor however.

The access restrictions are defined in the Programming Guide.

Unrolling loops lets you cut the overhead of iterating through a loop: instead of paying for the counter and branch at runtime, the compiler replicates the loop body at compile time. This can give a significant speedup (though it can also make no difference) and can save registers. I'd google "loop unrolling" - there are plenty of results that explain the process. A small example follows below.
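
A small made-up example of what this looks like in CUDA - with a trip count known at compile time, #pragma unroll lets nvcc replicate the body and drop the loop overhead (you can also unroll by hand):

#define TAPS 4   // compile-time constant, so the loop can be fully unrolled

__global__ void dot4(const float *a, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + TAPS <= n) {
        float sum = 0.0f;
        #pragma unroll
        for (int k = 0; k < TAPS; ++k)
            sum += a[i + k] * b[i + k];   // body is replicated TAPS times by the compiler
        out[i] = sum;
    }
}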

See Programming Guide.

Thanks Tigga,
you've helped me a lot.

May I ask another question?

If I allocate shared memory inside/outside the kernel:

__shared__ int s_mem[width][height]; // size = width*height*sizeof(int)

extern __shared__ int s_mem[]; // size = width*height*sizeof(int)

What is the difference? Which is better to do? …
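
To make the question concrete, here is roughly how I understand each form being used (the sizes and kernel names are just examples I made up):

#define WIDTH  16
#define HEIGHT 16

// Static: the dimensions must be compile-time constants, and the size is
// fixed when the kernel is compiled.
__global__ void kernelStatic(int *out)
{
    __shared__ int s_mem[WIDTH][HEIGHT];
    s_mem[threadIdx.y][threadIdx.x] = threadIdx.x;
    __syncthreads();
    out[threadIdx.y * WIDTH + threadIdx.x] = s_mem[threadIdx.y][threadIdx.x];
}

// Dynamic: the array arrives unsized; the size comes from the third launch
// parameter in the <<< >>> configuration.
__global__ void kernelDynamic(int *out, int width)
{
    extern __shared__ int s_mem[];
    int i = threadIdx.y * width + threadIdx.x;
    s_mem[i] = threadIdx.x;
    __syncthreads();
    out[i] = s_mem[i];
}

// Launches (example sizes):
//   kernelStatic<<<1, dim3(WIDTH, HEIGHT)>>>(d_out);
//   kernelDynamic<<<1, dim3(w, h), w * h * sizeof(int)>>>(d_out, w);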

Thanks again
Miki