How to use just the GPU cores without any threading

drsadeq · March 19, 2010, 7:59am

How can I use just the GPU cores without any threading? I have an algorithm that needs to run “really simultaneously”, not using threads. I want to run it on different cores of the GPU without having more than one program on any one node. So how can I launch this kernel so that it runs on different nodes, but with a single thread on each core? What should the grid size and block size be?

Thanks in advance,
Sadegh

avidday · March 19, 2010, 8:30am

Threading is the basic premise of the CUDA programming model. You can’t have CUDA without threads. But that doesn’t imply the CUDA threading model is like multithreading on an SMP computer. I think you need to go back and re-read the section of the programming guide that describes the execution model, because the question you are asking doesn’t make much sense: there is no concept of “nodes” or “cores” in the CUDA programming model, so it isn’t obvious what you are asking about.

laughingrice · March 24, 2010, 12:47am

It doesn’t sound like your algorithm is very CUDA-friendly, but in any case:

Each block is scheduled onto one multiprocessor (SM), i.e. it can’t be split across several. Different blocks are spread among the multiprocessors. If you don’t create more blocks than there are multiprocessors, then each block should go to its own multiprocessor (e.g. 30 blocks on a GTX 285). The problem is that different multiprocessors don’t run “really simultaneously”; they are independent of one another. A warp, or a half-warp, or 8 threads (depending on what you are trying to do) will run “really simultaneously” unless they are serialized due to memory access patterns or conditionals.

If you set 1 thread per block you will get 1 thread per multiprocessor (very wasteful, as the hardware still issues a full 32-thread warp and you throw away the other 31 lanes). Setting the number of blocks equal to the number of SMs will get you one block per SM, but doing this means no latency hiding and very wasteful resource usage, and it is probably not going to perform in a way that justifies using a GPU (my guess is that it will be worse than a single core on the CPU).
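To make the launch configuration concrete, here is a minimal sketch of the "one single-thread block per SM" setup described above. The kernel name `record_sm` and the helper `one_thread_per_sm` are made up for illustration, and device 0 is assumed. Each block reads the PTX special register `%smid` to report which multiprocessor it landed on. Note that the scheduler does not promise a one-to-one spread of blocks over SMs, so the reported ids are something to inspect rather than rely on.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel: every block contains exactly one thread, which
// records the id of the multiprocessor (SM) it is executing on.
__global__ void record_sm(unsigned int *sm_of_block)
{
    unsigned int smid;
    // %smid is a PTX special register holding the current SM's id.
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    sm_of_block[blockIdx.x] = smid;
}

// Hypothetical helper: launches as many blocks as device 0 has SMs, with
// 1 thread per block, and returns the SM id reported by each block.
// Returns an empty vector if any CUDA call fails.
std::vector<unsigned int> one_thread_per_sm()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return {};
    const int num_sms = prop.multiProcessorCount;  // e.g. 30 on a GTX 285

    unsigned int *d_ids = nullptr;
    if (cudaMalloc(&d_ids, num_sms * sizeof(unsigned int)) != cudaSuccess) return {};

    // Grid size = number of SMs, block size = 1 thread.
    record_sm<<<num_sms, 1>>>(d_ids);

    std::vector<unsigned int> ids(num_sms);
    cudaError_t err = cudaMemcpy(ids.data(), d_ids,
                                 num_sms * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_ids);
    if (err != cudaSuccess) return {};
    return ids;
}
```

Calling `one_thread_per_sm()` from `main` and printing the returned ids shows where each block ran. Even if every block does get its own SM, this only uses one lane of one warp per multiprocessor, so it illustrates the configuration rather than recommending it.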

You would need to explain better what you need, so that we can see whether it’s possible to help you or whether you actually need a single CPU core to run the algorithm serially.
