Putting the GPU at work

Ok.
I’ve just read all the posts in this topic today. Why hasn’t NVIDIA answered? I think we are all on the same side; we all want to improve performance. Why isn’t NVIDIA helping here by confirming or correcting what has been said?

Osiris> Indeed, I’m convinced all of this could interest a lot of people. Your comments are quite useful.

I’m not quite sure how to respond to this thread, but I’ll try.

To the original poster - I agree with mfatica that we really need more information about your code in order to help you optimize it. You’re not going to get good performance on any GPU with only 1 or 2 threads. And tree searching problems like checkers are particularly difficult on a data-parallel machine like the GPU.
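To illustrate the point about thread counts, here is a minimal sketch (the kernel name and parameters are illustrative, not from the original poster's code): a trivial element-wise kernel launched with thousands of threads, which is the regime where a GPU performs well, contrasted with a single-thread launch.

```cuda
// Hypothetical example: each thread processes one array element.
__global__ void scaleArray(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Host-side launch: 256 threads per block, enough blocks to cover n.
//   scaleArray<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
//
// By contrast, a launch like
//   scaleArray<<<1, 1>>>(d_data, 2.0f, n);
// runs the whole job on a single thread, leaving almost the entire
// chip idle and no way to hide memory latency.
```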

Regarding global memory latency: fundamentally, global memory reads are slow because they are uncached. GPUs typically try to cover this latency by running many threads.

This is why we recommend having at least two thread blocks per multiprocessor. With enough blocks per multiprocessor, some blocks will be idle, waiting for loaded data to return from memory, while one or more other blocks execute instructions on previously loaded data.
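A sketch of what that looks like in practice (kernel and sizes are assumptions for illustration, not from this thread): a memory-bound kernel launched with a grid much larger than the number of multiprocessors, so several blocks can be resident on each one.

```cuda
// Illustrative memory-bound kernel: one uncached global read per input.
__global__ void saxpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// With 128 threads per block and many more blocks than multiprocessors,
// multiple blocks can be resident on each multiprocessor (subject to
// register and shared-memory limits): while one block stalls on its
// global loads, another can issue instructions, hiding the latency.
//   int threads = 128;
//   int blocks  = (n + threads - 1) / threads;
//   saxpy<<<blocks, threads>>>(2.0f, d_x, d_y, n);
```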

The memory system is complicated, but as far as I’m aware the guidelines in the programming guide are correct.

We’re also working on profiling tools that should make optimizing these kinds of things easier.

If you have specific questions please post them in another thread.