GT300 and CUDA 3.0

jaakennuste · April 15, 2009, 4:35am

I read that GT300 will not only be built on cooler 40nm technology, but will also have a new architecture: “The GT300’s architecture will be based on a new form of number-crunching machinery. While today’s NVIDIA GPUs feature a SIMD (single instruction multiple data) computation mechanism, the GT300 will introduce the GPU to MIMD (multiple instructions multiple data) mechanism. This is expected to boost the computational efficiency of the GPU many-fold. The ALU cluster organization will be dynamic, pooled, and driven by a crossbar switch. Once again, NVIDIA gets to drop clock speeds and power consumption, while achieving greater levels of performance than current-generation GPUs.”

Does it change CUDA programming a lot?

gatoatigrado · April 15, 2009, 4:43am

I’m pretty sure I’ve heard nv employees say they don’t feed speculation on future products; usually this is disallowed by a company’s NDA (disclosure of non-public material information about future products).

tmurray · April 15, 2009, 5:33am

Ask us again when we’ve announced a product. We’re not going to answer anything about anything unannounced.

E.D_Riedijk · April 15, 2009, 6:43am

I have read speculation about dynamic warp formation (that is how I remember it), where divergent warps get split up and new warps are made out of the same-branch threads of the old warps. This might invalidate some assumptions that we make now, I guess. But given that it will probably be late this year before we see any of this, I think everyone can sleep quietly at night for now ;)
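To make the idea concrete, here is a toy sketch of the regrouping described above. It is only a counting model, not anything known about GT300: it assumes a warp width of 32, assumes a divergent warp issues once per distinct branch target, and ignores any hardware constraints on which lanes a thread may occupy. The function names `issue_slots_static` and `regroup` are made up for this illustration.

```python
from collections import defaultdict

WARP_SIZE = 32  # warp width on current NVIDIA GPUs


def issue_slots_static(warps):
    """Without regrouping: each warp issues once per distinct branch target."""
    return sum(len(set(warp)) for warp in warps)


def regroup(warps, warp_size=WARP_SIZE):
    """Pool threads by branch target and repack them into new warps.

    Each input warp is a list of branch targets, one per lane.
    Returns a list of (target, [(old_warp_id, old_lane), ...]) tuples.
    """
    by_branch = defaultdict(list)
    for warp_id, warp in enumerate(warps):
        for lane, target in enumerate(warp):
            by_branch[target].append((warp_id, lane))

    new_warps = []
    for target, threads in by_branch.items():
        for start in range(0, len(threads), warp_size):
            new_warps.append((target, threads[start:start + warp_size]))
    return new_warps


# Four warps in which even lanes take branch "A" and odd lanes take branch "B".
warps = [["A" if lane % 2 == 0 else "B" for lane in range(WARP_SIZE)]
         for _ in range(4)]

static_slots = issue_slots_static(warps)  # every warp runs both paths
dynamic_slots = len(regroup(warps))       # same-branch threads packed together

print(static_slots, dynamic_slots)
```

In this artificial case the regrouped warps are all full, so the number of issue slots halves. With less regular divergence the gain would be smaller, and code that relies on threads staying in the same warp would no longer be safe, which is the kind of assumption mentioned above.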

YDD · April 15, 2009, 1:22pm

At a recent conference, it was prominently announced that NVIDIA Doesn’t Do Roadmaps. I suspect that this is due to the fact that GPUs are ‘trickling up’ to HPC from consumers, rather than high end hardware trickling down to consumers (as has happened with CPUs). People buying large supercomputers do so in the full knowledge that by the time their machine is fully operational, faster chips will be available. OTOH, teenaged boys (to use NVIDIA’s own description of their primary market :D ) will probably wait three months if they know a better card is coming out.

The nice part of this approach is that you don’t plan for PowerPointware (Itanium/Larrabee etc. - apologies for singling out Intel here). However, it does make drawing up plans for the future rather difficult.

sunsetquest · June 3, 2009, 3:55am

It would be nice if NVIDIA would release more info on their upcoming GT300 processor. I would like to start designing my algorithms so they fit well on the new chip. I would also focus my “cuda/ptx” learning so that I don’t learn something that is going to be outdated in a few months.

cvnguyen · June 3, 2009, 11:14am

They will probably support slim MIMD at the GPU level and inter-multiprocessor messaging (just my expectation :-) ). However, the multiprocessor itself may still be SIMD to ensure compatibility with the current architecture. A complicated MIMD multiprocessor would turn the GPU into a de facto CPU and would definitely make it much more expensive.

jack · June 3, 2009, 1:10pm

I think you will be fine learning CUDA now on the current architectures. If anything, whatever kernels you build today will need little (if any) tweaking to also get good performance on the GT300 architecture. It is more likely that the new features there will allow CUDA to handle kernels that could not be efficiently written today due to various communication issues and such.

The hardest part of porting an app to CUDA has been, is, and will be actually parallelizing sections of your calculations. Once that is done, tuning for maximum performance on the device is relatively simple.

seibert · June 3, 2009, 1:49pm

Yeah, if you drop the CUDA terminology for things, you realize that GT200 already has MIMD features. Using Larrabee-style terms, the GT200 has 30 RISC cores, each a 32-wide SIMD processor (implemented with 8 pipelined FPUs). Moreover, each core is 24-way hyperthreaded (since each active warp can be running a different instruction). The only thing missing is a ring bus for communication between the cores, and maybe some more cache.
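As a quick back-of-the-envelope check, the figures in the paragraph above can be multiplied out. The numbers are taken at face value from the post, not verified against a datasheet, and the variable names are just labels for this sketch.

```python
# Figures as quoted in the post above (not checked against a spec sheet).
cores = 30            # multiprocessors, i.e. "RISC cores" in Larrabee-style terms
simd_width = 32       # threads per warp
fpus_per_core = 8     # pipelined FPUs per multiprocessor
warps_per_core = 24   # "24-way hyperthreaded"

total_fpus = cores * fpus_per_core
# A 32-wide warp instruction on 8 FPUs needs several passes through the pipeline.
cycles_per_warp_instruction = simd_width // fpus_per_core
threads_in_flight = cores * warps_per_core * simd_width

print(total_fpus, cycles_per_warp_instruction, threads_in_flight)
```

So, on those figures, each core can have 24 independent instruction streams resident at once, which is the sense in which the chip already looks MIMD across warps while staying SIMD within a warp.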

The software side of CUDA hides the MIMD from you, by requiring your kernel to run on the entire chip, rather than some subset of it. If NVIDIA wants to make CUDA more MIMD-friendly, they already have the mechanism to do so: CUDA streams. Currently streams are really only good for overlapping computation and memory copies. However, if you could bind a CUDA stream to a particular number of multiprocessors, you could subdivide the GPU and more easily run completely different kernels on the different multiprocessors. Then all you need is a way to insert a “join” event into two streams for synchronization, and (hardware permitting) some sort of way to quickly exchange data between streams, bypassing global memory, and you have a winner.

I have no idea if this was the long term vision behind adding streams to CUDA, but the abstraction strikes me as an easy way to grow CUDA in a MIMD direction without huge changes to the CUDA programming model. (Insert usual disclaimer regarding speculation about someone else’s software here)
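The “join” idea can be mimicked on the host with ordinary threads, purely as an analogy. This is not CUDA code and not a description of any real or planned CUDA feature: each Python list stands in for a stream whose work runs in order, and a `threading.Event` plays the role of the hypothetical join event between two streams. All function names are invented for the sketch.

```python
import threading

results = {}
joined = threading.Event()  # stand-in for a hypothetical cross-stream "join" event


def run_stream(tasks):
    """Run one stream's tasks strictly in order, like work queued on one stream."""
    for task in tasks:
        task()


def kernel_a():
    results["a"] = sum(range(1000))  # work issued to stream A


def kernel_b():
    results["b"] = max(range(1000))  # independent work issued to stream B


def combine():
    # Needs the output of both streams, so it must run after the join.
    results["sum"] = results["a"] + results["b"]


stream_a = [kernel_a, joined.set]            # signal once A's work is done
stream_b = [kernel_b, joined.wait, combine]  # B blocks at the join point

threads = [threading.Thread(target=run_stream, args=(s,))
           for s in (stream_b, stream_a)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["sum"])
```

The two streams run independently until the join point, after which stream B can safely consume stream A's result. In the scheme speculated about above, each stream would additionally be bound to its own subset of multiprocessors.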

jma · June 3, 2009, 2:02pm

Here is a preview of what the actual silicon will look like. Note the fast communications rails all around the edges:

http://en.wikipedia.org/wiki/File%3ADisneyland_aerial_view_in_1956.jpg

seibert · June 3, 2009, 6:11pm

Whee! __syncthreads_with_roller_coaster()!

Geka · June 3, 2009, 9:49pm

If you want to learn something that will be around for the GT300, it’s probably better to start learning OpenCL. If you want to program your GT200 or GT200b efficiently now, learn CUDA (it will help you with OpenCL anyway).

I am surprised at the number of people who pretend that programming efficiently with a pure SIMD style is easy. Of course, if you have poorly written CPU code, you will see amazing performance improvements. But if you have an already well-tuned program, it will require significant work on CUDA as well. For instance, read the paper by VVolkov at Supercomputing 08 and you will see that the kind of optimizations he does are not something that takes only a couple of hours to figure out.

And imho, if the GT300 were entirely MIMD, yes, it would break a lot of CUDA optimizations. I really hope that “SIMD at the lower level, with MIMD possibilities” is what ships in the final product.

tmurray · June 3, 2009, 9:53pm

Er, what? CUDA’s not going anywhere.

e.ping · June 4, 2009, 12:02am

SIMD is clearly not the entire story (though it IS fun to parameterise an algorithm with the SIMD width of CUDA warps and then run it on the NEC SX9)

Until I’ve actually gotten my hands on a GT300 or whatever this unannounced product will be called, I don’t care. To quote David Kirk “we don’t do roadmaps”. Speculation is futile.

Pretty much official NVIDIA slides on CL vs CUDA are here: http://www.cse.unsw.edu.au/~pls/cuda-workshop09/
CL = driver API, C4CUDA = high-level
Another observation is that the CUDA docs now officially use the language of “C API” vs “C++ high level API”. Might be accidental, but I’ve seen less strong evidence :)
