CUDA for ATI cards: we need a standard

A while back I read an article claiming that NVIDIA was trying to get ATI on board with CUDA. It really seems to me that, with the new GPU physics and CUDA movement, there has to be some kind of standard. ATI and NVIDIA really need to come together on this somehow, or the ball is never going to start rolling; it will merely be picked up and put down wherever they can fit it. It's truly great technology and needs to be fully implemented. So what is going on on that front? Was NVIDIA ever trying to work with ATI on offering CUDA? Does anyone know about this? If so, where does it stand as of now?

AMD acquired ATI last year. I thought AMD, Intel and NVIDIA were all competing against each other to get a breakthrough in this market. CUDA on ATI may just not be possible – leaving aside the technical reasons…

I found an article on Google. This is not the one I read before, but it seems to say something similar:
“NVIDIA claims it’s talking to ATI to try and get them to use CUDA too, which… well, we’ll see there, eh?”

from
[url=“http://www.eurogamer.net/article.php?article_id=155719”]http://www.eurogamer.net/article.php?article_id=155719[/url]

Also, btw, I knew ATI was owned by AMD, but I still always refer to the AMD graphics cards as ATI, as so many others do.

There has to be a standard. AMD and Intel compete against each other, but they both used the x86 instruction set for many years. Even though they had different ways of processing the code, they still had a standard. CUDA is a way of programming graphics cards, and it needs to be a standard for all graphics cards if it is really going to take off. Physics on a graphics card uses CUDA. It would be a lot of trouble for developers to have to implement so many different physics engines in their games. As it stands, with no standard, most games are going to use the CPU for physics. Do you see what I am saying?

How about OpenCL? Maybe this will become a standard for GPU computing?

NVIDIA is trying to make CUDA a parallel programming standard. In the next version, you can compile CUDA code into multi-threaded code for multicore CPUs. Given the number of universities that are apparently already teaching parallel programming with CUDA, they might pull this off, and ATI will need to follow the CUDA programming model.

It seems AMD isn't really interested in GPGPU at the moment. Stream (ATI's CUDA) looks like it's being put out to stud. Maybe something to do with Intel's Larrabee.

Not true at all. AMD's proprietary stuff is being taken out as useless, but the entire future of their company is bet on GPGPU. (Why do you think they bought ATI??)

Anyway, I had not heard about NVIDIA trying to get ATI on board with CUDA. It's very interesting. I've thought for some time that CUDA itself would fail if it were proprietary. Thankfully, OpenCL, the competitor that ATI and Intel are supporting, is fundamentally similar to CUDA. It has warps, shared memory, etc. – all the important stuff. If it ever wins out, switching shouldn't be a pain.

Ultimately, you also can't forget DirectX 11. It will include "compute shaders." I don't know what other overhead you'll need to use this feature (using DX usually requires quite a bit of boilerplate code), but DX is a universal, absolutely predominant standard, and may yet be a major GPGPU platform (even if it's Unix-incompatible).

What I really like about CUDA is how it balances usability. It's C, yet it's got nifty extensions, and the compiler even automatically puts in the include files. You can be up and running in minutes. All the other standards are similar because they're all modeled after the GPU architecture paradigm, but it would be a real shame if they worked either like traditional libraries or as entirely new languages. I love how we simultaneously get all the syntactic sugar and C/C++ compatibility. (Disclaimer: I don't really know how OpenCL works on the small scale.)
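Just to illustrate what I mean by "up and running in minutes" – this is only a toy sketch, the kernel and names are made up, but it shows the handful of extensions involved: the __global__ qualifier, the built-in thread indices, and the <<<...>>> launch syntax on top of ordinary C (no CUDA header needed, nvcc pulls it in automatically):

```
// Toy kernel: add two vectors, one element per thread.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // built-in index variables
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMalloc(&a, bytes);                            // device allocations
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);

    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);    // the <<<grid, block>>> syntax
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```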

This sounds as if parallelism had been invented in 2007 by NVIDIA… :P

There might be a good reason for "--comic" being the option to compile code for multicore in nvcc ;)

I personally don't see who would seriously program multicore applications using CUDA… I would be really interested to see some meaningful results using CUDA on something other than a matrix multiplication.

Another way to address the problem may be the other way round: use true standards (typically OpenMP) and build a backend that generates code for CUDA. The same problem applies to the Cell processor (IBM also tried to extend their ALF interface to multicore).
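To make the idea concrete – purely a hypothetical sketch, no such backend exists here – the source would stay a standard OpenMP loop, and the tool would emit a corresponding CUDA kernel, roughly one loop iteration per thread:

```
// Portable source: a plain OpenMP parallel loop.
void scale_rows(float *y, float a, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] *= a;
}

// What a hypothetical OpenMP-to-CUDA backend could emit for that loop:
// each iteration of the parallel loop becomes one CUDA thread.
__global__ void scale_rows_kernel(float *y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] *= a;
}
```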

Anyway, there is no miracle: CUDA is not ready to be a generic standard for parallel programming (and there is no way it ever will be). On the other hand, I'm truly convinced that an initiative such as OpenCL must be the way to go (everyone must agree we need something unified), and NVIDIA should be powerful enough to influence the standard a lot…

I do agree with all of that.

“I would be really interested to see some meaningful results using CUDA on something other than a matrix multiplication.”

I think the new physics support included in the NVIDIA drivers is purely CUDA-based. I'm really not sure, but that's what I've been reading on forums here and there.

Um, you clearly have not looked around much. A sizable number of high-performance computing projects are using CUDA, often achieving 10x-40x speed improvements over optimized CPU-based implementations. Just a few I can think of off the top of my head:

These guys crammed more CUDA devices into one case than anyone else, and were able to be competitive with a 256-node cluster in the specific application of tomographic reconstruction.

25x speedup of molecular dynamics simulations using a single GPU.

Proof-of-concept code to attack WPA-PSK encryption. A $450 GTX 280 card is about 10x faster than a 2.6 GHz quad-core AMD chip.

I think you miss the point. CUDA is a runtime organized around data parallelism specifically. Many parallel problems do not map well to the data parallel style, and that’s fine. But for those that do, it is possible to build hardware which is very efficient at data parallel calculations, like a graphics card. Specialization brings efficiency which is difficult to match with a more general purpose architecture.

Of course, more general hardware, like multicore CPUs, can be applied to a broader range of problems. Tools like OpenMP are general enough to exploit the capabilities of SMP systems, but it is very hard to translate an OpenMP application to run well on graphics cards at the moment. However, CUDA applications can be translated (as has been shown in at least one paper) very straightforwardly into efficient multithreaded + SSE-enabled programs. This opens up the possibility of writing a program which uses a GPU, but still has an efficient CPU fallback option.
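Roughly, the translation works because the CUDA grid/block structure is already explicit loop structure. A hand-waved sketch (not the actual tool from that paper; names are made up): the grid becomes an outer loop you can split across CPU worker threads, and the threads of a block become an inner loop that is a natural target for SSE vectorization:

```
// The original CUDA kernel.
__global__ void saxpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// What a CUDA-to-CPU translator could emit, conceptually: loops over blocks
// and threads replace the hardware scheduling. The outer loop can be handed
// to worker threads; the inner loop is a candidate for SSE vectorization.
void saxpy_cpu(float a, const float *x, float *y, int n,
               int gridDim_x, int blockDim_x)
{
    for (int block = 0; block < gridDim_x; ++block)
        for (int thread = 0; thread < blockDim_x; ++thread) {
            int i = block * blockDim_x + thread;
            if (i < n)
                y[i] = a * x[i] + y[i];
        }
}
```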

As you mention, there are no silver bullets to parallel programming, but CUDA is very good in its narrow sphere of applicability. Something like it (hopefully OpenCL) could easily become a data-parallel programming standard.

I think you can, if you are talking about general-purpose computation rather than just things like game physics. A lot of that is done in the *nix world, or at least is expected to run there, so whatever solution there is needs not to be exclusive to Windows.

Read carefully, I said ‘a parallel programming standard’.

This is an interesting comment. Do you think people have been buying Teslas to do matrix multiplication??

Finance, oil and gas, games, cryptography, scientific computing – so many applications use CUDA. There's a thread in this forum along the lines of "What do you guys do with CUDA?" – there are umpteen answers in there. You could go have a look.

If Larrabee appears as a multi-core processor, one could just re-compile existing CUDA applications and move them to Larrabee… That is one obvious use. But the multicore CPU should be seen as a fall-back option… just in case…

By the way, there's a company called "RapidMind" (www.rapidmind.com) that provides a platform-independent way to code parallel applications, which can then be selectively compiled for Cell, GPUs or multicore CPUs.

I’m not sure this is a great idea. It seems to me that there are way too many magic numbers in CUDA for it to be broadly applicable to parallel computing hardware.

Support of CUDA on ATI has been attempted:

ATI Runs PhysX With Modified Drivers
[url=“http://www.tomshardware.com/news/nvidia-physx-ati,5764.html”]http://www.tomshardware.com/news/nvidia-physx-ati,5764.html[/url]

Unfortunately, if you look at the ngohq.com web site, there has been no update in recent weeks, besides people asking "hey, any news?". Either there are major obstacles, or they're keeping radio silence until the thing is ready.

BTW, for those who want to try it out, RapidMind is a pretty decent platform. For commercial products, though, it can get expensive: the run-time license costs are pretty high for low volumes.

A lot of you guys got all excited by someone saying “CUDA sucks” and you all rushed off with “nuh-ah!”

That person was talking about using CUDA as a general framework, for programming CPUs as well as GPUs. That is actually an interesting and rich discussion in itself. I would say that this approach actually makes a lot of sense (i.e., it's not just a "fallback option"). The way CUDA is designed, being restrictive yet elegant, it is easy to port and run very efficiently on a CPU.* It runs very efficiently on a GPU, a CPU, and the Cell (whose scratchpad it fully exploits). In contrast, an "automagical" solution struggles to run efficiently on anything but CPUs. This is a big argument for making CUDA a universal paradigm. (People just need to quit assuming some of the parameters, and input shared memory size, warp size, etc. at runtime or compile time – see the sketch after the footnote below.)

  • CUDA on CPUs can potentially be even MORE efficient than a pure-C implementation (i.e., no SSE intrinsics/assembly) if the SIMD paradigm of SSE can be swizzled into the SIMT paradigm of GPUs, which is fundamentally superior.
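For the "stop hard-coding the magic numbers" part, here is a minimal sketch of what I mean – the kernel and sizing choices are only illustrative, but the queries themselves are the standard CUDA runtime calls:

```
// No hard-coded 32-thread warps or 16 KB of shared memory: ask the device.
#include <cstdio>

__global__ void copy_via_smem(const float *in, float *out, int n)
{
    extern __shared__ float stage[];          // size supplied by the host at launch
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) stage[threadIdx.x] = in[i];
    __syncthreads();
    if (i < n) out[i] = stage[threadIdx.x];
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("warp size %d, shared mem per block %lu bytes\n",
           prop.warpSize, (unsigned long)prop.sharedMemPerBlock);

    // Derive launch parameters from the queried limits instead of assuming them.
    int block = 8 * prop.warpSize;            // a multiple of the warp size
    if (block > prop.maxThreadsPerBlock) block = prop.maxThreadsPerBlock;
    size_t smem = block * sizeof(float);      // stays well under sharedMemPerBlock

    int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    copy_via_smem<<<(n + block - 1) / block, block, smem>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```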

For me this is the most important point in my report about CUDA. The most remarkable part of the experience (besides really fast simulations and such :D) is the fact that CUDA turned out to be

  • much easier than I feared (and I think that is in large part because of some of the restrictions)

  • much more flexible than I anticipated (given these restrictions)

Given the above, I look forward to programming for multicore CPUs by means of CUDA, as I personally do not see myself writing multi-threaded code.

There’s the rub, though. To get optimal performance from CUDA, you need to cater to the magic numbers. For GPUs, this is a great strength (you know about them and have control over them) and a great weakness (it takes a lot of effort to get it all right, and it's difficult to re-use your code for something else). If you're going to advocate being parameter-agnostic, then I would advocate a higher level of abstraction that hides these details entirely, like that BSGP programming paper from SIGGRAPH.

A lot of my kernels have a template parameter that is the block size, so I think they will be easily re-used (see the sketch after this post). Which other magic numbers are you talking about? Because I think that is the only difficult parameter in this respect. All the other magic numbers are, as far as I can tell, adjustable in the kernel call.

But I know what you mean; I also have a few kernels that need to be rewritten when new CUDA hardware arrives with different limitations.
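For what it's worth, a hedged sketch of the "block size as a template parameter" pattern (the kernel is made up, not anyone's actual code): the block size is a compile-time constant inside the kernel, so arrays and loops depending on it are fixed at compile time, while the host still picks which instantiation to launch:

```
// Block size as a template parameter: fixed at compile time per instantiation.
template <int BLOCK_SIZE>
__global__ void block_sum(const float *in, float *out, int n)
{
    __shared__ float buf[BLOCK_SIZE];         // sized by the template parameter
    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK_SIZE + tid;
    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction; the loop bound is a compile-time constant, so it unrolls.
    for (int s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}

// Host side: the "magic number" is chosen here, without touching the kernel.
void launch_block_sum(const float *in, float *out, int n, int block)
{
    int grid = (n + block - 1) / block;
    switch (block) {
        case 128: block_sum<128><<<grid, 128>>>(in, out, n); break;
        case 256: block_sum<256><<<grid, 256>>>(in, out, n); break;
        case 512: block_sum<512><<<grid, 512>>>(in, out, n); break;
    }
}
```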