CUDA and ATI Stream technology differences


I would like to know the difference between NVIDIA CUDA and ATI Stream technology.

Also, which one is more mature?


I don't know much about ATI Stream, but from what I have read and seen it is much less mature than CUDA. You can probably judge that from how much development of libraries and even commercial programs is going on with CUDA compared to what is available for ATI Stream. As far as I know, CUDA is also much better documented than ATI Stream.

best regards


The difference is that CUDA has a much more comfortable API. People working in physics and mathematics do not want to know about shaders and assembler, and I totally agree with that.

ATI Stream gives you much more freedom to program the GPU (DMA, memory access, etc.). CUDA supports all declared devices at an abstract level; with Stream you depend on the R6xx/R7xx family.

CUDA was designed from the API level down, which means the implementation of the concept was constrained by the user interface (the APIs). Stream was designed from the hardware level up. It is not as comfortable to program, but you can get much more out of it (IMHO).

It depends on the application. If you need performance and your application does not contain a large number of functions, then Stream (IMHO) is better. If you need a fast result, or a "label", CUDA is better. With CUDA you will be able to write and execute a simple program in an hour. With Stream you will spend a few days to get even a "hello world" application working. But with CUDA you will get at most 20% of the GPU's GFlops performance, and even that will not be easy (in my experience), while with Stream you can get 97% of the GFlops performance. You also have to compare the GPU architectures and so on (I like the ATI architecture).
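For reference, the "percentage of GFlops" being argued about here is just achieved throughput divided by theoretical peak. A minimal sketch of that arithmetic, where the device numbers (240 scalar processors at 1.296 GHz, 3 flops per cycle for a GTX 280-class part) and the measured figure are illustrative assumptions, not measurements from this thread:

```python
# Hedged sketch: fraction-of-peak arithmetic behind claims like "20% of GFlops".
# All concrete numbers below are illustrative assumptions.

def peak_gflops(processors, clock_ghz, flops_per_cycle):
    # Theoretical peak: units * clock * flops issued per cycle per unit.
    return processors * clock_ghz * flops_per_cycle

peak = peak_gflops(240, 1.296, 3)   # ~933 GFlops theoretical peak (assumed specs)
achieved = 187.0                    # hypothetical measured GFlops
print(f"peak = {peak:.1f} GFlops, fraction = {achieved / peak:.1%}")
```

Whether 20% or 80% of that peak is reachable in practice is exactly what the rest of the thread disputes; the formula itself is uncontroversial.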

If you have no experience with assembler, shader programming, etc., it is better to use CUDA and hope that in the next few years NVIDIA will improve the performance of CUDA.



PS. This is from my personal experience and IMHO. :">

You can certainly get a lot more than 20% of peak performance from CUDA. I'm getting a little annoyed that you keep repeating this nonsense. BLAS performance, for example, shows that you're completely wrong.

I am ready to accept that I am wrong. Just show me an example that I can run and measure the performance of.

The best performance I've got is from Volkov's FFT (~20% of peak).

That’s it.

For example, this info is from NVIDIA:

It is not more than 50 GFlops. :whistling:

Anyway, I am waiting for examples.

PS. I've said nothing against NVIDIA.

Large SGEMMs are capable of getting ~400 GFlops with CUBLAS 2.3.
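For anyone wanting to check such a claim themselves, SGEMM throughput is conventionally computed as 2*m*n*k flops divided by elapsed time. A minimal sketch, where the matrix size and the timing are made-up illustrative values rather than actual CUBLAS measurements:

```python
# Sketch: conventional GFlops computation for SGEMM (C = A*B).
# The elapsed time here is a hypothetical value, not a CUBLAS benchmark.

def sgemm_gflops(m, n, k, seconds):
    flops = 2.0 * m * n * k   # one multiply + one add per inner-product term
    return flops / seconds / 1e9

# e.g. a 4096x4096x4096 SGEMM finishing in 0.35 s:
print(f"{sgemm_gflops(4096, 4096, 4096, 0.35):.1f} GFlops")
```

Timing the actual `cublasSgemm` call (with proper synchronization before stopping the timer) and plugging the elapsed time into this formula gives the figure being quoted.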

I will definitely try it. I will report the results here.


Dimitri, I've already pasted you code that gets 80% of my card's peak FLOPS for a program of your own that you had written badly. Please stop telling people this nonsense about 20%. I'm beginning to think you have an agenda to dismiss NVIDIA.