Anyone want to code a PTX backend for Brook+? AMD released Brook+ as open source

It appears there is now an open-source toolchain desperately waiting for someone to add support for nVidia GPUs ;)

Found this on GPGPU.org:

Wouldn’t it be great to have a larger choice of APIs to program the nVidia cards with? I have no idea how complex it would be to add a PTX backend to this puppy, but it might be worthwhile.

Christian


It’ll be great to have CUDA working on ATI cards :).

And I see no point at all in supporting a stillborn extension (aka Brook+) which has now been abandoned by ATI itself…

I don’t think it is about CUDA on ATI cards…

Instead it will be Brook+ on NVIDIA cards.

Brook+ is good. No doubt about it. But it has its own limitations. It is extremely data-parallel oriented.

We will be better off with CUDA than Brook+ – my personal opinion though.

I mean it’d be much better to have CUDA working on ATI GPUs rather than Brook+ on nVidia ones. But neither of these things will happen.

As ATI finally shipped OpenCL support (silly enough, it’s CPU-only atm) and released Brook+ as open source, I really, really doubt they have enough resources to support both. It looks more like they’ll focus on OpenCL only and totally abandon Brook+.

ATI GPU hardware is really nice and fast; it’s a shame that they (ATI) still cannot produce a mature SDK for it…

From my point of view, that’s because these GPUs are much more shader-oriented and less GPGPU-oriented than nVidia’s GPUs.

When I see all the problems involved in running a classical CPU application on an nVidia GPU, I can’t imagine running it on an ATI at this time. I suspect ATI may have an OpenCL implementation ready (for Mac OS X 10.6 Snow Leopard, which launches in September), but that they hit several performance problems on GPGPU workloads.

ATI GPUs are great for anything shader-oriented (photo/video processing, for example), but as complexity rises, execution paths diverge, or the need for GPU local memory (registers/shared memory on nVidia’s) grows, the current ATI architecture cannot compete.

Well, I’m OK with shader-oriented GPUs. Being an “assembler addict” from the 90s, I’m OK programming even with such crap as CAL IL :).

The main problem is that the current ATI SDK is way too unstable, poorly designed, and almost undocumented, and the official ATI Stream support forum is just dead. I doubt switching to OpenCL will change the way ATI produces their SDKs, so I’m expecting the same “quality” of OpenCL from ATI…

I would much prefer making CUDA code compile down to CTM code. :)

One other big difference between CUDA-enabled GPUs and ATI CTM GPUs: ATI didn’t integrate any shared memory until the HD 4000 series (which has 16KB per multiprocessor, like nVidia’s GPUs).

For me this means that while any GeForce 8400 can run CUDA or OpenCL code, the ATI architecture never will (with a comparable level of performance) on the HD 2000 or HD 3000 series, which equipped Macs in recent years, and even brand-new PCs in 2009!

ATI may call them all “CTM”, but they don’t all have the same capabilities, and people with great ATI video cards that are really fast in 3D but of an older generation will be totally disappointed if OpenCL code runs on them.
That gives ATI’s OpenCL implementation fewer opportunities to reach the mass market than nVidia’s.

Actually, it’s even worse. While nVidia’s shared memory is in fact memory with arbitrary access, ATI’s HD4000 “local data share” is a different thing. It’s possible to read from arbitrary locations like var = lds[index], but it’s impossible to write like lds[index] = var: the LDS offset (= index) when writing must always be constant. This really limits LDS usage.
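To make the contrast concrete, here is a minimal CUDA sketch (hypothetical kernel name, not tested on real hardware) of exactly the access pattern being discussed: a shared-memory write at an index computed at run time, which CUDA allows but the HD4000 LDS cannot express because its write offset must be constant per thread.

```cuda
// Sketch: each thread writes to shared memory at a dynamically
// computed offset (n - 1 - i), then reads back at its own index.
// The write on the marked line is the operation that has no
// equivalent on the HD4000 LDS (writes there need a constant offset).
__global__ void reverse_in_shared(const int *in, int *out, int n)
{
    __shared__ int buf[256];          // well under the 16KB per multiprocessor

    int i = threadIdx.x;
    if (i < n)
        buf[n - 1 - i] = in[i];       // lds[index] = var -- fine in CUDA

    __syncthreads();                  // make writes visible to all threads

    if (i < n)
        out[i] = buf[i];              // arbitrary-index read (LDS allows this too)
}
```

Launched as `reverse_in_shared<<<1, 256>>>(d_in, d_out, 256)`, this reverses a small array entirely through shared memory; the dynamic-write step is what forces emulation through global memory on pre-HD4000-style hardware.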

Thanks for this technical information!!!

It totally explains why ATI isn’t moving forward on an OpenCL implementation for their video cards: shared memory can’t simply be traded for global memory in CUDA/OpenCL development, since it serves as a local data cache, a means of exchanging data between threads of the same MP, local storage for dynamically indexed arrays, …

Ouch!