Most important from GTC: CUDA on x86, hello emulation mode

http://www.brightsideofnews.com/news/2010/…chitecture.aspx

According to Heise News, there will be a commercial compiler for CUDA code by Portland Group (PGI) available (or officially introduced) starting November 13th. Do you really think x86 CUDA will become part of the toolkit? I doubt that.

http://www.heise.de/newsticker/meldung/GTC…ig-1083447.html

This is also not an emulation mode like the old one. It is a conversion from CUDA C to x86 machine code.

I’ve recently moved to 3.2… I just don’t have enough words to say how much I miss emulation mode…

it was just perfect :)

eyal

So your CUDA program runs the same way on x86, and it seems you can debug it there too. You compile your code to x86, which sounds better than emulation mode because it is faster and more precise.

Maybe it will have a free licence for debugging purposes, i.e. if you do not distribute your program with it and just use it for debugging. I think that is a good solution. Those who want their CUDA programs to run on x86 can buy a licence.

Are there any details known yet? For example, are they going to “loosely” integrate an x86 processor into the “GPU” (not sure what it will be called then), with direct access to the Interconnection Network along with the TPCs? Or are they going to tightly integrate x86 by somehow replacing the SPs in the TPCs with x86-like stream processors, replacing or adding to PTX with x86?

No, they will just compile CUDA code to x86 binaries. It will not run on a GPU, but on the CPU.

I agree with E.D. Riedijk. This is almost certainly a commercially supported compiler that does what many other academic projects have been dabbling in for years: Take CUDA source code and generate multithreaded SSE x86 code. If done well, I bet a lot of people would find that CUDA on x86 is faster than even their normal CPU implementations. (Because most compilers are terrible at generating SSE instructions from all but the simplest C code and most of us are terrible at writing SSE by hand.)
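
To make the idea concrete, here is a rough sketch of how a CUDA kernel could be turned into plain CPU code: the kernel body becomes an ordinary function, and the launch becomes a loop over blocks with an inner loop over the threads of each block. This is only my guess at the general approach, not PGI's actual output. The names `KernelIndex`, `saxpy_kernel`, `launch_saxpy_on_cpu` and `run_saxpy_demo` are made up for illustration, and the sketch ignores `__syncthreads()`, shared memory and multithreading.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's built-in blockIdx/threadIdx/blockDim (1D only).
struct KernelIndex {
    int blockIdx;
    int threadIdx;
    int blockDim;
};

// What would be the body of a __global__ saxpy kernel in CUDA C.
inline void saxpy_kernel(const KernelIndex& k, int n, float a,
                         const float* x, float* y) {
    int i = k.blockIdx * k.blockDim + k.threadIdx;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical CPU "launch": the grid becomes the outer loop and the threads
// of one block become the inner loop. With a small fixed blockDim, the inner
// loop is the part a compiler could try to map onto SSE lanes.
void launch_saxpy_on_cpu(int gridDim, int blockDim, int n, float a,
                         const float* x, float* y) {
    assert(gridDim >= 0 && blockDim > 0);
    for (int b = 0; b < gridDim; ++b) {
        for (int t = 0; t < blockDim; ++t) {
            saxpy_kernel(KernelIndex{b, t, blockDim}, n, a, x, y);
        }
    }
}

// Small driver: x[i] = i, y[i] = 1, a = 2, so y[i] becomes 2*i + 1.
std::vector<float> run_saxpy_demo(int n) {
    std::vector<float> x(n), y(n);
    for (int i = 0; i < n; ++i) {
        x[i] = static_cast<float>(i);
        y[i] = 1.0f;
    }
    int blockDim = 4;  // assumption: one block per 4-wide SSE register
    int gridDim = (n + blockDim - 1) / blockDim;
    launch_saxpy_on_cpu(gridDim, blockDim, n, 2.0f, x.data(), y.data());
    return y;
}
```

Calling `run_saxpy_demo(10)` launches 3 blocks of 4 threads; the `i < n` guard masks off the two extra threads, just like it would on the GPU.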

And it might make OpenCL a lot less attractive from a hybrid computing perspective. It will be very interesting to see the performance difference between OpenCL on multicore processors and CUDA code compiled with this compiler, and also the amount of tweaking required for both.

I wonder if it uses a warp size of 4 or of 32.

I also hope it’s very CPU locality aware, trying to keep threads from the same block running on the same physical CPU to improve cache locality. That gets tricky when you’re creating and destroying new blocks all the time.
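
One simple way a runtime could get that kind of locality is a static split: give each worker a contiguous range of blocks, so all threads of a block (and neighbouring blocks) run on the same core. The sketch below only computes that mapping. I have no idea how PGI schedules blocks; `blocks_for_worker` and `block_owners` are hypothetical names, and creating the OS threads and pinning them to cores is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: half-open range [begin, end) of block indices that
// worker `w` would execute under a static, contiguous split.
std::pair<int, int> blocks_for_worker(int numBlocks, int numWorkers, int w) {
    assert(numBlocks >= 0 && numWorkers > 0 && w >= 0 && w < numWorkers);
    int chunk = (numBlocks + numWorkers - 1) / numWorkers;  // ceiling division
    int begin = std::min(numBlocks, w * chunk);
    int end = std::min(numBlocks, begin + chunk);
    return {begin, end};
}

// Records which worker owns each block. Here the workers are simulated by a
// plain loop; in a real runtime each worker would be a thread tied to one core.
std::vector<int> block_owners(int numBlocks, int numWorkers) {
    std::vector<int> owner(numBlocks, -1);
    for (int w = 0; w < numWorkers; ++w) {
        std::pair<int, int> range = blocks_for_worker(numBlocks, numWorkers, w);
        for (int b = range.first; b < range.second; ++b) {
            owner[b] = w;
        }
    }
    return owner;
}
```

A static split like this keeps each block on one core but cannot rebalance if some blocks take longer than others, which is the trade-off against a dynamic work queue.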