Most important from GTC: CUDA on x86, hello emulation mode

I wonder if it uses a warp size of 4 or 32.
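
For what it's worth, assuming the x86 back end keeps the standard CUDA runtime API, it should be easy to just ask (a quick sketch of my own; querying device 0 is an assumption):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);  // on real NVIDIA hardware this reports 32;
                                            // an x86 target could plausibly report 4
        std::printf("warpSize = %d\n", prop.warpSize);
        return 0;
    }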

I also hope it's very CPU-locality aware, trying to keep threads from the same block running on the same physical CPU to improve cache locality. That gets tricky when blocks are being created and destroyed all the time.
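
Pure speculation on my part, but on Linux an emulator could pin its worker threads per block with something like this (pin_to_core is a hypothetical helper, not anything PGI has announced):

    #define _GNU_SOURCE          // for CPU_ZERO/CPU_SET and pthread_setaffinity_np
    #include <pthread.h>
    #include <sched.h>

    // Pin the calling worker thread to one core, so all threads emulating
    // the same block share that core's cache.
    static void pin_to_core(int core_id) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core_id, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }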

As long as the PGI compiler keeps pace with CUDA toolkit updates, it would be nice to have a target that is improving much faster than OpenCL is.

Oh, OK. Wishful thinking on my part, although the news post did say “we might soon see virtual machines running x86 CUDA code on the GPU silicon as well.”

http://www.pgroup.com/about/news.htm#42. Sure, targeting a CUDA C++ compiler to x86 makes sense. Another reason why NVIDIA dropped the emulator from the SDK like a hot potato. Unfortunately, the thing will cost a bit.

This is something I don't get, for both OpenCL and the option above. Why would I want a hybrid cluster? I currently have 20 S1070s connected to 10 strong servers. The CPUs are mostly idle (say 70% of the time), waiting for the GPUs to crunch the numbers. What else can I do with them? Sure, I can give them some of the number-crunching work, be it OpenCL or CUDA on x86, but they will only run it slower than the GPUs. In the end I'll be waiting for the CPUs to finish their share long after the GPUs are done…

I just don't see the point of this hybrid computation… maybe it's just how my code behaves… :)
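
Just to show what I mean, here's the basic overlap idea as I understand it (a sketch; gpu_share and cpu_share are made-up placeholders for the two slices of the work). Since kernel launches are asynchronous, the CPU can crunch its slice while the GPU runs, and you only block at the end:

    #include <cuda_runtime.h>

    __global__ void gpu_share(float *x, int n) {       // placeholder GPU slice
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= 2.0f;
    }

    static void cpu_share(float *y, int n) {           // placeholder CPU slice
        for (int i = 0; i < n; ++i) y[i] *= 2.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *d = 0, *h = new float[n]();
        cudaMalloc(&d, n * sizeof(float));
        cudaMemset(d, 0, n * sizeof(float));

        gpu_share<<<(n + 255) / 256, 256>>>(d, n);  // launch returns immediately
        cpu_share(h, n);            // CPU works on its slice while the GPU runs
        cudaDeviceSynchronize();    // only now block on the GPU

        cudaFree(d);
        delete[] h;
        return 0;
    }

But even then the CPU slice has to be sized so it finishes before the GPU does, otherwise I'm right back to waiting on the CPUs.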

Now if NVIDIA could put a simple x86 core on the GPU plus some OS (which was also suggested somewhere recently), and cut all that fancy Intel/AMD/whatever stuff out, that would be great and cheaper… Actually, I saw a poster at GTC about using an Atom machine with a GPU to cut the cost of a powerful server… seems doable from what the poster suggested. ( http://www.nvidia.com/content/GTC/posters/…tomic-Tesla.pdf )

eyal

Yeah, I've heard a lot of “rumours” about a future setup with an ARM processor (much more likely) out on the discrete GPU. This could be pretty awesome if one could cut out most of the host <-> device communication for big jobs. And I guess it would be a way for NVIDIA to counter AMD's Fusion?
