Has anyone heard of plans by NVIDIA to support host-side threading in emu mode? I ask because, although CUDA shows promise as a high-level language for getting the most performance out of NVIDIA GPUs, all of my potential applications have to stay portable across multiple architectures (Linux x86, Linux Itanium, IBM AIX/Power, etc.).
Running code in emu mode will allow me to keep my code portable to those other systems. However, it seems like it would be useful if, when the nvcc compiler expands a kernel for emu mode, it did so using some threading model on the host. It could potentially use OpenMP, or just create pthreads as needed.
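To make the idea concrete, here is a rough sketch in plain C of the kind of host-side expansion I have in mind. It is only an illustration, not what nvcc actually generates: `emu_ctx`, `saxpy_kernel`, and `saxpy_launch_emulated` are hypothetical names I made up, and the kernel is a trivial SAXPY. The loop over blocks is spread across host threads with OpenMP, and the threads within a block run serially.

```c
#include <assert.h>

/* Hypothetical stand-in for CUDA's built-in blockIdx / blockDim / threadIdx. */
typedef struct {
    int block_idx;
    int block_dim;
    int thread_idx;
} emu_ctx;

/* Kernel body written as an ordinary C function: y[i] = a * x[i] + y[i]. */
static void saxpy_kernel(emu_ctx ctx, int n, float a, const float *x, float *y)
{
    int i = ctx.block_idx * ctx.block_dim + ctx.thread_idx;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical host-side expansion of a launch with grid_dim blocks of
 * block_dim threads. Blocks are independent, so they are distributed over
 * host threads; without OpenMP enabled the pragma is ignored and the loop
 * simply runs serially. */
static void saxpy_launch_emulated(int grid_dim, int block_dim,
                                  int n, float a, const float *x, float *y)
{
    int b;
    assert(grid_dim >= 0 && block_dim > 0);

    #pragma omp parallel for
    for (b = 0; b < grid_dim; ++b) {
        int t;
        /* Threads of one block run one after another on the same host thread. */
        for (t = 0; t < block_dim; ++t) {
            emu_ctx ctx = { b, block_dim, t };
            saxpy_kernel(ctx, n, a, x, y);
        }
    }
}
```

Built with something like `gcc -fopenmp`, the block loop runs in parallel; on a compiler without OpenMP the same source still compiles and runs serially, which is the portability I am after. A kernel that uses `__syncthreads()` or shared memory could not be expanded this simply, since running the threads of a block one after another cannot honor a barrier, so a real implementation would need something more elaborate there.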
Has this idea been discussed at all?