SDK suggestion: emulation/HW code coexistence, deviceemu SDK compatibility

Hi,

Can nvcc compile both the emulated code and the real code into the same objects?

That would very often be a nice feature, as people need to write compatibility code for cases where there is no CUDA-enabled HW present.

If you could always trust that code written for CUDA will run (regardless of HW being present or not), it would save lots of work in the higher layers.

If CUDA HW is present, the calls are fast, and if not, your code is automatically run through an emulation stub that calls the kernel compiled for the host architecture with a reasonable number of operating system threads.

If this is not supported, I wonder why? This is similar to writing code for OpenGL, which will run fast on devices with a good graphics card, and otherwise on the CPU.

Thanks,

Vesa

An alpha of the CUDA to multithreaded CPU source-to-source compiler is coming along with 2.1.

Thanks for the reply.

Could you please tell me a bit more about how that will work and about the schedule of the 2.1 release? Just to save some work on compatibility code for machines without CUDA HW.

I know nothing worse than writing code that will be thrown away with the next release of CUDA.

Vesa

What was said at NVISION is that the nvcc compiler gets another switch, --multithread. That will cause nvcc to generate multithreaded code that can be compiled with the host compiler into a multithreaded application.
Mentioning timeframes is not something NVIDIA employees seem to be allowed to do. But the rumours I have heard say before the end of this year.

Hi,

Thanks, even rumours are helpful. What I want is that the calls are automatically dispatched to either software or HW without me having to worry about it as a developer.

Compiling with --multithread does not sound like that (wrong term IMHO). I would use a term like --sw-compatibility or something…

Vesa

Well, you will have to worry about it as a developer, I am afraid. But you can make two shared libraries, one for execution on the GPU and one for execution on multicore. Then link at runtime to the correct library after detecting whether CUDA is available.
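For what it's worth, here is a minimal sketch of that "detect, then link at runtime" idea on a POSIX system. It probes the CUDA driver library with dlopen so the application itself never links against CUDA. The symbols cuInit and cuDeviceGetCount are real driver API entry points; the driver file name libcuda.so.1, treating their enum return value as an int, and the two backend library names are my assumptions for illustration only.

```cpp
#include <dlfcn.h>   // POSIX dlopen/dlsym (Windows would use LoadLibrary/GetProcAddress)
#include <string>

// Returns the number of CUDA devices reported by the driver, or 0 when the
// driver library is missing, fails to initialise, or reports no devices.
int cuda_device_count() {
    // Assumption: the driver is installed under its usual Linux soname.
    void* driver = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
    if (!driver) return 0;

    // Assumption: the CUresult enum is returned as a plain int; 0 means success.
    using InitFn = int (*)(unsigned int);
    using CountFn = int (*)(int*);
    auto init = reinterpret_cast<InitFn>(dlsym(driver, "cuInit"));
    auto get_count = reinterpret_cast<CountFn>(dlsym(driver, "cuDeviceGetCount"));

    int count = 0;
    if (!init || !get_count || init(0) != 0 || get_count(&count) != 0) return 0;
    // The driver handle is deliberately left open for the life of the process.
    return count > 0 ? count : 0;
}

// Hypothetical names for the two builds of the same kernels.
std::string backend_library_name(int device_count) {
    return device_count > 0 ? "libmykernels_gpu.so" : "libmykernels_cpu.so";
}

// Loads whichever backend matches the machine; returns nullptr if it is missing.
void* load_backend() {
    return dlopen(backend_library_name(cuda_device_count()).c_str(), RTLD_NOW);
}
```

The caller would then dlsym its entry points out of the handle returned by load_backend(). On older glibc versions this needs -ldl at link time.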

In fact, you can write such code with current CUDA tools. NVCC doesn’t generate it automagically, but there’s nothing to prevent you from dispatching calls at application level.
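To make the application-level dispatch concrete, here is a rough sketch under my own assumptions: a SAXPY routine stands in for "the kernel", the CPU fallback splits the range over a reasonable number of OS threads, and the GPU implementation is just a function pointer the application supplies (for example from a CUDA-built library). None of the names come from the CUDA toolkit.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Common signature shared by the GPU and CPU implementations (hypothetical).
using SaxpyFn = void (*)(float a, const float* x, float* y, std::size_t n);

// CPU fallback: computes y[i] = a * x[i] + y[i], one contiguous chunk per thread.
void saxpy_cpu(float a, const float* x, float* y, std::size_t n) {
    // hardware_concurrency() may return 0 when unknown, so use at least one worker.
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = std::min(n, w * chunk);
        std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) y[i] = a * x[i] + y[i];
        });
    }
    for (auto& t : pool) t.join();
}

// Picks the implementation once; callers never branch on the hardware again.
SaxpyFn select_saxpy(bool cuda_available, SaxpyFn gpu_impl) {
    return (cuda_available && gpu_impl) ? gpu_impl : saxpy_cpu;
}
```

The rest of the application calls through the returned pointer, so the higher layers only see one code path. On older toolchains std::thread needs -pthread.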

I know what you mean. From my understanding, NVIDIA wants exactly the functionality you’re talking about because there’s tons of game developers who equally don’t want to worry about multiple code paths. They want simple emulation when CUDA hardware isn’t present. I think it’s a must for mass adoption. Yet the “multithread” flag that was mentioned at NVISION doesn’t sound like it fits the bill. At least it doesn’t sound like it’s automatic and simple.

Would be nice to hear what NVIDIA is thinking. Maybe a mistake in the architecture (i.e., having to first query the available devices, then explicitly use one) is holding automagicality back.

I think the flag is a major addition to CUDA. As far as I understand, multithreaded programming is not easy, at least way more difficult than programming CUDA.