Are there any plans to support running CUDA programs on multi-core CPUs? I heard about it a year or so ago. Or does anybody have experience with MCUDA (http://impact.crhc.illinois.edu/publications.php)?
Background:
I’ll start my diploma thesis next month. The aim is to extend a low-level streaming image processing library/framework to support GPUs and multi-core CPUs. So it would be very cool to write the filters only once for CUDA and get the CPU version for free.
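To make the "write the filter once" idea concrete, here is a minimal sketch, not taken from any existing library. It assumes a filter can be expressed as a pure per-pixel function. The names HOSTDEV, threshold_pixel and threshold_cpu are made up for illustration. Compiled with nvcc, the macro expands to __host__ __device__ so the same function could also be called from a kernel; a plain C++ compiler sees an empty macro and only the CPU loop is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Under nvcc the function becomes callable from host and device code;
// any other compiler sees an empty macro. (HOSTDEV is a hypothetical name.)
#ifdef __CUDACC__
#define HOSTDEV __host__ __device__
#else
#define HOSTDEV
#endif

// Hypothetical filter, written once: binary threshold of a single pixel.
HOSTDEV inline std::uint8_t threshold_pixel(std::uint8_t value, std::uint8_t cutoff) {
    return value >= cutoff ? static_cast<std::uint8_t>(255)
                           : static_cast<std::uint8_t>(0);
}

// CPU driver: one loop iteration per pixel. A CUDA kernel would instead
// compute the pixel index from blockIdx/threadIdx and call the same function.
inline void threshold_cpu(const std::vector<std::uint8_t>& src,
                          std::vector<std::uint8_t>& dst,
                          std::uint8_t cutoff) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = threshold_pixel(src[i], cutoff);
    }
}
```

This only covers filters without neighbourhood access or shared memory; anything that relies on __syncthreads() or shared-memory tiling would need more than a shared per-pixel function.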
Yes, OpenCL looks very interesting. The problem I see is that there is no implementation for GPUs or even CPUs out there right now. Or does anyone have experience with the Nvidia implementation?
Maybe the best way is to use CUDA first with OpenCL in mind and port the library to OpenCL once implementations are available?
Nobody knows, and NVIDIA is not saying anything about it. I recently talked to a student who worked on this at NVIDIA as an intern, and even he doesn’t have any up-to-date news. We know it’s not in CUDA 2.2; that is about the most concrete thing we do know.
Device emulation should run about as fast as any other CPU implementation, provided the thread-spawning overhead is small compared to the per-thread execution time. Speed is not the main problem.
Device emulation can produce INCORRECT results, because it is not a full emulation of the hardware. So be aware: it should be used as a debugging tool and nothing more.
Have you ever tried device emulation? It is an order of magnitude or two slower than the simplest unoptimized C code loop to do the same algorithm. Device emulation is just for debugging.
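For comparison, here is a rough sketch of the "simple loop" alternative: the kernel body stays in CUDA style, but the logical threads become loop iterations instead of being spawned as real threads. As far as I understand it, this is roughly the idea behind source-to-source approaches like MCUDA, but the code below is only an illustration under that assumption. LaunchIndex, scale_kernel and launch_on_cpu are hypothetical names, and the index variables are passed explicitly because plain C++ has no built-in blockIdx/threadIdx.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for CUDA's built-in index variables (1D only, hypothetical struct).
struct LaunchIndex {
    int blockIdx;
    int blockDim;
    int threadIdx;
};

// Kernel body in CUDA style: each logical thread handles one element.
inline void scale_kernel(LaunchIndex idx, const float* in, float* out,
                         int n, float factor) {
    int i = idx.blockIdx * idx.blockDim + idx.threadIdx;
    if (i < n) {  // guard against threads past the end of the array
        out[i] = in[i] * factor;
    }
}

// CPU "launch": blocks and threads are plain nested loops, so there is
// no per-thread spawning overhead at all.
inline void launch_on_cpu(int gridDim, int blockDim,
                          const std::vector<float>& in,
                          std::vector<float>& out, float factor) {
    assert(in.size() == out.size());
    const int n = static_cast<int>(in.size());
    for (int b = 0; b < gridDim; ++b) {
        for (int t = 0; t < blockDim; ++t) {
            scale_kernel({b, blockDim, t}, in.data(), out.data(), n, factor);
        }
    }
}
```

This is only valid for kernels without __syncthreads(). With a barrier, the thread loop would have to be split at each synchronization point so that all threads finish the first part before any thread starts the second. The outer block loop is the natural place to parallelize across CPU cores.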