I need dynamic runtime compilation for my project, but CUDA requires Visual Studio to compile anything on Windows, and OpenCL has proven too buggy and unstable. DirectCompute isn't portable. So… I need an alternative.
In theory, I can use the Clang/LLVM toolchain to target CUDA directly. Looking at the source, I see the NVPTX target in the LLVM source tree and CodeGenCUDA in the Clang source tree.
So my question is: how do I configure Clang to compile code to a valid PTX kernel? Does it support CUDA intrinsics, unified memory, cudaMemcpyToSymbol, or textures? What are the current limitations?
Any information would be quite helpful, since the only things I can find are a few mentions in various blogs, mailing lists, and PowerPoint presentations.