Runtime construction and compilation of .cu files without cl.exe

Greetings!

I recently started experimenting with CUDA and the associated tools. One thing I couldn’t find in the docs (or forums) is whether it’s possible to construct and compile .cu files at runtime without having to go through nvcc.exe. I’m working on an app that allows kernel construction through a node-graph. The app will have to run on machines that probably won’t have Visual Studio installed. nvcc seems to rely heavily on cl.exe (on Windows), and I’m guessing I wouldn’t be able to distribute my app along with nvcc.exe and cl.exe (and all the DLLs they need) for the obvious legal reasons. I’m strictly using the Driver API, so I thought that, given I only work on the GPU side of things, cl.exe wouldn’t be needed.

As an example, in Direct3D, if I wanted to construct and compile an HLSL shader at runtime, I could do that through D3DXCompileShader(). There doesn’t seem to be anything equivalent in CUDA, other than loading .cubin or .ptx files, which have to be generated by nvcc first.

Has anyone else attempted constructing source at runtime and compiling it? Does anyone know if NVIDIA plans to support a D3DXCompileShader() kind of function in the future?

If not, do you think it would be possible to compile my computation nodes to PTX code and somehow stitch them together in a text array, which I then compile with cuModuleLoadDataEx()?
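To make the stitching idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption on my part: stitchPtx() is a hypothetical helper, the kernel signature and the convention that every node reads and writes register %f1 are made up for illustration, and the .version/.target values in the header are placeholders that would need to match the installed toolkit and GPU. I have not run the generated PTX through the driver.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: wraps per-node PTX fragments in a fixed kernel
// prologue/epilogue. Assumed convention: each fragment is one or more PTX
// instructions that read and write the shared register %f1.
std::string stitchPtx(const std::string& kernelName,
                      const std::vector<std::string>& nodeFragments)
{
    assert(!kernelName.empty());

    std::string ptx;
    // Module header (placeholder values, adjust to toolkit and target GPU).
    ptx += ".version 6.0\n.target sm_50\n.address_size 64\n\n";

    // Prologue: load data[tid.x] into %f1.
    ptx += ".visible .entry " + kernelName + "(.param .u64 data)\n{\n";
    ptx += "    .reg .b64 %rd<3>;\n";
    ptx += "    .reg .b32 %r<2>;\n";
    ptx += "    .reg .f32 %f<2>;\n";
    ptx += "    ld.param.u64 %rd1, [data];\n";
    ptx += "    cvta.to.global.u64 %rd1, %rd1;\n";
    ptx += "    mov.u32 %r1, %tid.x;\n";
    ptx += "    mul.wide.u32 %rd2, %r1, 4;\n";
    ptx += "    add.s64 %rd1, %rd1, %rd2;\n";
    ptx += "    ld.global.f32 %f1, [%rd1];\n";

    // Body: node fragments in graph evaluation order.
    for (const std::string& fragment : nodeFragments) {
        ptx += "    " + fragment + "\n";
    }

    // Epilogue: write %f1 back to data[tid.x].
    ptx += "    st.global.f32 [%rd1], %f1;\n";
    ptx += "    ret;\n}\n";
    return ptx;
}

// The resulting NUL-terminated text would then go to the Driver API, e.g.:
//   CUmodule module;
//   cuModuleLoadDataEx(&module, ptx.c_str(), 0, nullptr, nullptr);
//   CUfunction fn;
//   cuModuleGetFunction(&fn, module, "graph_kernel");
```

For example, calling stitchPtx("graph_kernel", {"add.f32 %f1, %f1, 0f3F800000;", "mul.f32 %f1, %f1, 0f40000000;"}) would produce a kernel that computes (x + 1) * 2 per element. The obvious weak point is register allocation: real nodes would need more than one shared register, so the fragments would have to agree on a naming scheme.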

Thanks!