No, there is currently no straightforward way to convert an entire CUDA program into a form that compiles with a standard C compiler and runs with anything resembling decent performance. (Device emulation sort of does what you are asking, but it still requires nvcc as the frontend, and the resulting program has horrible CPU performance.)
Doing this conversion in such a way that the resulting program has good performance (though not as fast as CUDA, of course) is actually an ongoing research topic. There are a few papers that discuss some techniques, but there is no converter that can be used right now.
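To give a rough idea of what such a conversion involves, here is a hand-written sketch in plain C of a trivial vector-add kernel, with the `<<<grid, block>>>` launch replaced by two serial loops over block and thread indices. This is only an illustration under my own assumptions, not the output of any tool and not the method of any particular paper: the names `idx1`, `vec_add_thread`, and `vec_add_launch` are made up, and the approach only works as-is for kernels that never call `__syncthreads()` or use shared memory.

```c
#include <assert.h>

/* Hypothetical stand-in for CUDA's built-in 1D index variables. */
typedef struct { int x; } idx1;

/* What would be the body of a __global__ kernel in CUDA:
   each logical thread adds one pair of elements. */
static void vec_add_thread(idx1 blockIdx, idx1 blockDim, idx1 threadIdx,
                           const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              /* same bounds guard a CUDA kernel would need */
        c[i] = a[i] + b[i];
}

/* Replaces the kernel launch: run every logical thread of every
   block one after another on the CPU. */
static void vec_add_launch(int grid_dim, int block_dim,
                           const float *a, const float *b, float *c, int n)
{
    assert(grid_dim > 0 && block_dim > 0);

    idx1 bd = { block_dim };
    for (int blk = 0; blk < grid_dim; ++blk) {
        for (int thr = 0; thr < block_dim; ++thr) {
            idx1 bi = { blk };
            idx1 ti = { thr };
            vec_add_thread(bi, bd, ti, a, b, c, n);
        }
    }
}
```

Calling `vec_add_launch(2, 4, a, b, c, 5)` would process five elements with eight logical threads, three of which do nothing because of the bounds check. The hard part, and the reason this is still research, is doing the same for kernels that synchronize between threads while keeping the CPU code reasonably fast.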
Are you trying to provide a fallback option for people who don’t have a CUDA device?