Using CURAND inside NVRTC (JIT-compiled) kernels

I am writing an application in Go (golang) that uses CUDA. Since I cannot run nvcc on Go code, I am currently using the CUDA Driver API and NVRTC.

I would like to use CURAND inside my kernels, but I don’t know how to properly set up the module and its compilation. I found two relevant calls:

nvrtcCompileProgram(prog, numOptions, options)

and

nvrtcCreateProgram(prog, src, name, numHeaders, headers, includeNames)

(documentation: http://docs.nvidia.com/cuda/nvrtc/index.html#group__compilation)

There are also two relevant header files: curand.h and curand_kernel.h.

My first idea was to include curand_kernel.h in my kernel source file, which contains only device and global functions, then load the header and pass it to nvrtcCreateProgram(). However, NVRTC complains that curand.h is missing. If I add that header as well, it complains about another missing header, and so on, until it eventually complains about code that is not allowed in JIT mode.

So I tried the --include-path option of nvrtcCompileProgram instead, but the problem is the same: something is not allowed in JIT mode. For example, this is one of the many errors:

/usr/local/cuda-8.0/include/curand_mrg32k3a.h(2138): error: A namespace scope variable without memory space annotations (device/constant/shared) is considered a host variable, and host variables are not allowed in JIT mode. Consider using -default-device flag to process unannotated namespace scope variables as device variables in JIT mode

I thought there might be another way: call nvcc separately to produce a fixed, CURAND-only PTX file and then load that file in my program. However, I am not sure how to do that, or what the implications of mixing nvcc and JIT compilation are.

At this point, I don’t know how to proceed. Can someone please provide an example of how to use CURAND inside an NVRTC-compiled kernel, if that is possible at all?

Thank you very much
~Aki