About JIT: Could I call a device function just like a DLL?

Can device code call a device function that is loaded dynamically at runtime?

As the title says: I have built a path-tracing renderer with CUDA 11.0, and I now want to let users customize the propagation model or lighting model. Inspired by OptiX and D3D12, I plan to let users write their own models in the form of shaders.
I have succeeded in JIT-compiling CUDA code at runtime and loading it into a module. But when I use cuModuleGetFunction or cuModuleGetGlobal to get the address of my function (a device function that the fixed CUDA code in my renderer calls back when it needs to compute a color), both report that the name was not found. In other words, I want to call device functions the way I would call functions exported from a DLL.

As a simple example, I wrote this code:

#include <math.h>

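// Device function that the renderer should call back.
extern "C" __device__ float LambertianFunction() {
    return 1.0f / M_PI;
}

// Global device variable that stores the address of the function above.
extern "C" __device__ void *Lambertian = LambertianFunction;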

Neither cuModuleGetFunction nor cuModuleGetGlobal can get a handle to LambertianFunction, but cuModuleGetGlobal can get the address of the variable ‘Lambertian’, and the value of ‘Lambertian’ should be the address of ‘LambertianFunction’. So I pass the address of the variable ‘Lambertian’ to the device code of my renderer. When it needs to compute the BRDF of the material, it makes this function call (note that this code runs on the device):

 float BRDF = (*((CG::BXDFType*)isd.hitter->material.BXDF))();

The value of isd.hitter->material.BXDF was obtained via cuModuleGetGlobal, so it should be a pointer to a pointer to LambertianFunction.
Something interesting happens: if I compile the renderer's CUDA code in debug mode (with the “-G” flag), it works fine. I can start the CUDA debugger and step through the code, the variable ‘BRDF’ gets the correct value, and everything goes as I expected. In release mode, however, execution hangs at this call and never continues.
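
To make the double indirection explicit, here is a host-only C++ sketch of what I believe that call is doing. The names BXDFType, EvaluateBRDF and slot are stand-ins for illustration, not the renderer's real definitions, and I declare the variable with the function-pointer type instead of void * so the sketch compiles as standard C++. It only shows the pointer logic; it says nothing about how the device compiler treats the indirect call in release mode.

```cpp
// Hypothetical stand-in for CG::BXDFType: a pointer to a function returning float.
using BXDFType = float (*)();

// Host-side stand-in for the JIT-compiled LambertianFunction.
float LambertianFunction() {
    return 1.0f / 3.14159265358979323846f;
}

// Stand-in for the global 'Lambertian': a variable that stores the function's address.
BXDFType Lambertian = LambertianFunction;

// 'slot' plays the role of material.BXDF: it is the address of the variable,
// not the address of the function itself.
float EvaluateBRDF(const void *slot) {
    // First step: read the function pointer stored in the variable.
    BXDFType fn = *static_cast<const BXDFType *>(slot);
    // Second step: call through that function pointer.
    return fn();
}
```

Calling EvaluateBRDF(&Lambertian) on the host mirrors the expression in my device code: one dereference to load the function pointer from the variable, then the call itself.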

So what is the cause? Is it a compiler bug, given that it works in debug mode but fails in release mode? Did I do something wrong? And what should I do to achieve my goal of something like the shaders in OptiX/D3D12?