context->compile() doesn't return (unresolved, with workaround; tested on Win7 64-bit and Linux)

I have a very strange error: the context->compile() function doesn't return and runs forever. This happens only if I call a very simple function in the camera program; if I remove the call to the function, it runs fine.

My System:
GeForce GTX 550 Ti, AMD quad core, 12 GiB of ram
Windows 7 64 bit
Before, I had VS 2010, CUDA 5.0 and OptiX 3.0; later I updated to VS 2012, CUDA 5.5 and OptiX 3.5, but that didn't solve the problem. Switching from 32-bit to 64-bit didn't help either. I'm using sm_20 (I also tried sm_21 and sm_13) because I want to use atomicAdd(float*, float) later.

I tried to create a minimal example, but when I used the same code in sample2 from the OptiX samples, it worked. So here is my project (based on sutil), stripped down quite a lot.

I'll paste the camera program below; the full code is here:
(Note that when building you'll probably have to adjust the OptiX install directory in CMake. On Windows I also had to copy the OptiX DLLs into the build directory, next to the .exe.)

__device__ __inline__ float filterTent___(float var) {
    if(var < 0.5f) {
        return var + 1.f;
    } else {
        return 1.f - var;
    }
}
__device__ __inline__ float filterValue___(const optix::float2& rayOrigin, const optix::uint2& pixel) {
    optix::float2 relativePosition = rayOrigin - optix::make_float2(pixel) + optix::make_float2(0.5f);

    float hx = filterTent___(relativePosition.x);
    float hy = filterTent___(relativePosition.y);
    return hx * hy;
}
// launch_index, output_buffer, frameNumber, eye, U, V, W, top_object, pathtrace_ray_type and
// scene_epsilon are rtDeclareVariable/rtBuffer globals declared elsewhere in the full project.
RT_PROGRAM void pathtrace_camera()
{
    uint2 screenPos = make_uint2(launch_index.x, launch_index.y);
    float2 screenRayPos = make_float2(screenPos) + make_float2(0.5f);

    size_t2 screen = output_buffer.size();

    float2 inv_screen = 1.0f/make_float2(screen);
    float2 pixel = (make_float2(screenPos)+0.5f) * 2.f * inv_screen - 1.f;

    unsigned int seed = tea<16>(screen.x*screenPos.y+screenPos.x, frameNumber);
    float2 jitter = make_float2(rnd(seed), rnd(seed));
    jitter -= make_float2(0.5f, 0.5f);
    screenRayPos += jitter;

    float2 d = pixel + jitter * 2.f * inv_screen;   // device space [-1, 1; -1, 1], same scaling as with screen space
    float3 ray_origin = eye;
    float3 ray_direction = normalize(d.x*U + d.y*V + W);

    PerRayData_pathtrace prd;
    prd.result = make_float3(0.f);

    Ray ray = make_Ray(ray_origin, ray_direction, pathtrace_ray_type, scene_epsilon, RT_DEFAULT_MAX);
    rtTrace(top_object, ray, prd);  // alternatively, it also compiles when this line is removed (see below)

    float filter = 1.f;
    filter = filterValue___(screenRayPos, screenPos); // the program compiles when this line is removed (commented out)

    float3 eyeImage = prd.result * filter;
    output_buffer[screenPos] = make_float4(eyeImage, filter);
}
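To check filter and screen-space math of this kind independently of OptiX, it can be run on the host. The following is a plain C++ sketch (in the .cu file such helpers would carry __device__ __inline__). It assumes a conventional separable tent filter of radius 1, evaluated on the sample's offset from the pixel centre, which is not necessarily what filterTent___ above is meant to compute; the names tent, filterWeight and toDevice are hypothetical. toDevice reproduces the [-1, 1] scaling used for pixel and d, so one can confirm that pixel + jitter * 2 * inv_screen is the device-space position of the jittered screenRayPos.

```cpp
#include <cmath>

// Hypothetical host-side reference (plain C++, no OptiX).
// Assumption: conventional tent filter of radius 1 on the offset of a sample
// from the pixel centre; weight is 1 at the centre and 0 at distance >= 1.
float tent(float offset) {
    float w = 1.0f - std::fabs(offset);
    return w > 0.0f ? w : 0.0f;
}

// Separable 2D weight for a sample at (sx, sy) contributing to pixel (px, py):
// the product of the 1D weights in x and y.
float filterWeight(float sx, float sy, unsigned px, unsigned py) {
    return tent(sx - (px + 0.5f)) * tent(sy - (py + 0.5f));
}

// Maps a continuous screen coordinate s in [0, size] to device space [-1, 1],
// the same scaling the camera program uses for `pixel` and `d`.
float toDevice(float s, unsigned size) {
    return s * 2.0f / size - 1.0f;
}
```

For example, with a 4-pixel-wide image, pixel 3 and a jitter of +0.25, toDevice(3.5f, 4) + 0.25f * 2.0f / 4 and toDevice(3.75f, 4) both give 0.875, and the sample's x weight tent(0.25f) is 0.75.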


BTW: I'm posting this question a second time, because the posts from my old account are not visible and support is not answering.

Yay, forum support unlocked my account. It took just one week :P

And I just managed to reproduce both problems on Linux (kernel 3.11.10-7, 64-bit, openSUSE 13.1, driver 331.49, same CUDA and OptiX versions).

Oh, and BTW, I managed to run into the same problem of compile() not returning with totally different code :) Can anybody at least reproduce it?

Why is this stuff always happening to me? :P

I was trying to minimise the example, and now it compiles without the sm_20 flag, but not with it (only tested on Linux). I don't know yet what I changed to make it work.
Edit: updated with __forceinline__:

Anyway, my actual code still doesn't compile, and I really have no idea what to do.

Oh, and somebody else had the same problem, but unfortunately without a solution:

In the meantime I was able to compile my actual code by commenting out some parts and using callable programs for others (instead of __device__ __inline__ functions). But I'm still wondering why the example code doesn't work and why there is no error message.
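Because compile() neither returns nor reports an error, the hang can at least be turned into a visible failure by running the call under a timeout. The following is a sketch under stated assumptions, not an OptiX feature: compileWithWatchdog is a hypothetical helper built only on std::async, and the timeout is chosen by the caller. Since a hung thread cannot be cancelled (and a std::async future blocks in its destructor until the task ends), it terminates the process on timeout instead of returning.

```cpp
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <future>

// Hypothetical watchdog around a call that may never return, such as
// context->compile(). The job runs on its own thread; if it has not finished
// within `timeout`, a message is printed and the process exits immediately.
void compileWithWatchdog(std::function<void()> job,
                         std::chrono::milliseconds timeout) {
    std::future<void> done = std::async(std::launch::async, std::move(job));
    if (done.wait_for(timeout) != std::future_status::ready) {
        std::fprintf(stderr, "compile() did not return within %lld ms\n",
                     static_cast<long long>(timeout.count()));
        std::fflush(stderr);
        // _Exit skips destructors, so we never wait on the still-running worker.
        std::_Exit(EXIT_FAILURE);
    }
    done.get();  // rethrows anything the job threw (e.g. an optix::Exception)
}
```

Usage, assuming `context` is the optix::Context of the sutil-based project: `compileWithWatchdog([&] { context->compile(); }, std::chrono::minutes(10));`. If compile() throws instead of hanging, `done.get()` rethrows the exception in the calling thread, so existing error handling keeps working.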

For now I don't consider OptiX deterministic, also because of some other strange behaviour, at least not on my computer :P

As mentioned in this thread, using RT_CALLABLE_PROGRAM without rtCallableProgram results in undefined behaviour. It was proposed to replace RT_CALLABLE_PROGRAM with __device__ __forceinline__.

I have now verified the following:

  • It works when using RT_CALLABLE_PROGRAM; however, only replacing the keywords is not supported by OptiX, and the behaviour is undefined (see this thread).
  • My actual code base stops compiling when NOT using RT_CALLABLE_PROGRAM and replacing all of those functions with __device__ __forceinline__.
  • My actual code base also works when using RT_CALLABLE_PROGRAM together with the proper rtCallableProgram mechanism; this is what I have just started to use.

This behaviour is really annoying. It keeps popping up every now and then after seemingly innocent code changes.

Code that ran fine on my Linux machine with the specs mentioned above didn't run on a university workstation with the following specs:
Win7 64-bit, VS12 compiler, NVIDIA driver 332.76, CUDA 5.0 and OptiX 3.0, Intel Xeon quad core and Quadro 2000 graphics. I tried to apply the callable program workaround, but it didn't work, and after those callable program changes it also stopped compiling on my computer.

The Quadro 2000 was much slower with the code that did compile. I have no idea whether there is a connection between GPU performance and this issue, though. The card is newer and supports more features (sm_30 IIRC), but it is much slower, at least with my program.

I just found another workaround for the above problem. In one case it helped to put the following into the code every now and then:



On a side note, in my opinion you should also work a bit on communication :)
It concerns both how the compiler/OptiX responds to errors and how the moderators/developers respond to questions.
Ad 1. I very often got launch errors 700 or 999 without any further message, not even after enabling exceptions and other debugging aids. It's really annoying when something stops working and you have no clue what broke and why, because there are no messages. This is so bad that I probably won't consider using OptiX another time. The time one gains by relying on a working ray tracer is lost to debugging.

Ad 2. I put quite a lot of effort into providing minimal examples that could be used for debugging: this thread features one, and the thread about the rtPrintf crash another. For neither of them did I even get an answer. Moreover, there seem to be issues that are known but haven't been resolved for at least two years. Finally, there is no blog, Twitter account, or anything else (or I couldn't find one) informing about the roadmap, cool projects, etc. It looks a bit like a dead project.