context->compile() doesn't return


This account was marked as a spam bot, so the post was not visible. I reposted it with added information here: and in the meantime the account was unlocked, which is why you see this now.

I have a very strange error: the context->compile() call doesn't return and runs forever. This happens only if I call a very simple function in the camera program. If I remove the call to that function, it runs fine.

My System:
GeForce GTX 550 Ti, AMD quad core, 12 GiB of RAM
Windows 7 64-bit
Originally I had VS 2010, CUDA 5.0 and OptiX 3.0; later I updated to VS 2012, CUDA 5.5 and OptiX 3.5, but that didn't solve the problem. Switching from 32-bit to 64-bit didn't help either. I'm using sm_20 (also tried sm_21 and sm_13) because I want to use atomicAdd(float*, float) later.

I tried to create a minimal example, but when I used the same code in sample2 from the OptiX samples, it worked. So here is my project (based on sutil), stripped down quite a lot.

I'll attach the camera program; the full code is here:
(Note that when building you'll probably have to adjust the OptiX install directory in CMake. On Windows I also had to copy the OptiX DLLs into the build directory, next to the .exe.)

__device__ __inline__ float filterTent___(float var) {
    if(var < 0.5f) {
        return var + 1.f;
    } else {
        return 1.f - var;
    }
}

__device__ __inline__ float filterValue___(const optix::float2& rayOrigin, const optix::uint2& pixel) {
    // position of the ray origin relative to the pixel, shifted by half a pixel
    optix::float2 relativePosition = rayOrigin - optix::make_float2(pixel) + optix::make_float2(0.5f);

    float hx = filterTent___(relativePosition.x);
    float hy = filterTent___(relativePosition.y);
    return hx * hy;
}

RT_PROGRAM void pathtrace_camera()
{
    uint2 screenPos = make_uint2(launch_index.x, launch_index.y);
    float2 screenRayPos = make_float2(screenPos) + make_float2(0.5f);

    size_t2 screen = output_buffer.size();

    float2 inv_screen = 1.0f/make_float2(screen);
    float2 pixel = (make_float2(screenPos)+0.5f) * 2.f * inv_screen - 1.f;

    unsigned int seed = tea<16>(screen.x*screenPos.y+screenPos.x, frameNumber);
    float2 jitter = make_float2(rnd(seed), rnd(seed));
    jitter -= make_float2(0.5f, 0.5f);
    screenRayPos += jitter;

    float2 d = pixel + jitter * 2.f * inv_screen;   // device space [-1, 1; -1, 1], same scaling as with screen space
    float3 ray_origin = eye;
    float3 ray_direction = normalize(d.x*U + d.y*V + W);

    PerRayData_pathtrace prd;
    prd.result = make_float3(0.f);

    Ray ray = make_Ray(ray_origin, ray_direction, pathtrace_ray_type, scene_epsilon, RT_DEFAULT_MAX);
    rtTrace(top_object, ray, prd);  // alternatively, it also compiles when this line is removed (see below)

    float filter = 1.f;
    filter = filterValue___(screenRayPos, screenPos); // program compiles when removing (commenting out) this line

    float3 eyeImage = prd.result * filter;
    output_buffer[screenPos] = make_float4(eyeImage, filter);
}
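
To rule out the arithmetic itself, here is a host-only C++ sketch of the two helper functions that can be compiled without OptiX or CUDA. The names Float2, UInt2, filterTentHost and filterValueHost are hypothetical stand-ins for optix::float2, optix::uint2 and the device functions above; they are not part of OptiX. The sketch only mirrors the math and says nothing about why context->compile() hangs.

```cpp
// Hypothetical host-side stand-ins for optix::float2 / optix::uint2.
struct Float2 { float x, y; };
struct UInt2  { unsigned int x, y; };

// Same body as filterTent___ in the camera program.
inline float filterTentHost(float var) {
    if (var < 0.5f) {
        return var + 1.f;
    } else {
        return 1.f - var;
    }
}

// Same arithmetic as filterValue___: (rayOrigin - pixel + 0.5) per component,
// then the product of the two one-dimensional filter values.
inline float filterValueHost(const Float2& rayOrigin, const UInt2& pixel) {
    Float2 rel = { rayOrigin.x - static_cast<float>(pixel.x) + 0.5f,
                   rayOrigin.y - static_cast<float>(pixel.y) + 0.5f };
    return filterTentHost(rel.x) * filterTentHost(rel.y);
}
```

Calling filterValueHost with a ray origin at the pixel's integer corner gives a relative position of 0.5 in both components, which is a quick way to check the branch taken in filterTentHost.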