Optix-IR seems to fail for me with vector types (Updated with reproduction)

This function compiles fine with PTX but fails with Optix-IR. Specifically it fails on this line:

float inv_dim_x = 1.0f / (float)dim.x;

The only thing I changed is -optix-ir instead of -ptx

I am using sm_86 on cuda 13.1 and optix 9.0

It fails when I call optixModuleCreate

My raygen program that calls computeRay will end up with 0 instructions

static __forceinline__ __device__ void
computeRay(uint3 idx, uint3 dim, float3 &origin, float3 &direction) {
   const float3 U = params.cam_u;
   const float3 V = params.cam_v;
   const float3 W = params.cam_w;

   // Use reciprocal multiplication instead of division
   float inv_dim_x = 1.0f / (float)dim.x;
   float inv_dim_y = 1.0f / (float)dim.y;

   float u = 2.0f * (float)idx.x * inv_dim_x - 1.0f;
   float v = 2.0f * (float)idx.y * inv_dim_y - 1.0f;

   origin = params.cam_eye;

   // Compute direction
   float3 dir;
   dir.x = u * U.x + v * V.x + W.x;
   dir.y = u * U.y + v * V.y + W.y;
   dir.z = u * U.z + v * V.z + W.z;

   // Normalize
   float len = sqrtf(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
   float inv_len = 1.0f / len;
   direction = make_float3(dir.x * inv_len, dir.y * inv_len, dir.z * inv_len);
}
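
To check what the function should produce independently of the OptiX-IR compile path, the same arithmetic can be sketched on the host in plain C++. This is only a reference sketch: Float3, UInt3, CameraParams and computeRayHost are hypothetical stand-ins for CUDA's float3/uint3 and the params struct, not part of the reproducer.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side stand-ins for CUDA's float3 / uint3 and the params struct.
struct Float3 { float x, y, z; };
struct UInt3 { unsigned x, y, z; };
struct CameraParams { Float3 cam_eye, cam_u, cam_v, cam_w; };

// Same arithmetic as computeRay, but built by the host compiler only.
inline void computeRayHost(const CameraParams& params, UInt3 idx, UInt3 dim,
                           Float3& origin, Float3& direction) {
    assert(dim.x > 0 && dim.y > 0);  // the reciprocals below divide by these

    // Reciprocals of the launch dimensions (the lines that trigger the failure on device)
    const float inv_dim_x = 1.0f / static_cast<float>(dim.x);
    const float inv_dim_y = 1.0f / static_cast<float>(dim.y);

    // Map the launch index to [-1, 1) in both axes
    const float u = 2.0f * static_cast<float>(idx.x) * inv_dim_x - 1.0f;
    const float v = 2.0f * static_cast<float>(idx.y) * inv_dim_y - 1.0f;

    origin = params.cam_eye;

    // Unnormalized direction through the image plane
    const Float3 U = params.cam_u, V = params.cam_v, W = params.cam_w;
    const Float3 dir{u * U.x + v * V.x + W.x,
                     u * U.y + v * V.y + W.y,
                     u * U.z + v * V.z + W.z};

    // Normalize
    const float inv_len = 1.0f / std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    direction = Float3{dir.x * inv_len, dir.y * inv_len, dir.z * inv_len};
}
```

With an axis-aligned camera (cam_u along +x, cam_v along +y, cam_w along -z), the center pixel should give a direction close to (0, 0, -1), which is a quick sanity value to compare against the device output once the module compiles.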

EDIT:

Here is a reproduction, please check the README for build instructions

optix_ir_bug_repro.zip (8.8 KB)

You can actually toggle the reproduction step with this minimal raygen

this will not error:

static __forceinline__ __device__ void
computeRay(uint3 idx, uint3 dim, float3 &origin, float3 &direction) {
const float3 U = params.cam_u;
const float3 V = params.cam_v;
const float3 W = params.cam_w;

// BUG TRIGGER: These division operations cause OptiX IR JIT to fail
// Use reciprocal multiplication instead of division
float inv_dim_x = 1.0f; // / (float)dim.x;

direction = make_float3(inv_dim_x, 0.f, 0.f);
}

this will:

static __forceinline__ __device__ void
computeRay(uint3 idx, uint3 dim, float3 &origin, float3 &direction) {
const float3 U = params.cam_u;
const float3 V = params.cam_v;
const float3 W = params.cam_w;

// BUG TRIGGER: These division operations cause OptiX IR JIT to fail
// Use reciprocal multiplication instead of division
float inv_dim_x = 1.0f / (float)dim.x;

direction = make_float3(inv_dim_x, 0.f, 0.f);
}

build steps

mkdir build && cd build
cmake .. -G "Visual Studio 17 2022" -DUSE_OPTIX_IR=ON
cmake --build . --config Release
Release\optix_ir_bug_repro.exe

^^ fails

cmake .. -G "Visual Studio 17 2022" -DUSE_OPTIX_IR=OFF
cmake --build . --config Release
Release\optix_ir_bug_repro.exe

^^ works

my system info:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 591.44                 Driver Version: 591.44         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   62C    P0            752W /   60W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Windows version

systeminfo | findstr /B /C:"OS Name" /C:"OS Version"
OS Name:                       Microsoft Windows 11 Pro
OS Version:                    10.0.26200 N/A Build 26200

CUDA version

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Nov__7_19:25:04_Pacific_Standard_Time_2025
Cuda compilation tools, release 13.1, V13.1.80
Build cuda_13.1.r13.1/compiler.36836380_0
-- The CXX compiler identification is MSVC 19.44.35221.0


I have edited the original post to include a way to reproduce the issue!

Hello @bjorn24, thank you for the reproducer! I can reproduce the issue with driver 591.44, but not with 581.80.

The error I see from your reproducer is "Error: Pipeline link error (code 7251)", preceded by a warning: Requested debug level "OPTIX_COMPILE_DEBUG_LEVEL_FULL", but input module does not include full debug information.

I have tested the OptiX 9.0 SDK and the OptiX_Apps too, no issue there. It probably depends on the CMakeLists.txt.

This works for me:

nvcc -optix-ir -arch=sm_86 --use_fast_math -I"C:\ProgramData\NVIDIA Corporation\OptiX SDK 9.0.0\include" ..\triangle.cu

Without the --use_fast_math option I get the crash.

Note that the SDK 9.0 compiles the samples with:

nvcc -optix-ir -arch=sm_50 --use_fast_math -lineinfo -Wno-deprecated-gpu-targets --use-local-env -I%sdkinclude% file.cu

Thank you for taking a look! Yes, using -use_fast_math worked for me!

Thank you, I am now using optix-ir.

Is it a requirement to run with --use_fast_math? With the clang compiler I have found -ffast-math to be problematic in the past. I generally avoid making those compiler optimization trade-offs, but I suppose this option in nvcc is a bit different?

It is a workaround for this particular driver version. If you don't need high precision (this is true in many scenarios), fast math will speed up your code without noticeable numerical issues. For more details, see section 5.5, "Floating-Point Computation", of the CUDA Programming Guide, this forum, and, if you want to dig deep, "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
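
If you want to see how much the results actually move between a precise build and a --use_fast_math build, one option is to dump the same output buffer from both and compare the values in units in the last place (ULPs). Below is a minimal host-side C++ sketch of such a comparison; ulpDistance and orderedBits are hypothetical helper names, and the code assumes 32-bit IEEE-754 floats and finite (non-NaN) inputs.

```cpp
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an integer that grows monotonically with the
// float's value. Assumes IEEE-754 binary32; NaNs are not handled.
static std::int64_t orderedBits(float f) {
    std::int32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the raw bits
    // Negative floats are sign-magnitude: mirror them below zero.
    return (bits < 0) ? -static_cast<std::int64_t>(bits & 0x7fffffff)
                      : static_cast<std::int64_t>(bits);
}

// Number of representable floats between a and b (0 means bit-identical,
// except that +0.0f and -0.0f also compare as 0).
std::int64_t ulpDistance(float a, float b) {
    const std::int64_t d = orderedBits(a) - orderedBits(b);
    return d < 0 ? -d : d;
}
```

A distance of a few ULPs per value is what you would typically expect from approximate division and square root; much larger distances would suggest the computation is sensitive enough to be worth a closer look.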

Yes this is fine for me. I am just learning still, thank you!
