I’m running on a Fedora 35 Linux system with driver version 495.29.05, CUDA 11.5 and OptiX 7.4.
I’m trying to debug my OptiX program, which appears to run fine on meshes with only a few triangles. When I run it on another mesh I generated, which has 122,000 triangles, it crashes. Under cuda-gdb it stops with this error:
CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x7fffce48ea70
Thread 4 "GPUThread" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 301, block (2619,0,0), thread (64,0,0), device 0, sm 0, warp 4, lane 0]
I have been unable to get any kind of line number information or look at variables, even though I do all of the following: when I compile my OptiX kernel to PTX I specify the -g and -lineinfo options; when calling optixModuleCreateFromPTX I set compileOptions.optLevel = OPTIX_COMPILE_OPTIMIZATION_LEVEL_0 and compileOptions.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_FULL; and when calling optixPipelineCreate I set linkOptions.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_FULL.
If I run my program with Compute Sanitizer, specifying no options so it defaults to memory checking, I get several errors reporting CUDA_ERROR_ILLEGAL_ADDRESS from the CUDA calls cuStreamSynchronize and cuEventRecord. They occur in the second iteration of my main loop, which completely regenerates all OptiX structures on each iteration.
I suspect I’ve done something to seriously clobber device memory, probably with a cudaMemcpy related to copying my vertex or texture objects, but in the absence of any useful debug info to point to the failure, I have no idea what went wrong.
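One host-side check that might help rule the mesh data in or out before the cudaMemcpy is sketched below. It assumes the mesh is an indexed triangle list with 32-bit indices; the names `Triangle` and `firstOutOfRangeTriangle` are hypothetical and not taken from my actual code. The idea is that an index at or beyond the vertex count would make the device read past the end of the vertex buffer, which is one possible source of an illegal address.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical index triple for one triangle; the real mesh layout may differ.
struct Triangle {
    std::uint32_t v0, v1, v2;
};

// Returns the position of the first triangle that references a vertex
// outside [0, vertexCount), or -1 if every index is in range.
long long firstOutOfRangeTriangle(const std::vector<Triangle>& triangles,
                                  std::size_t vertexCount) {
    for (std::size_t i = 0; i < triangles.size(); ++i) {
        const Triangle& t = triangles[i];
        if (t.v0 >= vertexCount || t.v1 >= vertexCount || t.v2 >= vertexCount) {
            return static_cast<long long>(i);
        }
    }
    return -1;
}
```

Running this on the 122,000-triangle mesh just before the upload would show whether the index data itself is bad; a result of -1 would point instead at buffer sizes, copy lengths, or the device code.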
Since this is complaining about a Warp Illegal Address error, I’m pretty sure the error is somewhere inside my OptiX device code. Any suggestions on how I can get debug info?