"UNREACHABLE executed!" When Calling Large Device Function

Hi All,

I struggled with the following error for a good while before figuring out a solution:

1>CudaBuild:
1> Compiling CUDA source file CalcHomography.cu…
1>
1> C:\Projects\UAV\Source\Libraries\Technology\CAGpu>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_20,code="sm_21,compute_20" --use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" --keep-dir "x64\Release" -maxrregcount=0 --machine 64 --compile -Xcompiler "/EHsc /nologo /Ox /Zi /MD " -o "x64\Release\CalcHomography.cu.obj" "C:\Projects\UAV\Source\Libraries\Technology\CAGpu\CalcHomography.cu"
1> CalcHomography.cu
1> tmpxft_00000568_00000000-0_CalcHomography.cudafe1.gpu
1> tmpxft_00000568_00000000-5_CalcHomography.cudafe2.gpu
1> CalcHomography.cu
1> UNREACHABLE executed!
1>
1> This application has requested the Runtime to terminate it in an unusual way.
1> Please contact the application’s support team for more information.
1>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 4.2.targets(361,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin\nvcc.exe" -gencode=arch=compute_20,code="sm_21,compute_20" --use-local-env --cl-version 2010 -ccbin "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\x86_amd64" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include" --keep-dir "x64\Release" -maxrregcount=0 --machine 64 --compile -Xcompiler "/EHsc /nologo /Ox /Zi /MD " -o "x64\Release\CalcHomography.cu.obj" "C:\Projects\UAV\Source\Libraries\Technology\CAGpu\CalcHomography.cu"" exited with code 3.
1>
1>Build FAILED.

As you can see, I'm using CUDA 4.2 (with 32-bit VS 2010 Pro, cross-compiling for x64 on a Windows 7 machine). At one point in CalcHomography.cu I call a long (>200 lines) SVD device function, and commenting out that call made the error go away. What baffled me was that, in a test, calling the same SVD function from a different, smaller function compiled fine. Eventually I tried adding the __noinline__ qualifier to the SVD declaration, and that worked. In other words, the compiler seems to fail when the calling function becomes too large once the SVD code is inlined into it. The weird thing is that the failure appears to happen during the host pass (after the .gpu files have been generated), and I have no idea why the CPU compiler would have a problem with device-only code. Anyway, I hope this helps someone else.
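For anyone hitting the same error, here is a minimal sketch of what the workaround looks like, assuming the SVD routine is an ordinary device function called from a kernel. The names `frobeniusNormSq3x3`, `calcKernel`, and `HD_NOINLINE` are hypothetical, and the real >200-line SVD body is replaced by a trivial computation. The function is also marked `__host__` (an assumption, not part of the original code) so the same file can be compiled as a .cu with nvcc or checked on the CPU with a plain C++ compiler.

```cpp
// Sketch of the __noinline__ workaround. Compiles as CUDA under nvcc, or as
// plain C++ (the CUDA qualifiers and the kernel are then compiled out).

#ifdef __CUDACC__
// Under nvcc: ask the compiler not to inline the large routine into callers.
#define HD_NOINLINE __host__ __device__ __noinline__
#else
// Plain C++ build: CUDA qualifiers do not exist, so expand to nothing.
#define HD_NOINLINE
#endif

// Hypothetical stand-in for the large SVD device function. It only computes
// the squared Frobenius norm of a row-major 3x3 matrix (which equals the sum
// of the squared singular values), instead of a full decomposition.
HD_NOINLINE float frobeniusNormSq3x3(const float* m)
{
    float s = 0.0f;
    for (int i = 0; i < 9; ++i)
        s += m[i] * m[i];
    return s;
}

#ifdef __CUDACC__
// Hypothetical calling kernel: one 3x3 matrix per thread. Because the callee
// is __noinline__, its body is requested to stay a separate call rather than
// being expanded inside this kernel.
__global__ void calcKernel(const float* mats, float* out, int count)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count)
        out[idx] = frobeniusNormSq3x3(mats + 9 * idx);
}
#endif
```

For example, calling `frobeniusNormSq3x3` on the 3x3 identity returns 3.0f. Note that the CUDA Programming Guide describes `__noinline__` as a hint that the compiler honors where possible, so it sidesteps the crash by keeping the caller small rather than by fixing the underlying compiler problem.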

-Nick

This looks like some sort of internal compiler error to me. I am glad you found a workaround by disabling inlining, but since this happens with the CUDA 4.2 compiler when targeting sm_2x, it would be very helpful if you could file a bug against the compiler, attaching a self-contained repro case. Thank you for your help!