Compile OptiX 7.0 as .exe

Hello!
I want to ship my OptiX 7 program as a standalone executable, so the end user doesn’t have to install any dependencies. As far as I understand, the only requirement is a supported graphics card.
It seems that CMake creates some files in my local directory which are then missing when I try to run the exe on a different machine than the one I compiled it on. (This was observed on a machine that had VS 2019 installed.)
The math library I am using prevents the use of NVRTC.

Is there a guide on what I need to package to get the .exe to run on a different PC without VS 2019 installed?
Right now I am just copying the “release” output of VS 2019 to the second Windows machine, but the .exe returns an error that it needs VCRUNTIME140_1.dll. (I disabled USE_MSVC_RUNTIME_LIBRARY_DLL in CMake.)
Am I missing something trivial here?

Thanks for your help in advance.

Ensure in VS 2019, on your project’s Properties -> Configuration Properties -> C/C++ -> Code Generation -> Runtime Library settings page, that you specify the /MT option (/MTd for debug).
What other files did you find in your local directory?

When shipping an application which links against Microsoft Visual Studio libraries dynamically, you need to ship the vc_redist.x64.exe installer with your application.

You can find that inside your local MSVS compiler installation or as a download on Microsoft’s site:
https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads

These need to be installed once in case the target machine doesn’t have them.
The MSVS redistributables are version controlled with manifests, and if there are multiple runtimes for the same compiler installed (due to service packs), Windows will pick the matching one automatically.

If you link against static libraries, these are linked into your application binary and there is no need to ship them.

If you search for the folder redist, you will also find these files individually. You need the x64 versions for OptiX applications.
You could also put the necessary release redistributable files next to your executable, but that’s just bloat if every program did that.

Note that the debug versions of these are found inside that redist folder as well; they reside inside their own folder, debug_nonredist. As the name says, these should NOT be redistributed. I assume that’s because they contain information about the library implementation which is effectively licensed with the MSVS product only. You shouldn’t ship any debug programs anyway.
But you need them if you debug remotely on a secondary system, which I do all the time. In that case copy the debug versions of the required DLLs next to your debug executable on the remote machine.

With OptiX 7 applications there might be additional CUDA libraries required.

If you programmed your application with the CUDA Runtime API linked dynamically, you’d also need to ship the CUDA Runtime DLL; for example, when using the CUDA Toolkit 10.1, that is named cudart64_101.dll.
Again, when linking against the static runtime version cudart_static.lib inside the lib folder, that wouldn’t be required either.

When using the CUDA Driver API, no additional DLL needs to be shipped. That library comes with the NVIDIA display driver, which means the end user only needs to have a display driver installed which supports the CUDA toolkit version you used for your OptiX 7 application. Newer drivers will support older CUDA versions as well.
(I prefer the CUDA Driver API because it’s additionally more explicit about the CUDA context management on multiple devices.)
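
For illustration, here is a minimal, hedged sketch of that explicit CUDA Driver API setup before creating the OptiX device context (not code from the SDK samples; error checking is omitted and the device index 0 is just an assumption):

#include <cuda.h>                             // CUDA Driver API
#include <optix.h>
#include <optix_function_table_definition.h>  // define the OptiX function table in exactly one .cpp
#include <optix_stubs.h>

int main()
{
    cuInit(0);                                // initialize the Driver API

    CUdevice device = 0;
    cuDeviceGet(&device, 0);                  // assumption: use the first CUDA device

    CUcontext cudaContext = nullptr;
    cuCtxCreate(&cudaContext, 0, device);     // explicit CUDA context, made current on this thread

    optixInit();                              // load the OptiX entry points from the display driver

    OptixDeviceContext optixContext = nullptr;
    OptixDeviceContextOptions options = {};
    optixDeviceContextCreate(cudaContext, &options, &optixContext);

    // ... module, pipeline, SBT creation and optixLaunch would go here ...

    optixDeviceContextDestroy(optixContext);
    cuCtxDestroy(cudaContext);
    return 0;
}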

If you use the CUDA Runtime Compiler NVRTC to compile CUDA source to PTX input for OptiX at runtime, you’d also need to ship two DLLs (here with CUDA 10.1 names): nvrtc64_101_0.dll and nvrtc-builtins64_101.dll.

You’ll find all these inside your CUDA toolkit installation’s bin folder.
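
As a rough sketch (not the sutil code from the SDK) of what that runtime compilation looks like, assuming the .cu source has already been read into a string and that you pass the required -I options for the OptiX and CUDA headers yourself:

#include <nvrtc.h>
#include <iostream>
#include <string>

// Compile CUDA source text to a PTX string at run-time with NVRTC.
// The program name "raygen.cu" and the option list are placeholders.
std::string compileToPtx(const std::string& cudaSource)
{
    nvrtcProgram prog = nullptr;
    nvrtcCreateProgram(&prog, cudaSource.c_str(), "raygen.cu", 0, nullptr, nullptr);

    // You would also add "-I<path>" options for the OptiX SDK and CUDA toolkit headers here.
    const char* options[] = { "--use_fast_math" };
    const nvrtcResult result = nvrtcCompileProgram(prog, 1, options);

    size_t logSize = 0;
    nvrtcGetProgramLogSize(prog, &logSize);
    if (logSize > 1)
    {
        std::string log(logSize, '\0');
        nvrtcGetProgramLog(prog, &log[0]);
        std::cerr << log << std::endl;        // print compiler warnings and errors
    }

    std::string ptx;
    if (result == NVRTC_SUCCESS)
    {
        size_t ptxSize = 0;
        nvrtcGetPTXSize(prog, &ptxSize);
        ptx.resize(ptxSize);
        nvrtcGetPTX(prog, &ptx[0]);
    }

    nvrtcDestroyProgram(&prog);
    return ptx;                               // empty on failure
}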

I changed it to /MT but my executable still has the same problem.
Do I have to change the CMake file?

Try disabling FH4 by passing the -d2FH4- compiler flag and the -d2:-FH4- linker flag.
See: Visual Studio Feedback

If that also fails, alternatively try using toolset v140 instead of v141.

About switching the VC Runtime Library between static and dynamic via CMake, check these methods:
https://stackoverflow.com/questions/14172856/compile-with-mt-instead-of-md-using-cmake

The bottom-most method worked just fine with MSVS 2017 and CMake 3.15.2.

Thank you very much!

Adding the following to the root CMakeLists.txt solved the issue for me!
(Using VS19 and CMake 3.16.2)

set(CompilerFlags
        CMAKE_CXX_FLAGS
        CMAKE_CXX_FLAGS_DEBUG
        CMAKE_CXX_FLAGS_RELEASE
        CMAKE_C_FLAGS
        CMAKE_C_FLAGS_DEBUG
        CMAKE_C_FLAGS_RELEASE
        )
foreach(CompilerFlag ${CompilerFlags})
  string(REPLACE "/MD" "/MT" ${CompilerFlag} "${${CompilerFlag}}")
endforeach()

Unfortunately I ran into the next problem: no device code is executed.
My program just finishes without entering the rg() program, and I don’t get any error messages. I added a simple print statement to rg() which isn’t reached.
CUDA_USE_STATIC_CUDA_RUNTIME is enabled in CMake; does this statically link the CUDA runtime? (It’s still the same CMakeLists that came with the OptiX 7.0 samples.)
Adding cudart64_101.dll to the build folder doesn’t change the behavior.

It runs just fine on the machine I use to compile.
Using NVCC, are the PTX files included in the exe or do I have to ship them somehow?

Your application needs the content of the PTX files at run-time (when using NVCC); otherwise you need to use NVRTC.

Ah, so it’s what I thought. Where do I have to put the PTX file?

With the NVRTC I would need the .cu files?

Yes, when using NVRTC you would need the .cu files (and all their include files), and if you use include files of the CUDA toolkit, then you also need an installed CUDA toolkit on the target system and a proper detection of its include path.
(The same requirement applies whether you use the CUDA Runtime API or the CUDA Driver API.)
UPDATE: As mentioned in a post below, of course you also need an installed OptiX 7 SDK on the target system.

The PTX file contains a text string. As you can see in the examples, it is loaded
into a const char*.
So technically you could store it in your application:
const char YourKernel_PTX[YourKernel_PTX_FileSizeInBytes] = { ??, ??, ... };
I personally do not deploy any app since I only use it internally, but
I think Detlef can give you more information about the officially recommended way to do this.

My program just finishes without entering the rg() program, and I don’t get any error messages. I added a simple print statement to rg() which isn’t reached.

Wait, there is no error message in your host code if a *.ptx file wasn’t found?
The OptiX SDK throws a std::runtime_error exception when that happens, which could mean that you’re not correctly catching exceptions?
Please have a look into the OptiX 7 SDK code and search this function: getPtxStringFromFile.

Where do I have to put the PTX file?

That’s a good question, because there is actually a problem with how the OptiX SDK examples determine the location of the *.ptx and data files.
Find out more about it here: https://forums.developer.nvidia.com/t/sdk-samples-sutil-getptxstring-file-path/70963/2

Examples showing my method instead can be found here: https://github.com/NVIDIA/OptiX_Apps
Search that code for calls to readPTX.
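
For reference, a minimal sketch of such a run-time PTX loader (similar in spirit to those readPTX calls, but not copied from that repository; how you build the path is up to you, and that path handling is exactly what breaks when the working directory changes):

#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read a whole *.ptx file into a string that can be passed to optixModuleCreateFromPTX.
std::string readPtx(const std::string& filename)
{
    std::ifstream file(filename);
    if (!file.is_open())
    {
        // Throwing here gives the same kind of failure the SDK reports when a PTX file is missing.
        throw std::runtime_error("readPtx: failed to open " + filename);
    }

    std::stringstream source;
    source << file.rdbuf();
    return source.str();
}

// Usage (inside your pipeline setup; the path and variable names are placeholders):
// const std::string ptx = readPtx("./ptx/raygen.ptx");
// optixModuleCreateFromPTX(context, &moduleOptions, &pipelineOptions,
//                          ptx.c_str(), ptx.size(), log, &logSize, &module);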

CUDA_USE_STATIC_CUDA_RUNTIME is enabled in CMake, does this statically link the cuda runtime?
(It’s still the same CMakeLists that came with the OptiX 7.0 samples)

You can simply check which libraries your project links against inside Visual Studio. Just open your project’s properties and look at Linker -> Input or Linker -> Command Line.
When using the CUDA Runtime API that will either contain cudart.lib or cudart_static.lib.
When using the CUDA Driver API that will link against cuda.lib which is always dynamic and loads the nvcuda64.dll from the driver repository.

If you really want to have just a single executable which contains everything, NVRTC is out of the question.
In addition to your *.cu code and headers, and the CUDA headers mentioned, that would also require the OptiX headers.
That means the target machine needs both a CUDA toolkit and the matching OptiX SDK version installed, because the license of either SDK does not allow shipping its headers individually. (Which is kind of unfortunate for OptiX 7 because the API is header-only.)

Anyway, if you want to put any PTX source code into your executable binary, there are normally three ways:

  1. Put the source code into string constants manually, as already mentioned above.
  2. Put the source code into string constants by using the bin2c.exe tool inside the CUDA toolkit bin folder. It just creates constant arrays from any binary data, which you can compile into your app (see the sketch after this list).
  3. Under Visual Studio, put the source code into custom binary resources and load them from there. String tables won’t work because they are too small.
    A quick search found this thread, which shows the necessary calls:
    https://stackoverflow.com/questions/9240188/how-to-load-a-custom-binary-resource-in-a-vc-static-library-as-part-of-a-dll
    I haven’t done that in a long time and don’t know how to manage it with CMake. This is not portable and Windows only.
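
To illustrate option 2 in a hedged way: assuming a bin2c build step generated a header that defines an array (the header name raygen_ptx.h and the array name raygen_ptx are made up for this example; the exact bin2c command line is not shown here), the consuming code could look like this:

// Hypothetical header produced by a bin2c build step; assumed to contain
// something like: const unsigned char raygen_ptx[] = { 0x2f, 0x2f, ... };
#include "raygen_ptx.h"

#include <string>

// Hand the embedded PTX to OptiX as a string instead of reading it from a file.
std::string getEmbeddedPtx()
{
    return std::string(reinterpret_cast<const char*>(raygen_ptx), sizeof(raygen_ptx));
}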

The issue with these methods is getting this automated in your project’s build, which needs to get the individual dependencies right.
That means the PTX files would need to be compiled first, then either converted to const array code or to binary resources, and finally the project would need to be built.
Or maybe have two projects: one for the nvcc compilation and the main project using the generated *.ptx code afterwards.
In the past I also used a simple batch script which just called nvcc for all *.cu files.

The question is whether this is worth the effort. My OptiX 7 examples use a lot of other libraries for window management, OpenGL wrapping, and loading images and mesh data from scene files. There is too much stuff to care about linking all of that statically, and if any of these links a Visual Studio runtime dynamically, then you’re back at the beginning.
My only concern is that when packing all files necessary for a standalone executable, unpacking them into any local (unprotected) folder and running the application from there just works.

I am using the OPTIX_CHECK macro from the SDK examples, but I don’t get any error messages. Not sure if this is relevant, but all the debug flags are set to full for the pipeline creation. I was also surprised that I didn’t get an error. If I remember correctly, we did get an error message in an earlier build, but I’m not sure if it mentioned a missing .ptx file.

When using the CUDA Runtime API that will either contain cudart.lib or cudart_static.lib.

Thanks for the advice. VS is using cudart_static.lib, so CUDA should work out of the box.

From what you said, it appears that shipping the .ptx with the exe would be the easiest way to distribute the application. I will check out your method and try to implement it in a similar way.

Using the check macros themselves will not help. These also only throw exceptions.
You need to have a try-catch block around these calls, like this:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/Application.cpp#L341

That’s why I explicitly added a hard debug assert into my own check macros, to hit a breakpoint inside the debugger before the exception fires.
That makes debugging the error easier because the failing call is still on the call stack.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/inc/CheckMacros.h#L76

You can step over that MY_ASSERT() and still get the exception anyway.
The three simpler intro examples only print the error without using exceptions.
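
For reference, a minimal sketch of such a check macro plus the surrounding try-catch (the name MY_OPTIX_CHECK and the plain assert are placeholders; see the linked repository for the real implementation):

#include <optix.h>
#include <cassert>
#include <iostream>
#include <sstream>
#include <stdexcept>

// Break into the debugger (assert) before throwing, so the failing OptiX call
// is still on the call stack; the exception then reports the error in release builds.
#define MY_OPTIX_CHECK(call)                                                    \
    do                                                                          \
    {                                                                           \
        const OptixResult res = call;                                           \
        if (res != OPTIX_SUCCESS)                                               \
        {                                                                       \
            std::ostringstream msg;                                             \
            msg << "OptiX call '" << #call << "' failed with code " << res      \
                << " (" << __FILE__ << ':' << __LINE__ << ")";                  \
            assert(!"MY_OPTIX_CHECK failed");                                   \
            throw std::runtime_error(msg.str());                                \
        }                                                                       \
    } while (false)

// Without a try-catch around the calls, the exception just terminates the program:
// try
// {
//     MY_OPTIX_CHECK(optixInit());
//     // ... context, module, pipeline creation, optixLaunch ...
// }
// catch (const std::exception& e)
// {
//     std::cerr << e.what() << std::endl;
// }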