I am able to build and run example code from the SDK, but I want to compile and build without using CMake and write my own Makefile. Is there an example of that, especially of how to compile the PTX file and link it with the rest of the CUDA code?
Assuming you’re using Microsoft Visual Studio with the CUDA Visual Studio Integration installed.
The CUDA host code doesn’t need special handling. You would just need to set the additional include directories to your CUDA toolkit installation’s include folder and add the necessary import libraries for the CUDA runtime (cudart_<version>.lib) and the CUDA driver API (cuda.lib).
For the OptiX device code *.cu to *.ptx source translation, there must be no linking happening. Linking would mean assembling the code into cubins, but you only need PTX source code.
Instead you would need to set up the CUDA compilation options inside the CUDA Visual Studio Integration dialog to compile only from *.cu to *.ptx files. Means there needs to be a --ptx option on the NVCC command line.
I’m not aware of an example doing that for OptiX with the Microsoft Visual Studio Integration alone, but you should be able to get that working when setting all other NVCC compile options properly as well.
I’ve prepared a CMake message inside the scripts building the custom compile command per *.cu file here, which allows printing the actually used NVCC command line:
https://github.com/NVIDIA/OptiX_Apps/blob/master/3rdparty/CMake/nvcuda_compile_ptx.cmake#L45
Enabling that and configuring the projects in CMake prints these additional messages (with all my local names replaced with placeholders):
<your_cuda_toolkit_path>/bin/nvcc.exe --machine=64 --ptx --gpu-architecture=compute_50;--use_fast_math;--relocatable-device-code=true;--generate-line-info;-Wno-deprecated-gpu-targets;-I<your_optix_sdk_version>/include;-I<your_device_header_code_path> <your_device_source_code_path>/<source_name>.cu -o <your_target_bin_path>/$(ConfigurationName)/<source_name>.ptx
If you fill out the <your_..._path> and ConfigurationName (Debug|Release) placeholders with the local directories on your development system, that is basically the NVCC command line you would issue inside a command prompt to compile OptiX *.cu device source code to *.ptx source code (maybe without the semicolons). Means you could put this into a batch file and get the *.ptx that way.
(Actually that probably also needs the host compiler executable location, a path to your MSVS’ x64/cl.exe. Check the NVCC manual. That’s implicit when running inside the MSVS IDE.)
Now you would just set all CUDA Visual Studio Integration options to exactly the same settings.
Mind that there is neither the -G nor the -g debug option in this command line, but these would be the default for Debug targets in MSVS. You should disable the debug flags in all targets for now. OptiX device source code debugging functionality is still work in progress.
Maybe have a search for CUDA Visual Studio Integration tutorials. Most of these will explain how to compile CUDA kernels to cubins, but you’ll get the idea.
Thanks for the answer. I am working on a Linux system, not using the CUDA Visual Studio Integration. I will try to use the NVCC options you provided to compile the .cu to .ptx first.
I am working on a Linux system. I wrote the following Makefile based on ray_tracing/makefile at master · apc-llc/ray_tracing · GitHub for the optixHello example, and it compiled but failed when it ran. Can you tell what is wrong with the Makefile?
error message:
$ ./optixHello
[ 4][ KNOBS]: All knobs on default.
[ 4][ DISK CACHE]: Opened database: "/var/tmp/OptixCache_qchen/cache7.db"
[ 4][ DISK CACHE]: Cache data size: "16.0 KiB"
Caught exception: Couldn't open source file draw_solid_color.cu
.SUFFIXES: .ptx

OPTION = -O3 -std=c++11

all: optixHello

optixHello: optixHello.o draw_solid_color.ptx
	$(NVCC) -arch=sm_$(ARCH) $(filter %.o, $^) -o $@ \
	-L$(OPTIX)/build/lib -lglfw -lglad -lsutil_7_sdk -Xlinker -rpath=$(OPTIX)/build/lib \
	-L$(CUDA_SDK)/lib64 -lcurand -lnvrtc -Xlinker -rpath=$(CUDA_SDK)/lib64

draw_solid_color.ptx: draw_solid_color.cu
	$(NVCC) -I$(OPTIX)/include -I$(OPTIX)/SDK $(OPTION) -arch=sm_$(ARCH) -ptx -c $< -o $@

optixHello.o: optixHello.cpp
	$(NVCC) -Xcompiler "--std=c++11" -I$(OPTIX)/include -I$(OPTIX)/build -I$(OPTIX)/SDK -I$(OPTIX)/SDK/support/GLFW/deps \
	$(OPTION) -arch=sm_$(ARCH) -I$(OPTIX)/SDK/sutil -I$(CUDA_SDK)/include -c $< -o $@

random: optixHello
	./$< random $(NUM_SPHERES) $(NUM_LIGHTS) $(WIDTH) $(HEIGHT) $(FILE_NAME)

clean:
	rm -rf optixHello *.o *.ptx $(FILE_NAME)
(Disclaimer: I’m not using Linux myself.)
The referenced makefile looks reasonable but is for a very old OptiX version and needs adjustments.
Some of the folder names in there, like SDK-precompiled-samples, don’t exist in the OptiX 7 versions anymore.
OptiX 7 is a header only API and none of the “optix” export libs are required anymore.
You seem to have adjusted that correctly.
Then, that old OptiX version still supported 32-bit device code, which is not possible since OptiX 4.0 anymore. Means you’re missing the --machine=64 option I have listed inside the NVCC command line options.
Then you’re referencing the $(ARCH) variable, which is not set inside the makefile but in the makefile.in inside the referenced repository, and is set to 30 (first generation Kepler GPUs), which is not supported by current CUDA toolkit versions.
That should be set to at least 50 to work on all GPU architectures supported by OptiX 7.
Again have a look at the command line options I listed before.
Note that all OptiX SDK examples rely on the sutil library which needs to be built as well.
Own applications don’t need to use that which would make the makefile simpler.
CURAND shouldn’t be required for optixHello.
None of that “random” stuff should be inside your makefile because that is specific to the application inside the repository you’ve copied this from.
Did you structure the source files the same way with *.cpp and *.cu files inside the same folder?
Now, running OptiX applications which load precompiled PTX source files should not report errors about not finding *.cu files, unless you’re compiling them at runtime using NVRTC, which the referenced makefile is not doing. But you have added some NVRTC option (-lnvrtc). From the error output, that could be the main problem.
If you single step inside the debugger through the compiled program’s debug target, you should be able to determine what code path was throwing the exception.
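On Linux, one way to do that single stepping (a sketch, assuming gdb and a debug build of the host code):

```shell
# Sketch: run the program under gdb and stop where the exception is thrown.
gdb ./optixHello
# Inside gdb:
#   (gdb) catch throw     # break whenever a C++ exception is thrown
#   (gdb) run
#   (gdb) backtrace       # show the code path that threw the exception
```

The backtrace should point directly at the sutil/NVRTC code path producing the "Couldn't open source file" exception.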
I’m not sure about the other NVCC invocations in that makefile but to translate from *.cu to *.ptx that should look something like this (not tried):
OPTIX = <put the folder with your local OptiX SDK 7.4.0 installation here>
# To be able to find "cuda/helpers.h" included in draw_solid_color.cu
OPTIX_SDK = $(OPTIX)/SDK
# To be able to find optixHello.h included in draw_solid_color.cu
OPTIX_HELLO = $(OPTIX_SDK)/optixHello
ARCH = 50
OPT = -O3

draw_solid_color.ptx: draw_solid_color.cu
	$(NVCC) -I$(OPTIX)/include -I$(OPTIX_SDK) -I$(OPTIX_HELLO) $(OPT) -arch=sm_$(ARCH) --machine=64 --use_fast_math --relocatable-device-code=true --generate-line-info -Wno-deprecated-gpu-targets --ptx -c $< -o $@
You need to be able to build the OptiX SDK examples with CMake at least once to get the sutil library and the SDK examples with their working *.ptx code for comparisons.
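A sketch of that one-time SDK build on Linux (the SDK path is a placeholder for your local installation):

```shell
# Sketch: build the OptiX SDK examples (including the sutil library) once.
# <your_optix_sdk_path> is a placeholder for your local installation folder.
cd <your_optix_sdk_path>/SDK
mkdir -p build
cd build
cmake ..   # configure; pass -DCMAKE_BUILD_TYPE=Release if desired
make       # builds sutil, the examples, and their *.ptx files
```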
Check what the output name of the *.ptx file is. The OptiX SDK examples expect a specific naming scheme. Please look at the sutil::getInputData() and sutil::sampleInputFilePath() functions to see where and under which name the OptiX SDK examples expect these files.
Again this is something specific to how the OptiX SDK examples are using the sutil library which I cannot recommend in own application frameworks.
Hi, thanks for the answer.
I found another, better example for the experiment. This one does not use the sutil or OpenGL libraries: https://github.com/NVIDIA/rtx_compute_samples/tree/master/optixSaxpy
I am able to build and run it with CMake, and able to compile it with my own Makefile, but I get an error when running it. After some debugging, I found the problem is with the PTX file: if I use the PTX file compiled by CMake, it works.
Error message using the PTX compiled with my own Makefile:
Optix Log[4][DISK CACHE]: 'Opened database: "/var/tmp/OptixCache/cache7.db"'
Optix Log[4][DISK CACHE]: 'Cache data size: "110.1 KiB"'
Optix Log[4][DISKCACHE]: 'Cache miss for key: ptx-1824-key2b4ce8b7bc557e1673b0b65ceb67938a-sm_70-rtc0-drv465.19.01'
Optix Log[4][COMPILE FEEDBACK]: ''
Optix Log[4][DISKCACHE]: 'Inserted module in cache with key: ptx-1824-key2b4ce8b7bc557e1673b0b65ceb67938a-sm_70-rtc0-drv465.19.01'
Optix Log[4][COMPILE FEEDBACK]: 'Info: Module uses 0 payload values. Info: Module uses 0 attribute values. Pipeline configuration: 2 (default).
Info: Entry function "__raygen__saxpy" with semantic type RAYGEN has 0 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 1 basic block(s), 19 instruction(s)
Info: 0 non-entry function(s) have 0 basic block(s), 0 instruction(s)'
Optix Log[3][PIPELINE CREATE]: 'params variable "params" not found in any module. It might have been optimized away.'
Optix Log[4][COMPILE FEEDBACK]: 'Info: Pipeline has 1 module(s), 1 entry function(s), 0 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 1 basic block(s) in entry functions, 19 instruction(s) in entry functions, 0 non-entry function(s), 0 basic block(s) in non-entry functions, 0 instruction(s) in non-entry functions'
optixSaxpy.cpp:225 CUDA Error: 'an illegal memory access was encountered'
Max error: 2
Optix Log[2][ERROR]: 'Error synching on OptixPipeline event (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error destroying OptixPipeline event (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)'
optixSaxpy.cpp:234 Optix Error: 'Invalid value'
Optix Log[4][DISK CACHE]: 'Closed database: "/var/tmp/OptixCache/cache7.db"'
Optix Log[4][DISK CACHE]: 'Cache data size: "121.5 KiB"'
Optix Log[2][ERROR]: 'Failed to destroy launch resources (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)'
optixSaxpy.cpp:239 Optix Error: 'Invalid device context'
optixSaxpy.cpp:241 CUDA Error: 'an illegal memory access was encountered'
optixSaxpy.cpp:242 CUDA Error: 'an illegal memory access was encountered'
optixSaxpy.cpp:243 CUDA Error: 'an illegal memory access was encountered'
optixSaxpy.cpp:244 CUDA Error: 'an illegal memory access was encountered'
Then I compared the two PTX files and found the following differences.
Working one:
call (%r1), _optix_get_launch_index_x, ();
ld.const.f32 %f1, [params+4];
ld.const.u64 %rd1, [params+8];
cvta.to.global.u64 %rd2, %rd1;
mul.wide.u32 %rd3, %r1, 4;
Not working:
call (%r1), _optix_get_launch_index_x, ();
$L__tmp0:
.loc 1 44 3
ld.const.f32 %f1, [__nv_static_55__42_tmpxft_00021c6b_00000000_7_kernels_cpp1_ii_396a1715_params+4];
ld.const.u64 %rd1, [__nv_static_55__42_tmpxft_00021c6b_00000000_7_kernels_cpp1_ii_396a1715_params+8];
cvta.to.global.u64 %rd2, %rd1;
mul.wide.u32 %rd3, %r1, 4;
If I modify the PTX file to change __nv_static_55__42_tmpxft_00021c6b_00000000_7_kernels_cpp1_ii_396a1715_params to just params, it works. Can you tell what might cause this difference?
I added an echo in the CMake scripts to print the compile command for the PTX:
[ 16%] Building CUDA object optixSaxpy/CMakeFiles/optixSaxpy_kernels.dir/src/kernels.ptx
/usr/local/cuda/bin/nvcc -ccbin=/opt/rh/devtoolset-6/root/usr/bin/c++ -forward-unknown-to-host-compiler -I/local/NVIDIA-OptiX-SDK-7.1.0-linux64-x86_64/include -I/rtx_compute_samples/.
-I/rtx_compute_samples/optixSaxpy/./include -O3 -DNDEBUG -std=c++11
-MD -MT optixSaxpy/CMakeFiles/optixSaxpy_kernels.dir/src/kernels.ptx
-MF CMakeFiles/optixSaxpy_kernels.dir/src/kernels.ptx.d -x cu -ptx /rtx_compute_samples/optixSaxpy/src/kernels.cu -o CMakeFiles/optixSaxpy_kernels.dir/src/kernels.ptx
My Makefile:
ARCH = 70
OPTION = -O3 -DNDEBUG -std=c++11

kernels.ptx: kernels.cu
	$(NVCC) -I$(OPTIX)/include $(OPTION) --machine=64 --use_fast_math -arch=sm_$(ARCH) --relocatable-device-code=true --generate-line-info -Wno-deprecated-gpu-targets --ptx -c $< -o $@
I tried to add -MD and -MT, but that also does not change the PTX outcome. I hope you can tell me how I can compile like the CMake build did, or whether there are other dependencies I might need to pay attention to.
And one more question regarding the PTX file: is there a way to wrap the PTX file or use another format, so that it does not expose the assembly code like this?
Thanks and regards
Well, I don’t know why the original compile command is not setting the --machine=64 option. Maybe that is the default in newer CUDA toolkit versions.
The other differences between the command line options are the --relocatable-device-code=true, --generate-line-info, --use_fast_math, and it’s unclear which architecture the original uses.
The --generate-line-info is responsible for the .loc 1 44 3 output, which indicates a source code location and is benign.
The --relocatable-device-code=true changes how the compiler produces code. This setting is required to be able to use direct or continuation callables inside OptiX, which are just functions which aren’t called inside a module and are therefore eliminated as dead code since CUDA 8.0, if you don’t set that option or the --keep-device-functions option, which in turn is not available when using NVRTC for runtime compilation.
Since the kernel.cu is not using callables, it’s not strictly necessary here.
Nonetheless, that should not result in non-functioning code.
The --use_fast_math is used to get a lot faster approximations for trigonometric functions, sqrt, and reciprocals.
That is highly recommended for fast ray tracing kernels. Though these compute examples are coming from the high performance computing world where usually more precision is required. Depends on your use case.
You never mentioned the system configuration you’re using.
(OS version, installed GPU(s), VRAM amount, display driver version, OptiX (major.minor.micro) version, CUDA toolkit version (major.minor) used to generate the input PTX, host compiler version.)
Some of that would have been covered when providing the full PTX source instead of small code excerpts.
The base streaming multiprocessor target 7.0, your ARCH setting, will only work on Volta and newer GPUs for the OptiX device code.
So what happens when you’re using the exact same command line options inside your makefile as the ones you’ve dumped from the CMake build environment?
I see no reason why that shouldn’t result in exactly the same PTX code and that would have been the first thing I would have tested.
Note that these compute examples have a root CMakeLists.txt at the top level of the repository folder.
https://github.com/NVIDIA/rtx_compute_samples/blob/master/CMakeLists.txt
Always use the top-level CMakeLists.txt to configure all examples inside the repository.
Note that this is setting additional options, for example C++14.
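If you want to see the exact compile commands the CMake build uses without editing the scripts, a sketch for Makefile generators:

```shell
# Sketch: make CMake/make echo every compiler invocation it runs.
cmake -DCMAKE_VERBOSE_MAKEFILE=ON ..   # set at configure time, or alternatively:
make VERBOSE=1                         # request verbose output at build time
```

Comparing that output line by line against your own Makefile rule is the quickest way to find any remaining option differences.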
Thanks!
Turns out it is caused by --relocatable-device-code=true. Now it makes sense, since the compilation difference is related to how params is referenced in memory (__nv_static_55__42_tmpxft_00021c6b_00000000_7_kernels_cpp1_ii_396a1715_params vs. params), and the error message is related to params’ location as well.
In this saxpy code, SaxpyParameters is defined in both the kernels.cu and .cpp files:
struct SaxpyParameters {
    int    N;
    float  a;
    float *x;
    float *y;
};
If I am not using --relocatable-device-code=true in my future code development, is there a potential risk? Might this error just relate to this saxpy code? Like you mentioned, it is required to be able to use direct or continuation callables inside OptiX. Should I just use --keep-device-functions?
My setting:
CUDA version: 11.3
OptiX: 7.1
GPU: V100
NVIDIA-SMI 465.19.01
One more question: is there a way to compile or wrap the PTX file inside the executable? Or does the PTX file have to be outside of the executable?
Thanks!
If you’re only using NVCC to translate your OptiX device code to PTX source, then you could also use --keep-device-functions to prevent the dead code elimination by the compiler.
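As a sketch, the PTX compile line from earlier with --keep-device-functions in place of --relocatable-device-code=true (paths and file names are placeholders for your project):

```shell
# Sketch: keep unreferenced __device__ functions (e.g. callables) in the PTX
# instead of compiling with relocatable device code.
# $OPTIX is a placeholder for your local OptiX SDK installation folder.
nvcc -I$OPTIX/include --machine=64 -arch=sm_50 --use_fast_math \
     --keep-device-functions --generate-line-info -Wno-deprecated-gpu-targets \
     --ptx kernels.cu -o kernels.ptx
```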
It’s just that the runtime compiler NVRTC doesn’t support that option.
The OptiX Programming Guide explains this as well:
https://raytracing-docs.nvidia.com/optix7/guide/index.html#program_pipeline_creation#program-input
Still, I have been using --relocatable-device-code=true in all my OptiX examples using callable programs and have never experienced any issue with the generated code. You can see that in my OptiX 7 examples’ CMake scripts:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/CMakeLists.txt#L132
https://github.com/NVIDIA/OptiX_Apps/blob/master/3rdparty/CMake/nvcuda_compile_ptx.cmake#L50
Means either setting should produce working code. If you say that using the exact same compiler options once without and once with --relocatable-device-code=true generates working and non-working code respectively, that would be unexpected.
Note that inside the project you’re citing, there is a native CUDA kernel which is compiled and linked, and there is an OptiX device code which is translated to PTX only. All options mentioned above only apply to the OptiX device code. The options for the native CUDA kernel shouldn’t be changed.
One more question, is there a way to compile or wrap the ptx file inside the executable? Or does the ptx file have to be outside of the executable?
The optixModuleCreateFromPTX function takes a const char pointer to the PTX source data and its length as arguments.
Means you just need to have that data in memory, that’s all. How you get it there is completely your responsibility.
The simplest thing is to load the PTX code from a file for faster turnaround times during development.
Another approach often used is the bin2c executable inside the CUDA toolkit, which converts any binary file to a hardcoded constant character array variable as C source.
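A minimal sketch of that approach (the variable name draw_solid_color_ptx and the file names are illustrative placeholders; check bin2c --help for the exact options in your toolkit version):

```shell
# Sketch: convert the PTX text file into a C header holding a char array.
# The generated header then contains something like:
#   const char draw_solid_color_ptx[] = { 0x2f, 0x2f, /* ... */ };
bin2c -c --padd 0 --type char --name draw_solid_color_ptx draw_solid_color.ptx > draw_solid_color_ptx.h
```

Include that header in your host code and hand the array pointer and its length to optixModuleCreateFromPTX.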
Thanks! Is the PTX code device dependent? If I want my RT code to work on different GPUs, do I need to generate multiple PTX files and load them accordingly?
Is the PTX code device dependent?
To a certain degree. The streaming multi-processor (SM) target you chose for your *.cu to *.ptx compilation will define the minimum compatible GPU since using newer SM targets may let NVCC generate instructions not available on older GPUs.
Do I need to generate multiple PTX files and load them accordingly?
That is not necessary and there actually have been cases, where OptiX couldn’t handle some of the newest available instructions in SM targets from CUDA toolkits newer than the one OptiX itself was built with, so please always refer to the OptiX Release Notes for the recommended development environment.
If I want my RT code to work on different GPUs
If you want to target all GPU architectures supported by OptiX 7 with one set of PTX modules, you should translate your OptiX device code to SM 5.0 which is the first Maxwell GPU generation.
List of available SM versions here:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications
Note that the SM 5.0 target is deprecated in current CUDA toolkits and will throw deprecation warnings, which you can suppress with the aforementioned -Wno-deprecated-gpu-targets option inside the command line I posted earlier, which also uses SM 5.0 for all of the OptiX device code.
Maxwell GPUs are comparably old and slow in the meantime, and if you don’t care about Maxwell support you could also use SM 6.0 as the minimum target, which means Pascal and newer architectures.
For compatibility of native CUDA applications using cubins and PTX source, read these chapters of the CUDA programming guide:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#binary-compatibility