Simple PTX shader - OptiX 7

Good morning,

This may be a stupid question, so please forgive me if so, but I am looking for the bare minimum code that uses a PTX shader all the way up to building an executable. Something very simple that doesn’t require any outside files/libraries (e.g. Optix7/SDK/sutils). Is this available anywhere?

Thank you for any help/hints - once again I apologize if this is too simple a question or overly redundant.

Do you mean a CUDA PTX program, with no ray tracing?


Yes. I just want to get a simple example of building a CUDA PTX shader all the way to creating an executable - no GUI necessary.

You could look at the CUDA samples that come with the CUDA SDK. There are a bunch of pared-down samples in the “0_simple/” folder, samples/0_simple/simplePrintf, for example. It has a single .cu file with only two functions, a host main() and a device kernel, and it builds into an executable.

I think there are some CUDA blog posts around here that also demonstrate some dependency-free examples. The CUDA quick start guide might help.

This one looks extremely simple, just a couple of lines followed by invoking nvcc manually:

For what it’s worth, the simplest example in the OptiX SDK is the optixHello sample. It depends on OptiX, but doesn’t use sutils.


Thank you @dhart for the reply and the CUDA information. However, I am actually looking for a way to build PTX code and then pass it to OptiX 7 for creating an executable. Probably pretty simple, but I am still in the process of learning.

Thanks again

Oh, I see, then I misunderstood your question. So the way to do that is to use nvcc to compile your OptiX programs, and then in your application, load the PTX file and call optixModuleCreateFromPTX(), giving it the string contents of the PTX file.

The nvcc compilation part you can do manually just like in that tutorial I posted above, or you can write a small makefile. Starting from scratch can be a little confusing and slow at first, so the first two things I would recommend doing are 1- step through the sutil function getPtxString() in a debugger to see what it does and learn how it works. It’s not very big, and providing an example you can inspect and customize is one reason we include that source code.

And 2- copy sutil’s getPtxString() along with all the functions it depends on out of sutil/ and into your own app, then start removing anything you don’t need and strip it down to the bare minimum. getPtxString() has two code paths, one for JIT compiling PTX on the fly using nvrtc, and one to read code from a PTX file that was compiled in advance using nvcc. You could strip out the JIT compiling, and the code that does source caching, and you’ll end up with something that basically just reads your PTX file.
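As a rough idea of where that stripping-down might end up, here is a minimal sketch of the "read a precompiled PTX file" path only. The helper name readPtxFile is hypothetical (it is not part of OptiX or sutil), and the sketch assumes the PTX file was already produced by nvcc.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of OptiX or sutil): reads a whole PTX file
// into a std::string and throws if the file cannot be opened.
std::string readPtxFile(const std::string& path)
{
    // Binary mode keeps the file contents byte-for-byte.
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("Cannot open PTX file: " + path);
    }

    // Stream the whole file buffer into a string stream in one go.
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```

The returned string's c_str() and size() would then be what you hand to optixModuleCreateFromPTX() as the PTX source and its length.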



That’s exactly what I was looking for - thanks @dhart

Sorry to badger about this simple question, but is there an example of a macro in a CMakeLists.txt that would take a specific CUDA file and generate a PTX file that could then be used by the CMake build?

Something along the lines of following CMakeLists.txt file:

macro(myPTXmacro inputCudaFile outputPTXFile)
# not sure what would be needed here
endmacro()

# Call macro to build ptx file that can then be used for building the executable
myPTXmacro( PTX_FILE)
add_executable(xShader ${PTX_FILE} driver.cpp)
target_include_directories(xShader PUBLIC ${OPTIX_INCLUDE})
target_link_libraries(xShader use_cuda)

Thanks again for any help.

You can find an example of a CMake macro to build PTX in the OptiX SDK’s CMake files; specifically, look at the file Cmake/Macros


My OptiX 7 application frameworks are not using SDK sutils functions and the CMake scripts compiling *.cu files to *.ptx are getting an explicit list of *.cu files plus dependencies and compile options.
That is easier to strip down to the bare minimum than the CMake methods used inside the OptiX SDK.

Also, the naming scheme is simple: name.cu gets translated to name.ptx and placed in a folder relative to the application.
No bloated *.ptx filenames or hardcoded paths like in the SDK.

Have a look at these places:

Generating custom build rules for *.cu to *.ptx translation:
using this *.cmake function:
Reading a *.ptx source into a string:
Building OptixModules from them:

More condensed code in the later examples, building all OptixProgramGroup elements with a single call:

Thank you @dhart and @droettger.

Quick question @droettger, if I may, when I run cmake with the NVCUDA_COMPILE_PTX I get the following error:

Unknown CMake command “NVCUDA_COMPILE_PTX”.

This is probably simple, but is there a script I am missing? If so, is there a way to put that script in the current directory (where the cmake is being executed)? I would like to keep the work as isolated as possible.

Thank you again for all the help.

Yes, that *.cmake file needs to be included in some CMakeLists.txt before you can use it.

In my examples that happens inside the top-most CMakeLists.txt, which configures the overall CMake build environment.
CMake finds the nvcuda_compile_ptx.cmake file inside my local 3rdparty/CMake folder because CMAKE_MODULE_PATH is set to that folder one instruction before the include.


Cool, it works - Thank you very much @droettger
I still have a lot to learn, not just about OptiX but apparently CMake too.

Hi @droettger
Thank you again for the assist with nvcuda_compile_ptx.cmake - I put it in my local CMake directory and called it to build a PTX file from CUDA code perfectly. However, when I tried to call optixTrace(...) in that CUDA code, I got a PTX compile error along the lines of the following:
ptxas C:/Users/.../..._triangle_compute_35.ptx, Line 26; error: Label expected for argument 0

I have a triangle.h file with the following struct defined:

// OptiX 7
#include <optix.h>
#include <cuda_runtime.h>
// Vector math
#include "vec_math.h"
struct Params {
    uchar4* image;
    unsigned int image_width;
    unsigned int image_height;
    float3 cam_eye;
    float3 cam_u, cam_v, cam_w;
    OptixTraversableHandle handle;
};

The CUDA code that is being converted to PTX follows:

#include "triangle.h"

extern "C" {
__constant__ Params params;
}

static __forceinline__ __device__ void computeRay(uint3 idx, uint3 dim,
                                                  float3& origin, float3& direction) {
    const float3 U = params.cam_u;
    const float3 V = params.cam_v;
    const float3 W = params.cam_w;
    const float2 d = 2.0f * make_float2(
        static_cast<float>(idx.x) / static_cast<float>(dim.x),
        static_cast<float>(idx.y) / static_cast<float>(dim.y)) - 1.0f;

    origin = params.cam_eye;
    direction = normalize(d.x * U + d.y * V + W);
} // computeRay(...)

extern "C" __global__ void __raygen__rg() {
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();

    float3 ray_origin, ray_direction;
    computeRay(idx, dim, ray_origin, ray_direction);

    unsigned int p0, p1, p2;

    // Problem occurs when I add the following call:
    optixTrace(params.handle, ray_origin, ray_direction,
               0.0f, 1e16f, 0.0f, OptixVisibilityMask(255),
               OPTIX_RAY_FLAG_NONE, 0, 1, 0, p0, p1, p2);

} // __raygen__rg()

The CMakeLists.txt file I am using follows:


# Set the PTX directory base

# Module path to cmake nvcuda_compile_ptx.cmake that is employed
# to build PTX code from CUDA

# Use NVCUDA_COMPILE_PTX function to produce the desired custom
# rule and output filenames when compiling OptiX from *.cu to
# *.ptx


# Compile CUDA source code to PTX shader(s)
TARGET_PATH "{PTX_TARGET_DIR}/haney_ptx" GENERATED_FILES PTX_SOURCES NVCC_OPTIONS "--gpu-architecture=compute_50" "--relocatable-device-code=true" "--Wno-deprecated-gpu-targets" "-I{OPTIX7_INCLUDE_DIR}"

#message("PTX_SOURCES = " "${PTX_SOURCES}")

include_directories("." ${OPTIX7_INCLUDE_DIR})

add_executable(xOptixShader {SOURCES} {SHADERS} {SHADERS_HEADERS} {PTX_SOURCES})


I am hoping that I am doing something simple, but am kind of lost as to why I am getting the error when I use the optixTrace(...). Any help would be great.

Thanks again.

You’re saying you get an error from ptxas?
The PTX assembler is not involved in the compilation step from OptiX *.cu to *.ptx source files.
If this is happening during compilation of your project, that means you set up your build process incorrectly.

If you compare your provided CMake code with the one in
you’ll notice that you’re missing a lot of $ signs before the CMake variables.
Not sure why this generated any solution at all.

I would change all of the bold marked variables and then some more things. See the list of issues below.






NVCC_OPTIONS "--gpu-architecture=compute_50" "--relocatable-device-code=true" "--Wno-deprecated-gpu-targets" "-I{OPTIX7_INCLUDE_DIR}" "-I${CMAKE_CURRENT_SOURCE_DIR}")




1.) When using the FindOptiX7.cmake from my examples, those let CMake generate the “FOUND” variable and that matches the spelling of the FindOptiX7.cmake, otherwise newer CMake versions throw a warning. Means it should be OptiX7_FOUND when you didn’t declare that yourself.

2.) OPTIX7_INCLUDE_DIR inside the NVCUDA_COMPILE_PTX() call is missing the $ sign which means none of the OptiX device functions should have worked, which is effectively what you’re reporting.
That is true for all other bold variables above. I have no idea if that even generated anything valid inside the solution.

3.) You’re not using "--use_fast_math" in the NVCC_OPTIONS, which will make your device code run a lot slower, if at all. Please always prefer PTX compiled with --use_fast_math.

4.) In all OptiX device structs I would recommend to order the individual fields by their CUDA alignment restrictions to not have any compiler induced padding inside the structure.
Means both the uchar4 pointer and the OptixTraversableHandle are 64-bit values which are 8-byte aligned, and I would place them at the beginning since all remaining float3 and unsigned int fields have an alignment of 4 bytes.
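To make the padding effect concrete, here is a small host-side sketch. It does not use the real CUDA or OptiX types: Float3 and Handle64 are stand-ins assumed for illustration (12 bytes with 4-byte alignment, and 8 bytes with 8-byte alignment), and the two structs are hypothetical, not the Params struct from the question.

```cpp
#include <cstddef>

// Host-side stand-ins, assumed for illustration only (not CUDA or OptiX types):
// Float3 mimics float3 (12 bytes, 4-byte aligned); Handle64 mimics an 8-byte,
// 8-byte-aligned value such as a device pointer or OptixTraversableHandle.
struct Float3 { float x, y, z; };
struct alignas(8) Handle64 { unsigned char bytes[8]; };

// Hypothetical layout with a 4-byte field between two 8-byte fields.
struct Interleaved {
    Handle64     image;   // offset 0
    unsigned int width;   // offset 8, followed by 4 bytes of padding
    Handle64     handle;  // offset 16
    Float3       eye;     // offset 24, followed by 4 bytes of tail padding
};

// Same fields, with the 8-byte-aligned members placed first.
struct Ordered {
    Handle64     image;   // offset 0
    Handle64     handle;  // offset 8
    unsigned int width;   // offset 16
    Float3       eye;     // offset 20, no padding needed
};

static_assert(offsetof(Interleaved, handle) == 16, "padding inserted before handle");
static_assert(sizeof(Interleaved) == 40, "8 bytes of padding in total");
static_assert(sizeof(Ordered) == 32, "no padding");
```

With these stand-in sizes, the Params struct as posted appears to pack to 72 bytes in either order (the two unsigned int fields fill one 8-byte slot), so the reordering there is mostly a habit that protects you when fields are added or removed later.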
Compare to this:

Please always specify your development environment when asking about any OptiX issues:
OS version, installed GPU(s), display driver version, OptiX version (major.minor.micro), CUDA toolkit version, host compiler version.

Thank you @droettger for the reply.

The missing ‘$’ in my post must be an artifact of a copy-paste mistake because in my actual CMakeLists.txt file they exist. Thank you for the information on byte alignment in my struct(s) - very useful as well as the --use_fast_math NVCC_OPTIONS. The ‘OPTIX7_FOUND’ and associated ‘OPTIX7_INCLUDE_DIR’ are from a CMake FindOptix7 script that is custom for our code (modeled after your FindOptix7 cmake script) and it works for other OptiX7 code I wrote - granted very simple (just initializes Optix7 and exits).

I apologize for not listing the OS version, installed GPU(s), etc. - it was the end of a long day before the holiday. The OS version I am using is Windows 10 version 1809 (OS build 17763.1637), CUDA 11.1 (cuda_11.1.relgpu_drvr455TC455_06.29190527_0), Nvidia card: GeForce MX130, Nvidia Driver version: 456.81, CUDA compilation tools: release 11.1, V11.1.105, using OptiX 7.0.0

I still have the same problem(s) when I include the optixTrace(...) function call, but everything compiles properly when I leave it out. Apparently this function call from OptiX 7.0.0 is not being found? So I am still confused as to why this would be.

What’s your host compiler version?

If you compile the OptiX SDK 7.x examples or my OptiX 7 applications on your development environment, do they work?
If yes, there is some setup error in your project.
The recommendation would then be to start from a working example and change it to your needs.

Again, the compilation from OptiX device *.cu source files to *.ptx source files does not invoke the PTX assembler at all.
There cannot be any ptxas error in that step unless something went seriously wrong with your project setup.

See this related post and follow the links in there as well:

I described here how to get the exact NVCC command lines generated inside the OptiX SDK 7 CMake environment:

It’s also prepared inside the nvcuda_compile_ptx.cmake function:
Uncomment that line and the NVCC command lines get printed into the CMake output window.

Means you should be able to call these explicitly in a standard command prompt window (from inside the CUDA bin directory if the PATH environment variables aren’t set up correctly), and if that works but not inside your project, then there is again something broken in your project setup which cannot be seen in code excerpts.

The host is MSVS 2019 version 16.8.4

I can get your example code to compile on my local machine. So I will try and backtrack by building from your example(s).

Thanks again for the assist.

Hi @droettger, by modifying your example code I was able to get it all to compile, except for the following error during compilation:
nvcc fatal : Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures

Is there an argument I could pass to cuda_compile_and_embed to remedy this?

Thanks again.

EDIT: I think I have fixed the issue. It was a corrupt FindOptix.cmake script I was calling - my bad.

Thanks again for all the help @droettger.
I created the triangle OptiX 7 code found on

However, when I try to write the result to an image file (ppm) from the Params struct I just get a black screen (all inputs are NULL in the ppm file). Can you tell me if I am doing something dumb copying the result back from the DEVICE to HOST? The following is the code snippet where I think the problem may be where d_param is on device (as per the example list on webpage):

// Now rendered results from the launch are in params.image
Params* hparams = (Params*)malloc(sizeof(Params));
cudaMemcpy(hparams, reinterpret_cast<void*>(d_param), sizeof(Params),
           cudaMemcpyDeviceToHost);
string fout("myTriangle.ppm");
fio.WritePPM(hparams->image, width, height, fout.c_str());
if(hparams) delete hparams;

ANY help or hints would be much appreciated.