Simple PTX shader - OptiX 7

Good morning,

This may be a stupid question, so please forgive me if so, but I am looking for the bare minimum code that uses a PTX shader, all the way up to building an executable. Something very simple that doesn’t require any outside files/libraries (e.g. Optix7/SDK/sutils). Is this available anywhere?

Thank you for any help/hints - once again I apologize if this is too simple a question or overly redundant.

Do you mean a CUDA PTX program, with no ray tracing?


Yes. I just want a simple example of building a CUDA PTX shader all the way to creating an executable - no GUI necessary.

You could look at the CUDA samples that come with the CUDA SDK. There are a bunch of pared-down samples in the “0_simple/” folder, samples/0_simple/simplePrintf, for example. It has a single .cu file with only two functions, a host main() and a device kernel, which nvcc compiles into an executable.

I think there are some CUDA blog posts around here that also demonstrate some dependency-free examples. The CUDA quick start guide might help.

This one looks extremely simple, just a couple of lines followed by invoking nvcc manually:

For what it’s worth, the simplest example in the OptiX SDK is the optixHello sample. It depends on OptiX, but doesn’t use sutils.


Thank you @dhart for the reply and the CUDA information. However, I am actually looking for a way to build PTX code and then pass it to OptiX 7 for creating an executable. Probably pretty simple, but I am still in the process of learning.

Thanks again

Oh, I see, then I misunderstood your question. So the way to do that is to use nvcc to compile your OptiX programs, and then in your application, load the PTX file and call optixModuleCreateFromPTX(), giving it the string contents of the PTX file.

The nvcc compilation part you can do manually just like in that tutorial I posted above, or you can write a small makefile. Starting from scratch can be a little confusing and slow at first, so the first two things I would recommend doing are 1- step through the sutil function getPtxString() in a debugger to see what it does and learn how it works. It’s not very big, and providing an example you can inspect and customize is one reason we include that source code.

And 2- copy sutil’s getPtxString() along with all the functions it depends on out of sutil/ and into your own app, then start removing anything you don’t need and strip it down to the bare minimum. getPtxString() has two code paths, one for JIT compiling PTX on the fly using nvrtc, and one to read code from a PTX file that was compiled in advance using nvcc. You could strip out the JIT compiling, and the code that does source caching, and you’ll end up with something that basically just reads your PTX file.
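For illustration, here is roughly what the stripped-down result could look like. This is only a minimal sketch, not the actual sutil code: readPtxFile() is a hypothetical name, and it assumes the PTX file was already generated by nvcc and that its path is known to the application.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of sutil): reads a PTX file that was
// compiled in advance and returns its contents as a string.
std::string readPtxFile(const std::string& path)
{
    // Open in binary mode so the bytes are passed through unchanged.
    std::ifstream file(path, std::ios::binary);
    if (!file)
        throw std::runtime_error("Could not open PTX file: " + path);

    // Copy the whole stream buffer into a string in one go.
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```

The returned string's data() and size() would then be the PTX source and its length that you hand to optixModuleCreateFromPTX(), along with your device context and the module and pipeline compile options.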



That’s exactly what I was looking for - thanks @dhart

Sorry to badger you about this simple question, but is there an example of a macro in a CMakeLists.txt that would take a specific CUDA file and generate a PTX file that could then be used by the CMake build?

Something along the lines of following CMakeLists.txt file:

macro(myPTXmacro inputCudaFile outputPTXFile)
  # not sure what would be needed here
endmacro()

# Call macro to build ptx file that can then be used for building the executable
myPTXmacro( PTX_FILE)
add_executable(xShader ${PTX_FILE} driver.cpp)
target_include_directories(xShader PUBLIC ${OPTIX_INCLUDE})
target_link_libraries(xShader use_cuda)

Thanks again for any help.

You can find an example of a CMake macro to build PTX in the OptiX SDK’s CMake files, specifically look at the file Cmake/Macros


My OptiX 7 application frameworks are not using SDK sutils functions and the CMake scripts compiling *.cu files to *.ptx are getting an explicit list of *.cu files plus dependencies and compile options.
That is easier to strip down to the bare minimum than the CMake methods used inside the OptiX SDK.

Also, the naming scheme is simple: each file name gets translated to name.ptx and placed in a folder relative to the application.
No bloated *.ptx filenames or hardcoded paths like in the SDK.

Have a look at these places:

Generating custom build rules for *.cu to *.ptx translation:
using this *.cmake function:
Reading a *.ptx source into a string:
Building OptixModules from them:

More condensed code in the later examples, building all OptixProgramGroup elements with a single call:

Thank you @dhart and @droettger.

Quick question @droettger, if I may, when I run cmake with the NVCUDA_COMPILE_PTX I get the following error:

Unknown CMake command “NVCUDA_COMPILE_PTX”.

This is probably simple, but is there a script I am missing? If so, is there a way to put that script in the current directory (where the cmake is being executed)? I would like to keep the work as isolated as possible.

Thank you again for all the help.

Yes, that *.cmake file needs to be included in some CMakeLists.txt before you can use it.

In my examples that happens inside the top-most CMakeLists.txt, which configures the overall CMake build environment.
The reason it finds the nvcuda_compile_ptx.cmake file inside my local 3rdparty/CMake folder at all is that CMAKE_MODULE_PATH is set to that folder one instruction above the include.


Cool, it works - Thank you very much @droettger
I still have a lot to learn, not just about OptiX but apparently CMake too.

Hi @droettger
Thank you again for the assist with nvcuda_compile_ptx.cmake - I put it in my local CMake directory and called it to build a PTX file from CUDA code, and that worked perfectly. However, when I tried to call optixTrace(...) in that CUDA code, I got a PTX compile error along the lines of the following:
ptxas C:/Users/.../..._triangle_compute_35.ptx, Line 26; error: Label expected for argument 0

I have a triangle.h file with the following struct defined:

// OptiX 7
#include <optix.h>
#include <cuda_runtime.h>
// Vector math
#include "vec_math.h"

struct Params {
    uchar4* image;
    unsigned int image_width;
    unsigned int image_height;
    float3 cam_eye;
    float3 cam_u, cam_v, cam_w;
    OptixTraversableHandle handle;
};

The CUDA code that is being converted to PTX follows:

#include "triangle.h"

extern "C" {
__constant__ Params params;
}

static __forceinline__ __device__ void computeRay(uint3 idx, uint3 dim,
                                                  float3& origin, float3& direction) {
    const float3 U = params.cam_u;
    const float3 V = params.cam_v;
    const float3 W = params.cam_w;
    const float2 d = 2.0f * make_float2(
        static_cast<float>(idx.x) / static_cast<float>(dim.x),
        static_cast<float>(idx.y) / static_cast<float>(dim.y)) - 1.0f;

    origin = params.cam_eye;
    direction = normalize(d.x * U + d.y * V + W);
}

extern "C" __global__ void __raygen__rg() {
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();

    float3 ray_origin, ray_direction;
    computeRay(idx, dim, ray_origin, ray_direction);

    unsigned int p0, p1, p2;

    // Problem occurs when I add the following call:
    optixTrace(params.handle, ray_origin, ray_direction,
               0.0f, 1e16f, 0.0f, OptixVisibilityMask(255),
               OPTIX_RAY_FLAG_NONE, 0, 1, 0, p0, p1, p2);

} // __raygen__rg()

The CMakeLists.txt file I am using follows:


# Set the PTX directory base

# Module path to cmake nvcuda_compile_ptx.cmake that is employed
# to build PTX code from CUDA

# Use NVCUDA_COMPILE_PTX function to produce the desired custom
# rule and output filenames when compiling OptiX from *.cu to
# *.ptx


# Compile CUDA source code to PTX shader(s)
TARGET_PATH "${PTX_TARGET_DIR}/haney_ptx" GENERATED_FILES PTX_SOURCES NVCC_OPTIONS "--gpu-architecture=compute_50" "--relocatable-device-code=true" "--Wno-deprecated-gpu-targets" "-I${OPTIX7_INCLUDE_DIR}"

#message("PTX_SOURCES = " "${PTX_SOURCES}")

include_directories("." ${OPTIX7_INCLUDE_DIR})

add_executable(xOptixShader ${SOURCES} ${SHADERS} ${SHADERS_HEADERS} ${PTX_SOURCES})


I am hoping that I am just making a simple mistake, but I am kind of lost as to why I get the error when I use optixTrace(...). Any help would be great.

Thanks again.