This may be a stupid question, so please forgive me if so, but I am looking for the bare-minimum code that uses a PTX shader, all the way up to building an executable. Something very simple that doesn’t require any outside files/libraries (e.g. Optix7/SDK/sutils). Is this available anywhere?
Thank you for any help/hints - once again I apologize if this is too simple a question or overly redundant.
You could look at the CUDA samples that come with the CUDA SDK. There are a bunch of pared-down samples in the “0_Simple/” folder, samples/0_Simple/simplePrintf, for example. It has a single .cu file with only two functions, a host main() and a device kernel, which is compiled into an executable. CUDA Samples :: CUDA Toolkit Documentation
I think there are some CUDA blog posts around here that also demonstrate some dependency-free examples. The CUDA quick start guide might help. Quick Start Guide :: CUDA Toolkit Documentation
Thank you @dhart for the reply and the CUDA information. However, I am actually looking for a way to build PTX code and then pass it to OptiX 7 for creating an executable. Probably pretty simple, but I am still in the process of learning.
Oh, I see, then I misunderstood your question. So the way to do that is to use nvcc to compile your OptiX programs, and then in your application, load the PTX file and call optixModuleCreateFromPTX(), giving it the string contents of the PTX file.
The nvcc compilation part you can do manually just like in that tutorial I posted above, or you can write a small makefile. Starting from scratch can be a little confusing and slow at first, so the first two things I would recommend doing are 1- step through the sutil function getPtxString() in a debugger to see what it does and learn how it works. It’s not very big, and providing an example you can inspect and customize is one reason we include that source code.
And 2- copy sutil’s getPtxString() along with all the functions it depends on out of sutil/ and into your own app, then start removing anything you don’t need and strip it down to the bare minimum. getPtxString() has two code paths, one for JIT compiling PTX on the fly using nvrtc, and one to read code from a PTX file that was compiled in advance using nvcc. You could strip out the JIT compiling, and the code that does source caching, and you’ll end up with something that basically just reads your PTX file.
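As a rough idea of what the stripped-down result might look like, here is a minimal sketch of a function that only reads a precompiled PTX file into a string. The name readPtxFile is hypothetical and not part of sutil or the OptiX API; the sketch assumes the PTX was produced ahead of time with nvcc and that a missing file should be reported with an exception.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of sutil or the OptiX API): reads a whole
// PTX file, compiled in advance with nvcc, into a std::string.
std::string readPtxFile(const std::string& path)
{
    // Binary mode keeps the bytes exactly as nvcc wrote them.
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file)
    {
        throw std::runtime_error("Could not open PTX file: " + path);
    }

    // Stream the entire file buffer into a string.
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```

The returned string's c_str() and size() would then be the PTX source and its length that get handed to optixModuleCreateFromPTX().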
Sorry to badger about this simple question, but is there an example of a macro build on a CMakeLists.txt that would take a specific CUDA file (e.g. shader.cu) and generate a PTX file that could then be used by the cmake build?
Something along the lines of following CMakeLists.txt file:
macro(myPTXmacro inputCudaFile outputPTXFile)
# not sure what would be needed here
endmacro()
# Call macro to build ptx file that can then be used for building the executable
myPTXmacro(shader.cu PTX_FILE)
add_executable(xShader ${PTX_FILE} driver.cpp)
target_include_directories(xShader PUBLIC ${OPTIX_INCLUDE})
target_link_libraries(xShader use_cuda)
My OptiX 7 application frameworks are not using SDK sutils functions and the CMake scripts compiling *.cu files to *.ptx are getting an explicit list of *.cu files plus dependencies and compile options.
That is easier to strip down to the bare minimum than the CMake methods used inside the OptiX SDK.
Also the naming scheme is name.cu gets translated to name.ptx and placed in a folder relative to the application.
No bloated *.ptx filenames or hardcoded paths like in the SDK.
Quick question @droettger, if I may, when I run cmake with the NVCUDA_COMPILE_PTX I get the following error:
Unknown CMake command “NVCUDA_COMPILE_PTX”.
This is probably simple, but is there a script I am missing? If so, is there a way to put that script in the current directory (where the cmake is being executed)? I would like to keep the work as isolated as possible.
Yes, that *.cmake file needs to be included in some CMakeLists.txt before you can use it.
In my examples that happens inside the top-most CMakeLists.txt which configures the overall CMake build environment https://github.com/NVIDIA/OptiX_Apps/blob/master/CMakeLists.txt#L12
CMake finds that nvcuda_compile_ptx.cmake file inside my local 3rdparty/CMake folder because CMAKE_MODULE_PATH is set to that folder one instruction above that.
Hi @droettger
Thank you again for the assist with nvcuda_compile_ptx.cmake - I put it in my local CMake directory and called it to build a PTX file from CUDA code perfectly. However, when I tried to call optixTrace(...) in that CUDA code I got a PTX compile error along the lines of the following: ptxas C:/Users/.../..._triangle_compute_35.ptx, Line 26; error: Label expected for argument 0
I have a triangle.h file with the following struct defined:
// OptiX 7
#include <optix.h>
// CUDA
#include <cuda_runtime.h>
// Vector math
#include "vec_math.h"
struct Params {
uchar4* image;
unsigned int image_width;
unsigned int image_height;
float3 cam_eye;
float3 cam_u, cam_v, cam_w;
OptixTraversableHandle handle;
};
The triangle.cu CUDA code that is being converted to PTX follows:
// Problem occurs when I add the following call:
optixTrace(params.handle, ray_origin, ray_direction,
           0.0f, 1e16f, 0.0f, OptixVisibilityMask(255),
           OPTIX_RAY_FLAG_NONE, 0, 1, 0, p0, p1, p2);
} // __raygen_rg(…)
The CMakeLists.txt file I am using follows:
if(CUDA_FOUND AND OPTIX7_FOUND)
# Set the PTX directory base
set(PTX_TARGET_DIR "${CMAKE_RUNTIME_OUTPUT_DIRECTORY}")
# Module path to cmake nvcuda_compile_ptx.cmake that is employed
# to build PTX code from CUDA
set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/CMake")
# Use NVCUDA_COMPILE_PTX function to produce the desired custom
# rule and output filenames when compiling OptiX from *.cu to
# *.ptx
include("nvcuda_compile_ptx")
I am hoping that I am doing something simple, but am kind of lost as to why I am getting the error when I use the optixTrace(...). Any help would be great.
You’re saying you get an error from ptxas?
The PTX assembler is not involved in the compilation step from OptiX *.cu to *.ptx source files.
If this is happening during compilation of your project that means you setup your build process incorrectly.
1.) When using the FindOptiX7.cmake from my examples, CMake generates the “FOUND” variable itself and its spelling matches the FindOptiX7.cmake file name; otherwise newer CMake versions throw a warning. That means it should be OptiX7_FOUND unless you declared that variable yourself.
2.) OPTIX7_INCLUDE_DIR inside the NVCUDA_COMPILE_PTX() call is missing the $ sign which means none of the OptiX device functions should have worked, which is effectively what you’re reporting.
That is true for all other bold variables above. I have no idea if that even generated anything valid inside the solution.
3.) You’re not using "--use_fast_math" in the NVCC_OPTIONS, which will make your device code run a lot slower, if at all. Please always prefer PTX compiled with --use_fast_math.
4.) In all OptiX device structs I would recommend to order the individual fields by their CUDA alignment restrictions to not have any compiler induced padding inside the structure.
That means: both the uchar4 pointer and the OptixTraversableHandle are 64-bit values which are 8-byte aligned, so I would place them at the beginning, since all remaining float3 and unsigned int fields have an alignment of 4 bytes.
Compare to this: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/shaders/system_parameter.h#L38
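To illustrate the alignment point in item 4, here is a small host-only sketch. It assumes a 64-bit host, uses a reduced, hypothetical set of fields rather than the full Params struct, and replaces the CUDA/OptiX types with stand-ins (Float3, Uchar4, TraversableHandle) so it builds without those headers. It compares a layout where a 4-byte field sits between two 8-byte fields against one ordered by alignment.

```cpp
#include <cstddef>

// Hypothetical stand-ins so the sketch builds without CUDA or OptiX headers.
struct Float3 { float x, y, z; };               // like CUDA float3: 12 bytes, 4-byte aligned
struct Uchar4 { unsigned char x, y, z, w; };    // like CUDA uchar4
using TraversableHandle = unsigned long long;   // OptixTraversableHandle is a 64-bit integer

// A 4-byte field between two 8-byte fields forces the compiler to pad.
struct ParamsInterleaved
{
    Uchar4*           image;
    unsigned int      image_width;
    TraversableHandle handle;
    Float3            cam_eye;
};

// Same fields, ordered from largest to smallest alignment requirement.
struct ParamsOrdered
{
    Uchar4*           image;
    TraversableHandle handle;
    Float3            cam_eye;
    unsigned int      image_width;
};

// Sum of the member sizes, i.e. the struct size if there were no padding at all.
constexpr std::size_t kPayloadBytes =
    sizeof(Uchar4*) + sizeof(TraversableHandle) + sizeof(Float3) + sizeof(unsigned int);

// Bytes of compiler-inserted padding in each layout.
constexpr std::size_t paddingInterleaved() { return sizeof(ParamsInterleaved) - kPayloadBytes; }
constexpr std::size_t paddingOrdered()     { return sizeof(ParamsOrdered) - kPayloadBytes; }
```

Under those assumptions the ordered layout should contain no padding bytes, while the interleaved one needs padding before the handle and at the end of the struct.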
Please always specify your development environment when asking about any OptiX issues:
OS version, installed GPU(s), display driver version, OptiX version (major.minor.micro), CUDA toolkit version, host compiler version.
The missing ‘$’ in my post must be an artifact of a copy-paste mistake because in my actual CMakeLists.txt file they exist. Thank you for the information on byte alignment in my struct(s) - very useful as well as the --use_fast_math NVCC_OPTIONS. The ‘OPTIX7_FOUND’ and associated ‘OPTIX7_INCLUDE_DIR’ are from a CMake FindOptix7 script that is custom for our code (modeled after your FindOptix7 cmake script) and it works for other OptiX7 code I wrote - granted very simple (just initializes Optix7 and exits).
I apologize for not listing the OS version, installed GPU(s), etc. - it was the end of a long day before the holiday. The OS version I am using is Windows 10 version 1809 (OS build 17763.1637), CUDA 11.1 (cuda_11.1.relgpu_drvr455TC455_06.29190527_0), Nvidia card: GeForce MX130, Nvidia Driver version: 456.81, CUDA compilation tools: release 11.1, V11.1.105, using OptiX 7.0.0
I am still having the same problem(s) when I try to include the optixTrace(...) function call in triangle.cu, but everything compiles properly when I don’t include it. Apparently this function call from OptiX 7.0.0 is not being found? So I am still confused as to why this would be.
If you compile the OptiX SDK 7.x examples or my OptiX 7 applications on your development environment, do they work?
If yes, there is some setup error in your project.
The recommendation would then be to start from a working example and change it to your needs.
Again, the compilation from OptiX device *.cu source files to *.ptx source files does not invoke the PTX assembler at all.
There cannot be any ptxas error in that step unless something went seriously wrong with your project setup.
Means you should be able to call these explicitly on a standard command prompt window (from inside the CUDA bin directory if the PATH environment variables aren’t set up correctly) and if that is working but not inside your project, then there is again something broken in your project setup which cannot be seen in code excerpts.
Hi @droettger, modifying your example code I was able to get it all to compile except for the following error during compilation: nvcc fatal : Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures
Is there an argument I could pass to cuda_compile_and_embed to remedy this?
Thanks again.
EDIT: I think I have fixed the issue. It was a corrupt FindOptix.cmake script I was calling - my bad.
However, when I try to write the result to an image file (ppm) from the Params struct I just get a black screen (all inputs are NULL in the ppm file). Can you tell me if I am doing something dumb copying the result back from the DEVICE to HOST? The following is the code snippet where I think the problem may be where d_param is on device (as per the example list on webpage):
// Now rendered results from the launch are in params.image
Params* hparams = (Params*)malloc(sizeof(Params));
cudaMemcpy(hparams, reinterpret_cast<void*>(d_param), sizeof(Params),
           cudaMemcpyDeviceToHost);
string fout("myTriangle.ppm");
fio.WritePPM(hparams->image, width, height, fout.c_str());
if(hparams) free(hparams); // allocated with malloc, so released with free
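One thing that may be worth checking, assuming params.image was allocated with cudaMalloc (the allocation is not shown here): copying the Params struct back only copies the pointer value, so hparams->image would still point at device memory. In that case the pixels themselves would need their own cudaMemcpy with cudaMemcpyDeviceToHost, of width * height * sizeof(uchar4) bytes, into a host buffer before writing the file. The sketch below covers only the host side after such a copy. Pixel and writePpm are hypothetical stand-ins for uchar4 and fio.WritePPM, whose real behavior is not known from this thread.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for CUDA's uchar4 so the sketch builds without CUDA headers.
struct Pixel { unsigned char x, y, z, w; };

// Hypothetical helper: writes host-side RGBA pixels as a binary PPM (P6),
// dropping the alpha channel. Returns false if the pixel count does not
// match width * height or the file could not be written.
bool writePpm(const std::string& path, const std::vector<Pixel>& pixels,
              unsigned int width, unsigned int height)
{
    if (pixels.size() != static_cast<std::size_t>(width) * height)
    {
        return false;
    }

    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out)
    {
        return false;
    }

    // PPM header: magic number, dimensions, maximum channel value.
    out << "P6\n" << width << " " << height << "\n255\n";

    // One RGB triple per pixel, row by row.
    for (const Pixel& p : pixels)
    {
        out.put(static_cast<char>(p.x));
        out.put(static_cast<char>(p.y));
        out.put(static_cast<char>(p.z));
    }
    return static_cast<bool>(out);
}
```

If the output is still black after copying the pixel buffer itself, the problem is more likely in the launch or the raygen program than in the file writing.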