rtBuffer - indexing

Good morning,

I have some OptiX 5/6 code that indexes into an rtBuffer declared at the top of a given shader. I know that OptiX 7 no longer has the rtBuffer macro, so can anyone point me to the definitions that determine how an rtBuffer is indexed in OptiX 5/6? For example, given the shader below:

rtBuffer<float4, 2> rayBuffer;
rtDeclareVariable(uint2, launchIndex, rtLaunchIndex, );
...
RT_PROGRAM void genRays() {
  ...
  // Do some stuff
  ...
  // How does rtBuffer compute index below?
  rayBuffer[launchIndex] = make_float3(0.0f, 0.0f, 0.0f);
  ...
}

I’m mainly just looking for how OptiX 5/6 computes an index so that I can mimic something similar in OptiX 7 - if that is possible.

I apologize ahead of time for the simplicity of the question, but I am still trying to learn.

Thank you kindly for any help.

In OptiX 7 “buffers” are just 64-bit CUdeviceptr to linear memory.
You’re responsible for the allocation, alignment, and addressing.

All semantic variables (e.g. rtLaunchIndex above) are replaced by OptiX 7 device functions.

For your given example of a tightly packed float4 buffer indexed by a 2D launch index that would look like this:

With this pointer declared inside the launch parameter block:

float4* outputBuffer;

This would be the code to write to it when the buffer dimension matches the launch dimensions:

const uint3 theLaunchDim   = optixGetLaunchDimensions();
const uint3 theLaunchIndex = optixGetLaunchIndex();
const unsigned int linearIndex = theLaunchDim.x * theLaunchIndex.y + theLaunchIndex.x; // width * y + x
sysData.outputBuffer[linearIndex] = make_float4(0.0f);

You could also define it as CUdeviceptr which is just a 64-bit unsigned value and then reinterpret it to the pointer you need, in case your renderer supports different output formats.
Shown here: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/shaders/raygeneration.cu#L230
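
To illustrate the CUdeviceptr approach described above, here is a minimal host-side C++ sketch. It is not taken from the linked example. Float4, DevicePtr, LaunchParams, and writeFloat4 are hypothetical stand-ins (for float4, CUdeviceptr, and a launch parameter block) so that the snippet compiles without CUDA or OptiX headers. In real device code the pointer would be a CUdeviceptr referring to device memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's float4.
struct Float4 { float x, y, z, w; };

// Hypothetical stand-in for CUdeviceptr: a plain integer wide enough for an address.
using DevicePtr = std::uintptr_t;

// Hypothetical launch parameter block that stores the buffer as an untyped address.
struct LaunchParams
{
    DevicePtr outputBuffer;
};

// Reinterpret the untyped address as the element type that is needed, then write.
inline void writeFloat4(const LaunchParams& params, unsigned int linearIndex, const Float4& value)
{
    assert(params.outputBuffer != 0);
    Float4* buffer = reinterpret_cast<Float4*>(params.outputBuffer);
    buffer[linearIndex] = value;
}
```

A renderer that supports several output formats could keep the same untyped field and choose the cast per format at the point of the write.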

(Note that your code wouldn’t compile because you assigned a float3 to a float4.)

Reading through the ray generation program of the simplest OptiX SDK example optixHello, which just fills the output buffer with a constant color, would have answered this. More due diligence please.

Thanks @droettger for the reply.

I had the computation of uint3 launch index and launch dimension to linear index from optixHello example, but was just wondering what OptiX 5/6 did when calculating different dimensions. I suppose the answer is actually not much different though.

The CUdeviceptr information is certainly an interesting approach though.

Thank you again.

That is not really a concern for device code in OptiX versions before 7, because buffer and index dimensions simply match: a 3D buffer is indexed with a 3D index.
The assumption is that the data is tightly packed, meaning no larger stride between elements and no row padding.
That becomes apparent when mapping the buffer.
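
For mimicking this in OptiX 7, the tightly packed layout can be sketched as a single linear index. This is a host-side C++ sketch, not OptiX code, and it assumes x varies fastest, then y, then z. UInt3 is a hypothetical stand-in for CUDA's uint3 and linearIndex is a made-up helper name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CUDA's uint3 so the sketch compiles without CUDA headers.
struct UInt3 { unsigned int x, y, z; };

// Element offset into a tightly packed buffer of extent dim:
// no stride between elements, no row or slice padding.
inline std::size_t linearIndex(const UInt3 idx, const UInt3 dim)
{
    assert(idx.x < dim.x && idx.y < dim.y && idx.z < dim.z);
    return (static_cast<std::size_t>(idx.z) * dim.y + idx.y) * dim.x + idx.x;
}
```

With dim.z set to 1 this reduces to the width * y + x form used for the 2D case, and with dim.y and dim.z both set to 1 it reduces to plain 1D indexing.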

Good to know.

Quick question about OptiX standards. Is it common to separate out the different shader programs into different files? For example, the raygen program is a file called raygen.cu and the closesthit program in another file called closesthit.cu?

Thanks again.

You can handle that as you like.
For big projects it might get awkward having everything in one file.
Also compile times might become an issue.

Thanks again @droettger. I agree, the code for a big project could become unwieldy if it was all in a single CUDA file. Compile time might become an issue later, but for now I am happy to get some working code.

All my examples are using separate modules.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Device.cpp#L548
You probably noticed that because you’ve edited the question while I was answering. :-)

Yes, apologies. I wrote the question before doing my due diligence and quickly tried to edit it before I looked like an idiot - too late though :)

The forum sends e-mails for watched topics.
You have to be very quick with editing to not get the initial post contents.
Read your post carefully before hitting reply. I’m not good at it and edit a lot. :-)

You are correct.

I’m learning though.

This is more of a CUDA question than a pure OptiX one, about alignment in structures. If I design a structure such that the elements are in decreasing size, will that cause nvcc to avoid padding? Is it useful to put the __align__(N) keyword in a CUDA struct, as in struct __align__(16) MyCudaStruct { ... }?

Thanks

That is normally not required. CUDA aligns structures to the biggest alignment required by any element inside the struct already.
See this documentation on the CUDA alignment requirements and esp. note the possibly different behavior for arrays of structs in host and device compilers in the second link:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vector-types
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernel-execution
When I use arrays of structures on host and device, I manually pad the structure size to the required alignment.
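
As a rough illustration of the points above, here is a host-side C++ sketch rather than CUDA code. All struct names and the paddedSize helper are hypothetical, and the sizes assume 4-byte float and int. Ordering members by decreasing size removes interior padding, but the struct size is still rounded up to the struct's alignment. Manual padding to a 16-byte multiple is shown on a made-up LightDefinition.

```cpp
#include <cassert>
#include <cstddef>

// Same members, different order.
struct Unordered { char tag; double value; char flag; };
struct Ordered   { double value; char tag; char flag; };

// Both structs take the alignment of their most strictly aligned member.
static_assert(alignof(Unordered) == alignof(Ordered), "same alignment either way");
// Decreasing-size order never makes the struct bigger here.
static_assert(sizeof(Ordered) <= sizeof(Unordered), "ordering removes interior padding");
// Tail padding remains: the size is always a multiple of the alignment.
static_assert(sizeof(Ordered) % alignof(Ordered) == 0, "size is rounded up to alignment");

// Hypothetical struct manually padded so that its size is a multiple of 16 bytes.
struct LightDefinition
{
    float position[3];
    float intensity;
    int   type;
    int   pad0, pad1, pad2; // explicit padding
};
static_assert(sizeof(LightDefinition) % 16 == 0, "padded to a 16-byte multiple");

// Round size up to the next multiple of a power-of-two alignment.
inline std::size_t paddedSize(std::size_t size, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

The static_assert lines fail at compile time if a later change to a struct breaks the intended layout, which is a cheap way to keep host and device views of an array of structs in agreement.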

It’s needed if you would like to align things to bigger values. There are some examples in OptiX itself:
https://raytracing-docs.nvidia.com/optix7/api/html/group__optix__types.html#ga816ed5bbb93d53c783561497a474308e
where OPTIX_SBT_RECORD_ALIGNMENT is needed because the SBT record header starts with a char array which would normally be aligned to 1 byte and not 16.

Or this example where I defined my own half4 struct because CUDA provides only half and half2 types and it’s aligned to let the compiler generate vectorized (.v4) load and store PTX instructions:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_denoiser/shaders/half_common.h#L42
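
The SBT record case can be sketched in host-side C++ as follows, using standard alignas as an analogue of CUDA's __align__. This is not the OptiX definition: the two constants only mirror what I believe are the values of OPTIX_SBT_RECORD_ALIGNMENT (16) and OPTIX_SBT_RECORD_HEADER_SIZE (32), and the record structs and isAligned helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed to mirror OPTIX_SBT_RECORD_ALIGNMENT and OPTIX_SBT_RECORD_HEADER_SIZE.
constexpr std::size_t kSbtRecordAlignment  = 16;
constexpr std::size_t kSbtRecordHeaderSize = 32;

// Without an explicit alignment, the char header only requires 1-byte alignment,
// so the struct is aligned to its int member.
struct UnalignedRecord
{
    char header[kSbtRecordHeaderSize];
    int  data;
};

// Forcing the header to 16 bytes raises the alignment of the whole struct.
struct AlignedRecord
{
    alignas(kSbtRecordAlignment) char header[kSbtRecordHeaderSize];
    int  data;
};

static_assert(alignof(UnalignedRecord) == alignof(int), "default alignment comes from the int member");
static_assert(alignof(AlignedRecord) == kSbtRecordAlignment, "explicit alignment applies to the struct");
static_assert(sizeof(AlignedRecord) % kSbtRecordAlignment == 0, "array elements stay aligned");

// Check whether an address is a multiple of the given alignment.
inline bool isAligned(const void* ptr, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

The same mechanism applies to the half4 case: raising the alignment of a small struct beyond what its members require is what lets the compiler treat it as one wide load or store.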

Thank you @droettger for the very in depth answer.