rtBuffer - indexing

picard1969 · February 2, 2021, 4:05pm

Good morning,

I have some OptiX 5/6 code that employs indexing into a rtBuffer declared at top of a given shader. I know that OptiX 7 no longer employs the rtBuffer macro, so can anyone point me to where the definitions exist that index the rtBuffer in OptiX 5/6? For example, given a PTX shader below:

rtBuffer<float4, 2> rayBuffer;
rtDeclareVariable(uint2, launchIndex, rtLaunchIndex, );
...
RT_PROGRAM void genRays() {
  ...
  // Do some stuff
  ...
  // How does rtBuffer compute index below?
  rayBuffer[launchIndex] = make_float3(0.0f, 0.0f, 0.0f);
  ...
}

I’m mainly just looking for how OptiX 5/6 computes an index so that I can mimic something similar in OptiX 7 - if that is possible.

I apologize ahead of time for the simplicity of the question, but am still trying to learn.

Thank you kindly for any help.

droettger · February 2, 2021, 4:51pm

In OptiX 7 “buffers” are just 64-bit CUdeviceptr to linear memory.
You’re responsible for the allocation, alignment, and addressing.

Then all semantic variables (e.g. rtLaunchIndex above) are replaced by OptiX 7 device functions.

For your given example of a tightly packed float4 buffer indexed by a 2D launch index that would look like this:

With this pointer declared inside the launch parameter block:

float4* outputBuffer;

This would be the code to write to it when the buffer dimension matches the launch dimensions:

const uint3 theLaunchDim   = optixGetLaunchDimensions();
const uint3 theLaunchIndex = optixGetLaunchIndex();
const unsigned int linearIndex = theLaunchDim.x * theLaunchIndex.y + theLaunchIndex.x; // width * y + x
sysData.outputBuffer[linearIndex] = make_float4(0.0f);
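
To check the arithmetic outside a launch, the row-major mapping above can be sketched as a small helper. This is only an illustration, not OptiX API: the name `linearIndex2D` and the `HOSTDEVICE` macro are hypothetical, and the values that `optixGetLaunchDimensions()` and `optixGetLaunchIndex()` would supply are passed in as plain integers.

```cpp
// Hypothetical helper, not part of OptiX: the same "width * y + x" mapping,
// written so it compiles with nvcc and with a plain host C++ compiler.
#ifdef __CUDACC__
#define HOSTDEVICE __host__ __device__
#else
#define HOSTDEVICE
#endif

// Row-major index into a tightly packed 2D buffer (no row padding assumed).
HOSTDEVICE constexpr unsigned int linearIndex2D(unsigned int width,
                                                unsigned int x,
                                                unsigned int y)
{
    return width * y + x;
}

// In a launch that is 4 elements wide, (x = 1, y = 2) maps to 4 * 2 + 1 = 9.
static_assert(linearIndex2D(4u, 1u, 2u) == 9u, "row-major 2D index");
```

Inside a ray generation program the arguments would presumably be `theLaunchDim.x`, `theLaunchIndex.x`, and `theLaunchIndex.y` from the snippet above.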

You could also define it as CUdeviceptr which is just a 64-bit unsigned value and then reinterpret it to the pointer you need, in case your renderer supports different output formats.
Shown here: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/shaders/raygeneration.cu#L230
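
As a rough sketch of the untyped-pointer idea (not the code behind the link), the buffer can be stored as a 64-bit address and reinterpreted where it is used. Everything named here is an assumption made for illustration: `DevicePtr` stands in for `CUdeviceptr`, `Float4` stands in for CUDA's `float4`, and `LaunchParams` and `bufferAs` are hypothetical names, so that the block compiles without the CUDA headers.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in for CUdeviceptr so this compiles without cuda.h (assumption).
using DevicePtr = std::uint64_t;

// Stand-in for CUDA's float4 (assumption; the real type is 16-byte aligned).
struct alignas(16) Float4 { float x, y, z, w; };

// Hypothetical launch-parameter block: the output buffer is kept untyped so
// that the renderer can choose the element format at run time.
struct LaunchParams
{
    DevicePtr outputBuffer;
};

// Reinterpret the 64-bit address as a pointer to the requested element type.
template <typename T>
inline T* bufferAs(DevicePtr address)
{
    return reinterpret_cast<T*>(static_cast<std::uintptr_t>(address));
}

// The untyped field is exactly 64 bits, and the cast yields a typed pointer.
static_assert(sizeof(DevicePtr) == 8, "64-bit address");
static_assert(std::is_same<decltype(bufferAs<Float4>(DevicePtr{})), Float4*>::value,
              "bufferAs<T> returns T*");
```

In device code the write would then look like `bufferAs<Float4>(params.outputBuffer)[linearIndex] = value;`, with a different `T` selected for each supported output format.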

(Note that your code wouldn’t compile because you assigned a float3 to a float4.)

Reading through the ray generation program of the simplest OptiX SDK example optixHello, which just fills the output buffer with a constant color, would have answered this. More due diligence please.

picard1969 · February 2, 2021, 5:07pm

Thanks @droettger for the reply.

I had the computation of uint3 launch index and launch dimension to linear index from optixHello example, but was just wondering what OptiX 5/6 did when calculating different dimensions. I suppose the answer is actually not much different though.

The CUdeviceptr information is certainly an interesting approach though.

Thank you again.

droettger · February 3, 2021, 7:40am

That is not really a concern for device code in OptiX versions before 7, because buffer and index dimensions simply match: a 3D buffer is indexed with a 3D index.
The assumption is that the data is tightly packed, meaning no larger stride between elements and no row padding.
That becomes apparent when mapping the buffer.
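
To make the tightly packed assumption concrete, here is a minimal sketch of how a 3D index maps to linear memory when there is no element stride and no row padding. The helper name `linearIndex3D` and the `HOSTDEVICE` macro are hypothetical. This illustrates the layout you would see in a mapped buffer, assuming x varies fastest; it is not the OptiX 5/6 internal implementation, which this thread does not show.

```cpp
// Hypothetical helper, not part of OptiX. Compiles with nvcc or a host compiler.
#ifdef __CUDACC__
#define HOSTDEVICE __host__ __device__
#else
#define HOSTDEVICE
#endif

// Tightly packed 3D layout: x varies fastest, then y, then z.
HOSTDEVICE constexpr unsigned int linearIndex3D(unsigned int width,
                                                unsigned int height,
                                                unsigned int x,
                                                unsigned int y,
                                                unsigned int z)
{
    return (z * height + y) * width + x;
}

// With z = 0 this reduces to the 2D case: width * y + x.
static_assert(linearIndex3D(4u, 3u, 1u, 2u, 0u) == 9u, "2D special case");
// One full 4x3 slice (12 elements) further along for z = 1.
static_assert(linearIndex3D(4u, 3u, 1u, 2u, 1u) == 21u, "3D index");
```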

picard1969 · February 3, 2021, 12:31pm

Good to know.

Quick question about OptiX standards. Is it common to separate out the different shader programs into different files? For example, the raygen program is a file called raygen.cu and the closesthit program in another file called closesthit.cu?

Thanks again.

droettger · February 3, 2021, 12:48pm

You can handle that as you like.
For big projects it might get awkward having everything in one file.
Also compile times might become an issue.

picard1969 · February 3, 2021, 12:55pm

Thanks again @droettger. I agree, the code for a big project could become unwieldy if it was all in a single CUDA file. Compile time might become an issue later, but for now I am happy to get some working code.

droettger · February 3, 2021, 1:30pm

All my examples are using separate modules.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/src/Device.cpp#L548
You probably noticed that because you’ve edited the question while I was answering. :-)

picard1969 · February 3, 2021, 1:52pm

Yes, apologies. I wrote the question before doing my due diligence and quickly tried to edit it before I looked like an idiot - too late though :)

droettger · February 3, 2021, 2:10pm

The forum sends e-mails for watched topics.
You have to be very quick with editing to not get the initial post contents.
Read your post carefully before hitting reply. I’m not good at it and edit a lot. :-)

picard1969 · February 3, 2021, 2:14pm

You are correct.

I’m learning though.

picard1969 · February 4, 2021, 5:52pm

This is more of a CUDA question than a pure OptiX one, about alignment in structures. If I design a structure such that the elements are in decreasing size, will that cause nvcc to avoid padding? Is it useful to put the __align__(N) keyword in a CUDA struct, as in struct __align__(16) MyCudaStruct { ... } ?

Thanks

droettger · February 5, 2021, 7:57am

That is normally not required. CUDA aligns structures to the biggest alignment required by any element inside the struct already.
See this documentation on the CUDA alignment requirements and esp. note the possibly different behavior for arrays of structs in host and device compilers in the second link:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vector-types
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernel-execution
When I use arrays of structures on host and device, I manually pad the structure size to the required alignment.
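
A small sketch of both points, member ordering and manual padding, using standard C++ `alignas`, which plays the same role here as CUDA's `__align__`. All struct names are hypothetical, and the byte counts in the comments assume a typical 64-bit ABI.

```cpp
#include <cstddef>

// Each member is placed at a multiple of its own alignment, so order matters.
struct Unordered            // hypothetical
{
    char   flag;            // 1 byte, typically followed by 7 bytes of padding
    double value;           // 8 bytes
    char   tag;             // 1 byte, typically followed by 7 bytes of tail padding
};

struct Ordered              // same members, largest first
{
    double value;
    char   flag;
    char   tag;             // tail padding remains
};

// For these members, sorting by decreasing size does not make the struct larger,
static_assert(sizeof(Ordered) <= sizeof(Unordered), "ordering reduces padding");
// but the size is still rounded up to the alignment, so arrays stay aligned.
static_assert(sizeof(Ordered) % alignof(Ordered) == 0, "size is a multiple of alignment");

// Manual padding: the size is made an explicit multiple of 16 bytes, so that
// host and device code agree on the layout of an array of these structs.
struct alignas(16) LightDefinition   // hypothetical
{
    float position[3];      // 12 bytes
    int   type;             //  4 bytes, 16 so far
    float emission[3];      // 12 bytes, 28 so far
    int   pad0;             // explicit padding, 32 in total
};

static_assert(alignof(LightDefinition) == 16, "16-byte aligned");
static_assert(sizeof(LightDefinition) == 32, "no hidden padding");
```

Keeping such `static_assert` checks next to the struct definition is one way to notice when a later edit changes the layout.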

An explicit __align__ is needed if you would like to align things to bigger values. There are some examples in OptiX itself:
https://raytracing-docs.nvidia.com/optix7/api/html/group__optix__types.html#ga816ed5bbb93d53c783561497a474308e
where OPTIX_SBT_RECORD_ALIGNMENT is needed because the SBT record header starts with a char array which would normally be aligned to 1 byte and not 16.
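
The effect can be sketched without the OptiX headers. The two constants below are local stand-ins mirroring the values of `OPTIX_SBT_RECORD_HEADER_SIZE` (32) and `OPTIX_SBT_RECORD_ALIGNMENT` (16). The struct names and the `HitGroupData` payload are hypothetical.

```cpp
#include <cstddef>

// Local stand-ins so this compiles without optix.h (assumption).
constexpr std::size_t kSbtRecordHeaderSize = 32;  // OPTIX_SBT_RECORD_HEADER_SIZE
constexpr std::size_t kSbtRecordAlignment  = 16;  // OPTIX_SBT_RECORD_ALIGNMENT

// Hypothetical per-record payload.
struct HitGroupData
{
    float color[3];
};

// Without an explicit alignment, a struct that only holds a char array
// has the alignment of char, which is 1.
struct HeaderOnly
{
    char header[kSbtRecordHeaderSize];
};
static_assert(alignof(HeaderOnly) == 1, "char array alone aligns to 1 byte");

// With the explicit alignment, the whole record is aligned to 16 bytes and
// its size is rounded up to a multiple of 16 (32 + 12 = 44, padded to 48).
struct alignas(kSbtRecordAlignment) SbtRecordHitGroup
{
    char         header[kSbtRecordHeaderSize];
    HitGroupData data;
};
static_assert(alignof(SbtRecordHitGroup) == 16, "record is 16-byte aligned");
static_assert(sizeof(SbtRecordHitGroup) == 48, "size padded to a multiple of 16");
```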

Or this example where I defined my own half4 struct because CUDA provides only half and half2 types and it’s aligned to let the compiler generate vectorized (.v4) load and store PTX instructions:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_denoiser/shaders/half_common.h#L42

picard1969 · February 5, 2021, 12:38pm

Thank you @droettger for the very in depth answer.
