My first question is rather simple: what does optixLaunch actually create when you render at a 1024 x 768 size?
Reading about CUDA indexing I see that there is:
- a grid with x, y, z dimensions containing blocks
- a block with x, y, z dimensions containing threads
- and threads
You do not need to be concerned about that, because OptiX provides a single-ray programming model and all scheduling onto the available GPU hardware is handled internally.
You only need to care about the optixLaunch arguments and use optixGetLaunchDimensions and optixGetLaunchIndex to do work per launch index.
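For illustration, here is a minimal ray generation program sketch using these two device functions. The Params struct, its members, and the program name __raygen__sketch are assumptions made up for this example, not fixed OptiX names:

```cpp
#include <optix.h>

// Hypothetical launch parameters for this sketch; only the pattern of an
// extern "C" __constant__ struct is prescribed by OptiX.
struct Params
{
    float4*      outputBuffer; // one element per launch index
    unsigned int width;        // launch width in launch indices
};

extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__sketch()
{
    const uint3 idx = optixGetLaunchIndex();      // this thread's launch index
    const uint3 dim = optixGetLaunchDimensions(); // the optixLaunch width/height/depth

    // Map the 2D launch index to a linear output buffer index.
    const unsigned int linear = idx.y * params.width + idx.x;

    // ... generate a primary ray and call optixTrace() here ...

    // Write one result per launch index (a gradient as a placeholder).
    params.outputBuffer[linear] = make_float4(float(idx.x) / float(dim.x),
                                              float(idx.y) / float(dim.y),
                                              0.0f, 1.0f);
}
```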
Note that the OptiX launch dimension (the product width × height × depth) is limited to 2^30, which is smaller than in native CUDA. See the Limits chapter inside the OptiX Programming Guide.
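For a 1024 x 768 render, the host-side call is a sketch like the following; pipeline, stream, d_params, and sbt are assumed to have been created beforehand, and OPTIX_CHECK stands for the usual error-checking macro pattern from the SDK examples:

```cpp
// One launch index per pixel; width * height * depth must stay <= 2^30.
OPTIX_CHECK(optixLaunch(pipeline,       // OptixPipeline
                        stream,         // CUstream
                        d_params,       // CUdeviceptr to the launch parameters
                        sizeof(Params), // size of the launch parameter struct
                        &sbt,           // OptixShaderBindingTable
                        1024u,          // width
                        768u,           // height
                        1u));           // depth
```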
These launch indices are effectively CUDA threads running in warps of 32 threads. How many blocks are used internally depends on the amount of resources being used.
You should be mindful of divergence within warps when programming your kernels: the more divergent the code executed by the threads in a warp, the lower the warp utilization and the worse the efficiency of your kernel. So write device code in a way that most threads do the same thing whenever possible.
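As a sketch of what that means, assume a hypothetical per-hit shading decision; Hit, shadeDiffuse, and shadeSpecular are made up for this example:

```cpp
#include <cuda_runtime.h>

// Hypothetical hit record for this sketch.
struct Hit
{
    float specularWeight; // 0 = fully diffuse, 1 = fully specular
};

__device__ float3 shadeDiffuse (const Hit& h) { return make_float3(0.8f, 0.8f, 0.8f); }
__device__ float3 shadeSpecular(const Hit& h) { return make_float3(1.0f, 1.0f, 1.0f); }

// Divergent: if rays within a warp hit different material types, the warp
// executes both branches serially with the non-matching lanes masked off.
__device__ float3 shadeDivergent(const Hit& h)
{
    if (h.specularWeight > 0.5f) // varies per thread within a warp
        return shadeSpecular(h);
    return shadeDiffuse(h);
}

// More warp-friendly: every thread executes the same instructions and the
// material difference lives in the data (the blend weight), not in control flow.
__device__ float3 shadeUniform(const Hit& h)
{
    const float3 d = shadeDiffuse(h);
    const float3 s = shadeSpecular(h);
    const float  w = h.specularWeight;
    return make_float3(d.x + (s.x - d.x) * w,
                       d.y + (s.y - d.y) * w,
                       d.z + (s.z - d.z) * w);
}
```

Whether evaluating both shading models is actually cheaper than diverging depends on the workload, so profile before restructuring.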
The other thing which affects the scheduling is the number of registers a kernel is allowed to use. The default in OptiX is 128 because there is usually a performance cliff when going higher. In a few cases, depending on the complexity of the device code and the underlying GPU, higher values can make sense, for example when allowing too few registers causes too much register spilling, so there is a setting in OptiX to experiment with that register count. (See the link below.)
Mind that this is per GPU; you should not change the default blindly for all GPUs when you’re not able to verify the effect. I recommend not touching the default before you’ve optimized everything else.
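The setting in question is the maxRegisterCount field of the OptixModuleCompileOptions; a minimal sketch:

```cpp
OptixModuleCompileOptions moduleCompileOptions = {};

// 0 (OPTIX_COMPILE_DEFAULT_MAX_REGISTER_COUNT) keeps the OptiX default.
moduleCompileOptions.maxRegisterCount = OPTIX_COMPILE_DEFAULT_MAX_REGISTER_COUNT;

// Only after profiling, and per GPU: experiment with an explicit limit,
// e.g. moduleCompileOptions.maxRegisterCount = 168;
moduleCompileOptions.optLevel   = OPTIX_COMPILE_OPTIMIZATION_DEFAULT;
moduleCompileOptions.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_DEFAULT;
```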
You can see the occupancy and the number of blocks an OptiX kernel launch used inside an Nsight Compute profile summary.
Read this post for more details: https://forums.developer.nvidia.com/t/high-stall-mio-throttle/274590/4
I am not sure, but when you render a 1024 x 768 image, do you simply create a single 1024 x 768 block with 786,432 threads?
Nope, that’s not how the grouping of threads into blocks works. If you read the CUDA Programming Model chapter inside the CUDA Programming Guide again, you’ll find this sentence: “On current GPUs, a thread block may contain up to 1024 threads.”
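As a sketch in native CUDA (not OptiX) of how such a launch is typically decomposed: 786,432 threads do not fit into one block, so the work is split into a grid of many smaller blocks. The 16 x 16 block size here is just a common choice, not anything OptiX mandates:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(int width, int height)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return; // guard against partial blocks
    // ... per-pixel work ...
}

int main()
{
    const int width = 1024, height = 768;
    const dim3 block(16, 16);                          // 256 threads per block
    const dim3 grid((width  + block.x - 1) / block.x,  // 64 blocks in x
                    (height + block.y - 1) / block.y); // 48 blocks in y
    kernel<<<grid, block>>>(width, height);            // 3072 blocks = 786,432 threads
    cudaDeviceSynchronize();
    printf("launched %u x %u blocks of %u x %u threads\n",
           grid.x, grid.y, block.x, block.y);
    return 0;
}
```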
My other question has to do with handling the hit event. Let’s say a ray hits a triangle and I want to store the x coordinates of all the rays that had a hit. How can I transfer this collection of x coordinates back to the main function?
I’ll answer that inside the other thread with the same question.