OK, the thing about my OptiX program is that it's a non-real-time path tracer, and when rendering my scenes a significant portion of the pixels hit the background.
So I was wondering about the possibility of putting the threads that would normally just recalculate the background color over and over to work on calculating new samples for the pixels that actually render geometry.
Of course, that would require some thread synchronization to keep track of how many samples have been calculated, and synchronization across devices is currently not possible. But still, perhaps some of you can tell me whether it is possible to force certain threads to process pixels that do not correspond to that thread's launch index, and to do so in a way that doesn't affect the resulting image.
First, you don't have that fine-grained access to CUDA resources within OptiX. It uses a single-ray programming model, and everything about blocks, warps, and threads is internal and must not be touched. But there are other ways.
One of my path tracers does adaptive tiled rendering with a convergence-threshold calculation that only shoots more rays in regions where there is more work to do. It quickly finds primary rays hitting the miss shader (and reflections of the background in specular regions) and stops wasting rays on those.
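The post doesn't spell out the convergence metric, but one plausible per-tile test could compare successive accumulation states against a threshold. A minimal host-side sketch, with all names and the luminance metric being my own assumptions:

```cpp
#include <math.h>
#include <vector_types.h>  // float4 from the CUDA headers

// Hypothetical per-tile convergence test: stop scheduling a tile once the
// maximum per-pixel luminance change between two accumulation states drops
// below a threshold. The exact metric used in the post is not specified.
bool tileConverged(const float4* curr, const float4* prev,
                   unsigned int numPixels, float threshold)
{
  float maxDelta = 0.0f;
  for (unsigned int i = 0; i < numPixels; ++i)
  {
    const float delta = fabsf(0.299f * (curr[i].x - prev[i].x) +
                              0.587f * (curr[i].y - prev[i].y) +
                              0.114f * (curr[i].z - prev[i].z));
    maxDelta = fmaxf(maxDelta, delta);
  }
  return maxDelta < threshold;  // converged tiles receive no more rays
}
```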
If you're able to determine which regions of your rendering need more work, you could simply keep a list of regions or individual pixels for your next launch. The number of pixels in that list is your new launch dimension, and the information inside that list is used to calculate the pixel coordinate in the full-sized output image which receives the result.
It's like scattered writes, but without the need for synchronization with atomics, because every launch index writes to a separate result pixel. (Using atomics in OptiX wouldn't work on multi-GPU!)
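A minimal device-side sketch of that idea in the old rtBuffer/rtContextLaunch-era API; the names pixel_list, output_buffer, and num_work_pixels are mine for illustration, not from the original post:

```cpp
#include <optix.h>
#include <optixu/optixu_math_namespace.h>

using namespace optix;

rtBuffer<uint2, 1>  pixel_list;     // coordinates of pixels that need more samples
rtBuffer<float4, 2> output_buffer;  // full-sized result image

rtDeclareVariable(uint2, launch_index, rtLaunchIndex, );
rtDeclareVariable(uint2, launch_dim,   rtLaunchDim, );
rtDeclareVariable(unsigned int, num_work_pixels, , );  // actual list length

RT_PROGRAM void raygen()
{
  // Flatten the 2D launch index into a position in the work list.
  const unsigned int linear = launch_index.y * launch_dim.x + launch_index.x;

  // The launch was padded up to a full 2D grid; skip the dummy entries.
  if (linear >= num_work_pixels)
    return;

  // This launch index is responsible for one specific pixel of the full image.
  const uint2 pixel = pixel_list[linear];

  // ... generate the primary ray for 'pixel', trace, accumulate ...
  const float4 result = make_float4(0.0f);  // placeholder for the traced radiance

  // Every launch index writes to a distinct pixel, so no atomics are required.
  output_buffer[pixel] = result;
}
```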
I'm not actually sure whether OptiX distributes 1D launches across GPUs, so I'm using 2D launches that fit around the number of rays and fill up the last row with dummy rays, which are skipped inside the ray generation program if required. That worked nicely.
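For what it's worth, the host-side fitting could look roughly like this; the fixed width of 1024 and the function name are my assumptions:

```cpp
#include <optix.h>

// Launch a padded 2D grid over 'num_work_pixels' work items. The ray
// generation program skips the dummy indices beyond the real count.
void launchWorkList(RTcontext context, unsigned int entry_point,
                    unsigned int num_work_pixels)
{
  const RTsize width  = 1024;                                   // arbitrary fixed width
  const RTsize height = (num_work_pixels + width - 1) / width;  // round up

  rtContextLaunch2D(context, entry_point, width, height);
}
```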
Automatically distributing rtContextLaunch1D() to multiple GPUs.
OptiX distributes the work with some tiling mechanism, and I have never gotten around to testing whether that works as expected with rtContextLaunch1D(), because I normally do image synthesis, and that is at least rtContextLaunch2D(). Someone from the OptiX core team might chime in here.
I just wanted to note that (unless I'm mistaken) there is a way around this. Atomics do work if buffers are created with RT_BUFFER_GPU_LOCAL, where each GPU has its own copy. After the rtLaunch you can post-process the multiple buffers (one per GPU) with CUDA to get the result you want.
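If I understand that correctly, a sketch in the old C API might look like the following; the buffer name, format, NUM_BINS, and the rtBufferGetDevicePointer() post-processing path are my assumptions for illustration:

```cpp
#include <optix.h>

enum { NUM_BINS = 256 };  // illustrative buffer size

void setupAndReduce(RTcontext context)
{
  // Each GPU gets its own copy of a GPU_LOCAL buffer, so device-side
  // atomics on it never cross GPUs.
  RTbuffer local_counts;
  rtBufferCreate(context, RT_BUFFER_INPUT_OUTPUT | RT_BUFFER_GPU_LOCAL,
                 &local_counts);
  rtBufferSetFormat(local_counts, RT_FORMAT_UNSIGNED_INT);
  rtBufferSetSize1D(local_counts, NUM_BINS);

  // Device side (inside the OptiX program), for reference:
  //   rtBuffer<unsigned int, 1> local_counts;
  //   atomicAdd(&local_counts[bin], 1u);  // safe: this copy is GPU-local

  // ... rtContextLaunch2D(...) happens here ...

  // Afterwards, fetch each GPU's copy and combine the results with CUDA.
  unsigned int num_devices = 0;
  rtContextGetDeviceCount(context, &num_devices);
  for (unsigned int i = 0; i < num_devices; ++i)
  {
    void* d_ptr = 0;
    rtBufferGetDevicePointer(local_counts, (int)i, &d_ptr);
    // ... launch a CUDA kernel on d_ptr to accumulate this GPU's results
    //     into the final combined buffer ...
  }
}
```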