How to implement a rasterizer with OptiX

Hi,

I have been trying to implement an efficient rasterizer for the past few months. I tried OptiX, but the results were not very satisfactory.

The input data is a large number of polygons (or earcut triangles). What I want is rasterization with a single color only, i.e. transferring vector data to a pixel map, without considering any complex issues (depth, type of light, shading… efficiency for these is not a concern for now).

I want to know if it’s possible to make OptiX lighter in order to reduce the rasterization time. I used the optix-8.0.0/SDK/optixTriangle code with GAS acceleration.

My environment: CUDA 12.2 / Ubuntu 22.04 / OptiX 8.0.0 / CUDA driver 535.54.03
My data: 1,000,000 triangles, uniformly distributed
20 * 2^30 pixels

Any help with OptiX or ideas about GPU rasterization would be appreciated.

Thanks

I have been trying to implement an efficient rasterizer for the past few months. I tried OptiX, but the results were not very satisfactory.

So this is basically a follow-up to your previous post, which already addressed the debug vs. release performance.

Could you please quantify your results again?
What performance do you require, and what performance do you achieve today?

I assume you’re still using the RTX 3060?
Do you have an opportunity to test your application on a faster GPU instead? (Higher-end board and/or newer GPU generation.)

I want to know if it’s possible to make OptiX lighter in order to reduce the rasterization time. I used the optix-8.0.0/SDK/optixTriangle code with GAS acceleration.

Without knowing what your source code and geometry data look like, there isn’t much to help with at this point.

100,000 triangles is not much for either a rasterizer or a raytracer.
Raytracers excel at high triangle counts because of the spatial acceleration structure, and they allow ordered rendering due to the closest-hit finding. They scale with ray count (the more rays, the slower).
Rasterizers scale with triangle count on the vertex engine (the more triangles, the slower), and with fragment count on the raster engine (the more fragments, the slower).

If all you need is to rasterize 100,000 triangles into a huge image without depth information or expensive shading, have you implemented that with a rasterizer API (e.g. OpenGL, Vulkan, DX12) before?
That would also require multiple tiles, because there is an upper limit on the 2D image resolution inside GPUs as well, e.g. 16384 x 16384, which would need 80 tiles to render 20 * 2^30 pixels.
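For illustration, a tiling loop could look like the sketch below. The tile origin fields are hypothetical launch parameters your raygen program would add to its launch index; parameter upload and error checks are omitted:

```cpp
// Sketch: splitting a huge target image into GPU-sized tiles and issuing
// one launch per tile. 163840 x 131072 is one possible split of the
// 20 * 2^30 pixels into 80 tiles of 16384^2 each.
const unsigned long long imageWidth  = 163840ULL;
const unsigned long long imageHeight = 131072ULL;
const unsigned long long tileSize    = 16384ULL;  // per-dimension limit used above

for( unsigned long long ty = 0; ty < imageHeight; ty += tileSize )
{
    for( unsigned long long tx = 0; tx < imageWidth; tx += tileSize )
    {
        params.tileOriginX = tx; // hypothetical launch parameters; the raygen
        params.tileOriginY = ty; // adds these offsets to its launch index

        // Clamp the last tile at the image border.
        const unsigned int w = static_cast<unsigned int>( imageWidth  - tx < tileSize ? imageWidth  - tx : tileSize );
        const unsigned int h = static_cast<unsigned int>( imageHeight - ty < tileSize ? imageHeight - ty : tileSize );

        // Upload params to d_params, then launch this tile:
        // optixLaunch( pipeline, stream, d_params, sizeof( Params ), &sbt, w, h, 1 );
    }
}
```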

I cannot say which method would be faster without implementing both.
Raytracing primary rays only, with no shading, is basically the maximum performance you can get from a raytracer. At low triangle counts that will reach the maximum rays/second the hardware can handle.
What rays/second do you currently achieve?
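In case it helps, one common way to measure that is with CUDA events around the optixLaunch call. A sketch, assuming one primary ray per launch-index pixel and your existing pipeline, SBT, and launch dimensions; error checks omitted:

```cpp
// Time a single launch and convert to rays/second.
cudaEvent_t start, stop;
cudaEventCreate( &start );
cudaEventCreate( &stop );

cudaEventRecord( start, stream );
optixLaunch( pipeline, stream, d_params, sizeof( Params ), &sbt, width, height, 1 );
cudaEventRecord( stop, stream );
cudaEventSynchronize( stop );

float ms = 0.0f;
cudaEventElapsedTime( &ms, start, stop );
const double raysPerSecond = double( width ) * double( height ) / ( ms * 1.0e-3 );
printf( "%.2f Grays/s\n", raysPerSecond * 1.0e-9 );
```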

Use the OPTIX_BUILD_FLAG_PREFER_FAST_TRACE flag for the acceleration structure build.
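For example, just the build options part (the build inputs and buffer setup follow the optixTriangle SDK sample):

```cpp
// Sketch: prefer traversal speed over build speed for the GAS.
OptixAccelBuildOptions accelOptions = {};
accelOptions.buildFlags = OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
accelOptions.operation  = OPTIX_BUILD_OPERATION_BUILD;

// Then pass accelOptions to the build as in the SDK sample:
// optixAccelBuild( context, stream, &accelOptions, &buildInput, 1,
//                  d_temp, tempSize, d_output, outputSize, &gasHandle, nullptr, 0 );
```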

If the result is simply a boolean indicating whether a ray hit any triangle, there isn’t even a need to encode it in more than a single byte. Even single bits would be possible, but those would require atomics to write. That would reduce the amount of memory you need to copy from device to host when needed.
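A minimal device-side sketch of that one-byte-per-pixel idea could look like this. The Params struct and program names are hypothetical, and the ray setup depends entirely on how you map pixels to your vector data:

```cpp
#include <optix.h>

struct Params
{
    unsigned char*         coverage; // one byte per pixel: 1 = hit, 0 = miss
    unsigned int           width;
    OptixTraversableHandle handle;
};

extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__coverage()
{
    const uint3 idx = optixGetLaunchIndex();

    // Example orthographic setup: one ray per pixel, straight along +z.
    float3 origin    = make_float3( idx.x + 0.5f, idx.y + 0.5f, -1.0f );
    float3 direction = make_float3( 0.0f, 0.0f, 1.0f );

    unsigned int hit = 0; // payload register 0, stays 0 on miss
    optixTrace( params.handle, origin, direction,
                0.0f, 1e16f, 0.0f,                      // tmin, tmax, ray time
                OptixVisibilityMask( 255 ),
                OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT,  // any hit suffices for coverage
                0, 1, 0,                                // SBT offset, stride, miss index
                hit );

    params.coverage[idx.y * params.width + idx.x] = static_cast<unsigned char>( hit );
}

extern "C" __global__ void __closesthit__coverage()
{
    optixSetPayload_0( 1 );
}
```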

Simple data takes approximately 5 seconds; it has 3,000,000 triangles (4 Grays/s).
Complex data takes approximately 10 seconds (like a large, dense grid; the triangles after cutting are very narrow and dense); it has 1,000,000 triangles (2.3 Grays/s).

What I want is about 3 seconds per map of the same size, maybe longer for complex data.
Yes, I am still using the RTX 3060. Purchasing a newer GPU takes time and process, and I need to confirm that a newer GPU can solve this problem.

My colleague has tested it with OpenGL; I will retest using these libraries.

Yes, I already use that flag.

The time for data transmission has already been excluded from the measurements; I will try writing the output as uchar1 or single bits.

Thank you for your explanation of raytracers and rasterizers. I will delve deeper into this.

I also tried to use CUDA, but I found that it’s very difficult without an efficient spatial tree, and an efficient tree is already inside OptiX, so I restarted researching how to use OptiX correctly and efficiently.
In the future we may need to simulate the diffraction of light, so OptiX is really a good choice.
@droettger

One more thing to try for additional performance:
If you’re currently using an OptiX render graph layout with a single GAS (OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS), please try replacing it with a render graph layout using an IAS->GAS structure (OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING) instead, because traversal through that is fully hardware accelerated on RTX boards.

You would just need to add a single OptixInstance with an identity matrix over the GAS and use the traversable handle of that IAS instead. There’s no need to get the transform list inside the device code when you know there is only one identity matrix, which means object space is the same as world space for the vertex attributes in your scene.
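A sketch of that setup on the host, assuming gasHandle is the GAS you already build today; size queries, temp buffer allocation, and error checks are omitted:

```cpp
// One instance with a 3x4 row-major identity transform over the existing GAS.
OptixInstance instance = {};
const float identity[12] = { 1.0f, 0.0f, 0.0f, 0.0f,
                             0.0f, 1.0f, 0.0f, 0.0f,
                             0.0f, 0.0f, 1.0f, 0.0f };
memcpy( instance.transform, identity, sizeof( identity ) );
instance.instanceId        = 0;
instance.visibilityMask    = 255;
instance.sbtOffset         = 0;
instance.flags             = OPTIX_INSTANCE_FLAG_NONE;
instance.traversableHandle = gasHandle; // the GAS you already have

CUdeviceptr d_instance;
cudaMalloc( reinterpret_cast<void**>( &d_instance ), sizeof( OptixInstance ) );
cudaMemcpy( reinterpret_cast<void*>( d_instance ), &instance,
            sizeof( OptixInstance ), cudaMemcpyHostToDevice );

OptixBuildInput buildInput = {};
buildInput.type                       = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
buildInput.instanceArray.instances    = d_instance;
buildInput.instanceArray.numInstances = 1;

OptixAccelBuildOptions accelOptions = {};
accelOptions.buildFlags = OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
accelOptions.operation  = OPTIX_BUILD_OPERATION_BUILD;

// Query sizes, allocate buffers, then build the IAS:
// optixAccelBuild( context, stream, &accelOptions, &buildInput, 1,
//                  d_temp, tempSize, d_ias, iasSize, &iasHandle, nullptr, 0 );
// ...and put iasHandle (not gasHandle) into your launch parameters.
```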

Thank you for this note, I have changed this flag to the IAS->GAS layout.
I will run my program on an RTX 4080 SUPER graphics card, and I believe there is still relatively large room for optimization with more effort and experimentation.

To be clear, only changing the flag won’t work. You also need to build and use a top-level instance acceleration structure, set the previous GAS as the traversable handle in that OptixInstance, and then use the IAS’s traversable handle inside your launch parameters.
Something like this: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/src/Application.cpp#L1705

Don’t worry, I’m sure that IAS->GAS works in my program.
Thank you again for your patience with a beginner.
