Maximizing GPU Utilization

Let me get this straight. I'm developing my own ray tracing app (maybe real-time path tracing as well, but that's far beyond my knowledge for now, lol). So the most important thing I need to use is the CUDA cores, but I need to use the RT cores and Tensor cores as well. My questions are: How can I really utilize the RT cores and Tensor cores? What tools do I need? Is the CUDA Toolkit all that's necessary for computer graphics and machine learning? Does OptiX use CUDA APIs? Are the RT cores automatically utilized when I do DXR with my GPU? Are the D3D includes from CUDA enough to do DirectX programming?

In the video Tensor Cores in a Nutshell - YouTube, Michael H said: "the best way to think about it is, the Tensor cores are basically inside CUDA cores". Does that also apply to RT cores? Another question: are the RT cores, Tensor cores, registers, and shared memory all grouped into one streaming multiprocessor?

For instance, my laptop has an RTX 3050. Running deviceQuery, I found that it has 16 SMs with 128 CUDA cores each, so it's true that I have 16 * 128 = 2048 CUDA cores in total. So, with 64 Tensor cores, do I have 64 / 16 = 4 Tensor cores per SM? And with 16 RT cores, 16 / 16 = 1 RT core per SM?

Actually, I have a lot of questions to ask, but I think that's enough for now. Please help me.

You will automatically use the RT cores for ray tracing when using one of the three available ray tracing APIs:
OptiX, DXR (DirectX 12 ray tracing), or the Vulkan ray tracing extensions.
The display drivers take care of making use of the RT cores as best they can.

All three ray tracing APIs allow programming their different ray tracing program domains (e.g. ray generation, closest hit, any hit, intersection, miss, …) in device code using different languages: CUDA C++ for OptiX, HLSL for DXR, and GLSL (or HLSL) translated to SPIR-V for Vulkan RT.
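For example, an OptiX ray generation program is ordinary CUDA C++ device code whose entry point carries a reserved name prefix. Here's a minimal sketch; `Params` is an application-defined launch-parameter struct and `__raygen__renderFrame` is a made-up name, not anything mandated by the SDK:

```cpp
#include <optix.h>

// Application-defined launch parameters, uploaded by the host before optixLaunch.
struct Params
{
    OptixTraversableHandle handle;  // top-level acceleration structure
    float4*                image;   // output buffer
    unsigned int           width;
};

extern "C" __constant__ Params params;

// The "__raygen__" name prefix marks this as a ray generation program.
extern "C" __global__ void __raygen__renderFrame()
{
    const uint3 idx = optixGetLaunchIndex();
    // ... set up a camera ray for pixel (idx.x, idx.y), call optixTrace(),
    // then write the shaded result:
    // params.image[idx.y * params.width + idx.x] = result;
}
```

DXR and Vulkan RT express the same program domains as HLSL/GLSL shader stages instead.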

Does OptiX use CUDA APIs?

Yes, OptiX is built on CUDA, and since the OptiX 7 versions you use the CUDA host APIs (runtime or driver) to manage all device resources directly and explicitly.
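As a sketch of what that explicit resource management looks like, following the pattern of the OptiX 7 SDK samples (error checking omitted; this needs the OptiX SDK headers and an NVIDIA driver to build and run):

```cpp
#include <cuda_runtime.h>
#include <optix.h>
#include <optix_function_table_definition.h>
#include <optix_stubs.h>

int main()
{
    // Initialize CUDA and load the OptiX entry points from the display driver.
    cudaFree(0);
    optixInit();

    // A CUcontext of zero means "use the current CUDA context".
    CUcontext cuCtx = 0;
    OptixDeviceContextOptions options = {};
    OptixDeviceContext context = nullptr;
    optixDeviceContextCreate(cuCtx, &options, &context);

    // All device resources (geometry, output buffers, ...) are plain CUDA
    // allocations managed by the application itself.
    CUdeviceptr d_outputBuffer = 0;
    cudaMalloc(reinterpret_cast<void**>(&d_outputBuffer), 1920 * 1080 * sizeof(float4));

    // ... build acceleration structures and pipelines, then optixLaunch ...

    cudaFree(reinterpret_cast<void*>(d_outputBuffer));
    optixDeviceContextDestroy(context);
    return 0;
}
```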

Here’s an (older) comparison of OptiX and Vulkan RT:
https://forums.developer.nvidia.com/t/what-are-the-advantages-and-differences-between-optix-7-and-vulkan-raytracing-api/220360

Tensor cores are usually programmed directly with CUDA.
They are used inside OptiX for the built-in AI denoiser.
(I don’t think the Tensor core instructions themselves work inside developer provided OptiX device code though.)
Other than that, Tensor cores are used by the deep learning and inferencing tools you’ll find on the NVIDIA developer site.

What tool do I need? Is it just CUDA Toolkit that is necessary for computer graphics and machine learning?

For ray tracing with OptiX you need a supported NVIDIA GPU (Maxwell architecture and above, RTX boards highly recommended), an OptiX SDK 7.x version, a display driver supporting that OptiX 7.x version, a CUDA Toolkit supported by that display driver, and a host compiler supported by that CUDA Toolkit version.
These things are listed inside the OptiX Release Notes. Find the link directly below the OptiX SDK download link of each version.
(E.g. I’m currently using RTX boards under Windows 10 with 535.98 drivers, OptiX 7.7.0, CUDA Toolkit 12.1, MSVS 2022.)

For ray tracing with DXR you would need the DirectX12 SDK. For Vulkan RT you would need the Vulkan SDK.
For these you would need an RTX board to have all features supported.

Are RT cores automatically utilized when I do DXR with my GPU?

Yes.

Are the D3D includes from CUDA enough to do DirectX programming?

Not sure what you’re asking exactly.
DirectX 12 contains the D3D and DXR graphics APIs. CUDA is a general-purpose GPU programming API.
CUDA supports interoperability with the OpenGL, D3D, and Vulkan graphics APIs though, so you can share some resources like buffers and textures among the APIs.
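To illustrate that interop path, here is a sketch of the OpenGL variant (the D3D variants follow the same register/map/unmap pattern with their own register functions); `vbo` is assumed to be an OpenGL buffer object created elsewhere, and error checking is omitted:

```cpp
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>

// Let a CUDA kernel write into an existing OpenGL buffer object.
void fillBufferWithCuda(GLuint vbo)
{
    cudaGraphicsResource* resource = nullptr;

    // Register the GL buffer with CUDA; WriteDiscard tells the driver
    // CUDA will overwrite the whole buffer contents.
    cudaGraphicsGLRegisterBuffer(&resource, vbo, cudaGraphicsRegisterFlagsWriteDiscard);

    // Map it to get a device pointer CUDA kernels can write to.
    cudaGraphicsMapResources(1, &resource, 0);
    void*  d_ptr = nullptr;
    size_t size  = 0;
    cudaGraphicsResourceGetMappedPointer(&d_ptr, &size, resource);

    // ... launch a CUDA kernel writing into d_ptr ...

    // Unmap before OpenGL uses the buffer again.
    cudaGraphicsUnmapResources(1, &resource, 0);
    cudaGraphicsUnregisterResource(resource);
}
```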
OptiX 7 doesn’t need CUDA interop because you’ll implement all resource management with CUDA host runtime or driver API calls inside the application yourself.

For more information about your GPU specific questions, please have a look through the CUDA programming manual at https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
That explains the CUDA programming model and differences in GPU configurations and streaming multiprocessor capabilities.
That won't explain the RT cores though, because they cannot be programmed directly. They can only be used via the three ray tracing APIs mentioned above (OptiX, DXR, Vulkan RT).

If you decide to use OptiX for ray tracing, more information can be found inside the OptiX 7 Programming Guide and API reference: https://raytracing-docs.nvidia.com/

If you have additional questions not answered in these docs or the OptiX developer forum already, just ask them there.