RTX arrangement

Many people are looking forward to seeing RTX hardware become accessible directly through CUDA (certainly to use it in unusual ways, or maybe to write their own RTX-competitive analogues).
I suspect that “RT cores” are just tensor cores.

  1. First of all, it is technically possible in principle to use warp matrix multiply (wmma) operations for ray-triangle tests and barycentric coordinate calculations (see the sketch just after this list). Maybe the same holds for ray-box intersection tests.
  2. VK_NV_ray_tracing runs more than 10 times faster on a 2080Ti than on a 1080Ti (my own observation).
  3. There is evidence that, in the last generation step, some functions see speedups of up to 9x from the new tensor cores. Quote: “Thanks to Tensor Cores, using the cublasGemmEx API on the Tesla T4 with CUDA 10 achieves a speedup of up to 9x compared to the Tesla P4 as shown in Figure 9.”
  4. wmma uses the half data type for the elements of the multiplicands.
  5. I encountered a note that the RTX 20 series RT cores implement ray-AABB collision detection using reduced float precision.
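
To make point 1 concrete, here is a minimal sketch of my own (not a claim about how RT cores actually work, and the padded layout is purely my assumption): the ray-direction/plane-normal dot products behind a batch of ray-plane tests can be packed into a single 16x16x16 wmma multiply, so 16 rays against 16 triangle planes give all 256 dot products in one tensor core operation.

```
// Sketch only: assumes my own padded layout; not NVIDIA's documented scheme.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// A (row-major): 16 rays, each row = (dx, dy, dz, 0, ..., 0), 16 columns
// B (col-major): 16 plane normals, each column = (nx, ny, nz, 0, ..., 0)
// C = A * B, so C[r][t] = dot(ray_dir[r], normal[t])
__global__ void ray_plane_dots(const half *A, const half *B, float *C)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c;

    wmma::fill_fragment(c, 0.0f);
    wmma::load_matrix_sync(a, A, 16);            // leading dimension 16
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(c, a, b, c);                  // 256 dot products in one op
    wmma::store_matrix_sync(C, c, 16, wmma::mem_row_major);
}
```

Launched with a single warp (e.g. ray_plane_dots<<<1, 32>>>(dA, dB, dC)) and compiled for sm_70 or later, this produces all 256 ray/normal dot products; whether half-precision inputs are accurate enough for robust intersection tests is exactly the open question.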

It looks like a conspiracy theory, but maybe it is reality. What do you think?

I can’t disassemble/debug OptiX code to get reliable evidence, because artificial limitations have been introduced into some key tools (namely: OptiX/RTCode debugging is limited to -lineinfo, and building this code with full debug information (-G) is not supported).

Yes, I do believe that Tensor Cores are more flexible than commonly thought (I’ve been multiplying 1024-bit integers with them at speeds comparable to the XMP library…)

But shouldn’t there also be some kind of hardware acceleration for descending a hierarchical quad tree of AABBs? Even if the AABB test can be mapped to the tensor cores, I suspect there is more at play here.
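
For reference, here is what “descending a hierarchy of AABBs” looks like when done in software: an iterative, stack-based walk with the classic slab test. The node layout, field names, and fixed stack depth below are my own assumptions for illustration, not NVIDIA’s hardware design; the point is the data-dependent branching that a dedicated traversal unit would have to hide.

```
// Illustrative software BVH traversal; layout and stack depth are assumptions.
struct Aabb    { float3 lo, hi; };
struct BvhNode {
    Aabb box;
    int  left, right;        // child node indices; left < 0 marks a leaf
    int  firstTri, triCount; // triangle range for leaves
};

// Classic slab test; assumes invDir = 1/dir with no zero components.
__device__ bool hit_aabb(const Aabb &b, float3 o, float3 invDir, float tMax)
{
    float tx1 = (b.lo.x - o.x) * invDir.x, tx2 = (b.hi.x - o.x) * invDir.x;
    float ty1 = (b.lo.y - o.y) * invDir.y, ty2 = (b.hi.y - o.y) * invDir.y;
    float tz1 = (b.lo.z - o.z) * invDir.z, tz2 = (b.hi.z - o.z) * invDir.z;
    float tnear = fmaxf(fmaxf(fminf(tx1, tx2), fminf(ty1, ty2)), fminf(tz1, tz2));
    float tfar  = fminf(fminf(fmaxf(tx1, tx2), fmaxf(ty1, ty2)), fmaxf(tz1, tz2));
    return tfar >= fmaxf(tnear, 0.0f) && tnear <= tMax;
}

__device__ void traverse(const BvhNode *nodes, float3 o, float3 invDir, float tMax)
{
    int stack[64];           // assumed maximum tree depth
    int sp = 0;
    stack[sp++] = 0;         // start at the root node
    while (sp > 0) {
        const BvhNode &n = nodes[stack[--sp]];
        if (!hit_aabb(n.box, o, invDir, tMax)) continue;
        if (n.left < 0) {
            // Leaf: per-triangle intersection tests would run here.
        } else {
            stack[sp++] = n.left;    // push both children and keep walking
            stack[sp++] = n.right;
        }
    }
}
```

This inner loop is irregular and branch-heavy, which is why it seems plausible that a dedicated traversal unit, rather than a matrix-multiply unit alone, is involved.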

I also noticed that getting the data into the correct shape/arrangement for the tensor cores added significant overhead in my (multiplication) use case. So perhaps there is also a hardware instruction that reshapes the ray and AABB data for the tensor cores efficiently.
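
As an illustration of that re-shaping cost, here is what the packing step would look like for the hypothetical ray_plane_dots layout sketched earlier in this thread: every float3 direction has to be converted to half and padded out to a 16-wide row before the tensor core can load it.

```
#include <cuda_fp16.h>

// Packs 16 float3 ray directions into the 16x16 row-major half matrix "A"
// assumed by the earlier ray_plane_dots sketch. Launch with 16 threads.
__global__ void pack_rays(const float3 *dirs, half *A)
{
    int r = threadIdx.x;                 // one thread per ray
    if (r >= 16) return;
    for (int k = 0; k < 16; ++k)         // zero the padding columns
        A[r * 16 + k] = __float2half(0.0f);
    A[r * 16 + 0] = __float2half(dirs[r].x);
    A[r * 16 + 1] = __float2half(dirs[r].y);
    A[r * 16 + 2] = __float2half(dirs[r].z);
}
```

Thirteen of the sixteen columns are pure padding here, which is the kind of extra memory traffic and conversion work I mean.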

Is the reduced precision the reason for so much noise in RTX rendering? I have used CPU-based ray tracers for many years, like POV, and they do not have a noise problem.

Surely not. It depends on the underlying technique and its implementation. Multidimensional (Monte Carlo) integration usually relies on (pseudo/quasi)random samples; that sampling is probably the source of the observed noise before the image finally converges.
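
To make that concrete, here is a minimal Monte Carlo sketch (not a ray tracer): every “pixel” estimates the same integral, so the pixel-to-pixel spread is exactly the kind of noise in question, and it shrinks only as 1/sqrt(samples).

```
// Each "pixel" estimates pi by Monte Carlo; with few samples the estimates
// scatter widely around pi (noise), and the scatter falls as 1/sqrt(N).
#include <curand_kernel.h>

__global__ void estimate_pixels(float *pixels, int nPixels, int samplesPerPixel)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPixels) return;

    curandState rng;
    curand_init(1234ULL, p, 0, &rng);            // independent stream per pixel

    int inside = 0;
    for (int s = 0; s < samplesPerPixel; ++s) {
        float x = curand_uniform(&rng);
        float y = curand_uniform(&rng);
        if (x * x + y * y <= 1.0f) ++inside;     // point lands in quarter circle
    }
    pixels[p] = 4.0f * inside / samplesPerPixel; // each pixel ~ pi, plus noise
}
```

With something like 16 samples per pixel the values scatter visibly; with tens of thousands they converge. That is essentially the difference between a few-millisecond real-time budget and a CPU renderer spending seconds or minutes on a frame, independent of float precision.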

I doubt RTX rendering would show noise if it were allowed to proceed to a level equivalent to the CPU-based ray tracer being used for comparison (e.g. an equivalent number of rays cast).

RTX is designed to provide an excellent experience at framerates of 60 FPS or higher; most CPU-based ray tracers don’t provide this, in my experience. A CPU-based ray tracer appears to have low “noise” because it is allowed to spend seconds or more rendering a frame.

RTX in real world applications (e.g. games) is often used in conjunction with Deep Learning Anti Aliasing or Deep Learning Super Sampling (DLAA/DLSS). The RTX raytracing is allowed to proceed (in the space of a few milliseconds) until the frame is “partially” rendered (i.e. still visibly noisy). At this point the frame is turned over to a neural network inference process to complete the frame (DLAA/DLSS). This is a fairly complicated, carefully optimized process involving both the RTX “cores” as well as the tensor cores on a modern RTX GPU. The end result should not be “noisy”. But the intermediate result may be noisy.

We may have differing definitions of “noisy” and maybe even different experiences with CPU based raytracers (I have worked with a couple) so perhaps there is still disagreement here. Not trying to start an argument. It’s OK if you disagree with me. We may simply have different viewpoints, and it’s hard to tease all this out from a two-sentence question. If you still feel that RTX rendering is “noisy” in practice (i.e. as used in games, where from what I have seen the reviews are gushing about image quality) then I would probably attribute that not to the RTX part but the DLAA/DLSS part (which is obviously creating a picture from a set of mathematics which is not the typical rendering or ray tracing math.)

One could also imagine that in future GPUs, both the RTX ray tracing capability and the DLAA/DLSS (or equivalent) process may be further improved, so that they can deliver even better results within the fixed-time budget of real-time rendering/gaming/FPS.

Does that mean RTX “cores” are completely different/separate pieces of hardware from Tensor Cores?

Yes, they are physically different and separate, and an RTX core is not a Tensor core. An RTX core is designed to solve the ray-volume intersection problem, and it involves, among other things, hardware traversal of a BVH. A Tensor core is a matrix-matrix multiply engine.

I can’t go into details beyond that. If you want to claim that one is the same as the other, go ahead. I assure you they are not.

It would be foolish for NVIDIA to hide a set of tensor core functionality behind an RTX subsystem and not make it available for general-purpose tensor core use. That would be artificially crippling the processor.

No, no. I want to stay on constructive ground. Your authoritative opinion is, without a doubt, of the greatest value in this discussion; you are the primary source of information here.

The Turing whitepaper may be of interest:

https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

My read of that does not support conclusions like the ones drawn above, but perhaps I am too trusting or gullible.

I should also point out that RTX can provide ray tracing without necessarily depending on DLSS/DLAA, and such applications might be interesting in the high-quality rendering space (as distinguished from gaming). Frames of digitally created movies often spend minutes or more each being rendered, sometimes on a large CPU-based cluster.

In such situations, RTX may be able to deliver the same level of quality in seconds. This doesn’t necessarily depend on DLAA/DLSS or tensor core usage. Certainly, if that claim is correct, noise can’t be a serious issue or differentiator.

https://www.youtube.com/watch?v=1IIiQZw_p_E

Robert: my question arose because I have read about “de-noising” being required in some RTX-generated images, and I was wondering why. You have given me some clues, so thank you.