Hello! From the NVIDIA Turing GPU Architecture white paper, I learned that each RT core consists of two specialized units: the first performs bounding box tests and the second performs ray-triangle intersection tests. Together they save the SM from spending thousands of instruction slots per ray, a workload so computationally intensive that, according to the white paper, it cannot be done on GPUs in real time without hardware-based ray tracing acceleration. What I would like to understand is why these two units can so quickly complete operations that are time-consuming on the SM.
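For reference, here is my understanding of what the two tests look like when done in software. This is only a minimal Python sketch: I am assuming the common slab method for the box test and the Möller-Trumbore algorithm for the triangle test. As far as I can tell, the white paper does not say which algorithms the hardware actually uses, and the function names below are my own.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_aabb_hit(origin, direction, box_min, box_max):
    """Slab test (assumed algorithm): True if the ray hits the box at some t >= 0."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t values that lie inside every slab so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def ray_triangle_hit(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore (assumed algorithm): hit distance t, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None
```

Even this toy version needs several multiplies, divides, and branches per test, and a real ray repeats the box test at many nodes while walking the BVH, which I assume is where the "thousands of instruction slots per ray" comes from.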
Any suggestions are appreciated. Please let me know if there is any relevant documentation explaining how RT cores work. Thanks in advance.