Many people are looking forward to RTX hardware becoming directly accessible through CUDA (certainly to use it in unusual ways, or perhaps to write their own RTX-competitive analogues).
I suspect that “RT cores” are just tensor cores.
- First of all, it is in principle technically possible to use warp matrix multiply-accumulate (wmma) operations for ray-triangle tests and barycentric-coordinate calculations. The same may hold for ray-box intersection tests.
- VK_NV_ray_tracing runs more than 10 times faster on a 2080 Ti than on a 1080 Ti (my own observation).
- There is evidence that, in the last hardware generation, some functions gained speedups of up to 9x from the new tensor cores. Quote: “Thanks to Tensor Cores, using the cublasGemmEx API on the Tesla T4 with CUDA 10 achieves a speedup of up to 9x compared to the Tesla P4 as shown in Figure 9.”
- wmma uses the half data type for the multiplicands’ elements.
- I came across a note stating that the RTX 20 series RT cores implement ray-AABB collision detection using reduced float precision.
It sounds like a conspiracy theory, but maybe it is reality. What do you think?
I can’t disassemble/debug OptiX code to get reliable evidence, because artificial limitations have been introduced into some key tools (namely: OptiX/RT code debugging is limited to -lineinfo, and building this code with full debug information (-G) is not supported).