I see that you got a reply on this thread: "Why the performance of tf32 tensor_core is poor? - #7 by Shaquille". I think that's the best place to continue the discussion, since there are more details there already.