Note that making design decisions on FLOPS alone is rarely a good idea. The application may turn out to be (partially) memory bound, especially on a platform that uses fairly low-performance memory like NVIDIA’s integrated platforms. Also, you are unlikely to get really close to the theoretical FLOPS rate in an actual application. For compiled code, 75% may be as good as it gets.
In general, FLOPS ratings are OK as a disqualifying criterion, but likely insufficient as a qualifying criterion.
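To make the "memory bound" point concrete, here is a minimal roofline-style sketch. The function names are hypothetical, and any numbers you plug in are your own assumptions, not measured values for a particular board. The idea is only that attainable throughput is capped by the smaller of the compute peak and memory bandwidth times arithmetic intensity.

```cpp
#include <algorithm>
#include <cassert>

// Roofline estimate (hypothetical helper, not part of any NVIDIA API):
// attainable throughput is limited either by the compute peak or by
// memory bandwidth multiplied by arithmetic intensity (FLOP per byte moved).
double attainable_gflops(double peak_gflops, double bandwidth_gb_per_s,
                         double flops_per_byte) {
    assert(peak_gflops > 0 && bandwidth_gb_per_s > 0 && flops_per_byte > 0);
    return std::min(peak_gflops, bandwidth_gb_per_s * flops_per_byte);
}

// Arithmetic intensity above which a kernel is no longer memory bound.
double ridge_point_flops_per_byte(double peak_gflops,
                                  double bandwidth_gb_per_s) {
    assert(peak_gflops > 0 && bandwidth_gb_per_s > 0);
    return peak_gflops / bandwidth_gb_per_s;
}
```

With illustrative inputs of 1000 GFLOPS peak and 25 GB/s bandwidth, a kernel doing 2 FLOP per byte is capped at 50 GFLOPS, and it takes 40 FLOP per byte before the FLOPS rating becomes the limit at all.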
Hello, I am evaluating the FP16 compute performance of the TX1, but I can't reach the theoretical 1 TFLOPS.
I only get 0.86 TFLOPS.
My experimental steps are as follows.
1. I run jetson_clocks.sh before running the test code.
2. The test code is below; it uses matrix multiplication to test computing performance: https://github.com/hma02/cublasHgemm-P100
3. I also wrote some simple test programs using FP16 addition and multiplication. They did not reach the theoretical rate either.
What is the cause of this result? Is there any way to get the theoretical TFLOPS? Thanks.
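For reference, the arithmetic behind numbers like these can be sketched as follows. The helper names are hypothetical. The TX1 figures mentioned in the note after the code (256 CUDA cores, 4 FP16 FLOP per core per cycle via paired half-precision FMA, roughly 0.998 GHz) are assumptions to check against your own board and clock settings.

```cpp
#include <cassert>

// Theoretical peak in GFLOPS: cores * FLOP per core per cycle * clock (GHz).
// All three inputs are assumptions supplied by the caller.
double peak_gflops(int cores, int flops_per_core_per_cycle, double clock_ghz) {
    assert(cores > 0 && flops_per_core_per_cycle > 0 && clock_ghz > 0);
    return static_cast<double>(cores) * flops_per_core_per_cycle * clock_ghz;
}

// Achieved GFLOPS for C = A * B, with A of size m x k and B of size k x n.
// A GEMM performs 2 * m * n * k floating-point operations
// (one multiply and one add per inner-product term).
double gemm_gflops(long long m, long long n, long long k, double seconds) {
    assert(m > 0 && n > 0 && k > 0 && seconds > 0);
    return (2.0 * m * n * k) / seconds / 1e9;
}

// Fraction of the theoretical peak that a measurement reaches.
double fraction_of_peak(double achieved_gflops, double peak) {
    assert(peak > 0);
    return achieved_gflops / peak;
}
```

Under those assumptions, `peak_gflops(256, 4, 0.998)` comes out at about 1022 GFLOPS, and `fraction_of_peak(860.0, 1000.0)` is 0.86. That is, the measured 0.86 TFLOPS is roughly 86% of the nominal 1 TFLOPS, which is already above the 75% mentioned earlier in this thread as a realistic ceiling for compiled code.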