I found that the time expense on the TX2 is quite large for sparse matrix computation. The function cusparseScsrmv(…, CUSPARSE_OPERATION_TRANSPOSE, …) from cusparse.lib is called once but takes 6 ms, while the same call on a GTX 1060 platform (with an i5-8400 CPU, Windows 10) took only 0.5 ms.
I wonder whether this time consumption on the TX2 is normal compared with the PC.
I would also like to know whether there is a faster way to perform this matrix calculation.
Thank you very much,
I have never compared these two platforms, but the performance difference does not seem far from what one might expect from the raw specs: the TX2 has about 1/5 the CUDA cores, roughly 1/3 (or 1/4?) the memory bandwidth, and probably lower GPU clocks as well. The performance difference of about 12x (6 ms vs. 0.5 ms) roughly matches the difference in power consumption, so efficiency appears to be about the same.
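To make the spec comparison concrete, here is a back-of-the-envelope calculation. The spec values below are assumptions taken from public datasheets (TX2: 256 Pascal CUDA cores, ~59.7 GB/s LPDDR4; GTX 1060 6GB: 1280 cores, ~192 GB/s GDDR5), not measurements on the poster's hardware:

```python
# Hypothetical spec values from public datasheets (assumptions, not measured).
tx2_cores, gtx1060_cores = 256, 1280     # CUDA cores
tx2_bw, gtx1060_bw = 59.7, 192.0         # memory bandwidth, GB/s

core_ratio = gtx1060_cores / tx2_cores   # ratio of CUDA core counts
bw_ratio = gtx1060_bw / tx2_bw           # ratio of memory bandwidths
measured_ratio = 6.0 / 0.5               # observed SpMV time ratio

print(core_ratio)                 # -> 5.0
print(round(bw_ratio, 1))         # -> 3.2
print(measured_ratio)             # -> 12.0
```

Since SpMV is typically memory-bandwidth bound, the ~3x bandwidth gap plus lower GPU clocks on the TX2 can plausibly account for much of the observed 12x difference.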
Thank you, njuffa.
Indeed, it's unfair to compare these two different platforms.
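Regarding the faster-way part of the question: a commonly cited workaround (not verified on the TX2 here) is that the transposed SpMV path in cuSPARSE is considerably slower than the non-transposed one. If Aᵀx is computed repeatedly with the same A, it can pay to build the explicit transpose once, e.g. with cusparseScsr2csc (whose CSC output arrays can be read directly as the CSR representation of Aᵀ), and then call cusparseScsrmv with CUSPARSE_OPERATION_NON_TRANSPOSE. The plain-Python sketch below only illustrates the equivalence of the two layouts on a toy matrix; no GPU code is involved:

```python
# Toy CPU illustration: y = A^T x computed two ways.
# 1) scatter-style traversal of A in CSR (analogous to the slow
#    CUSPARSE_OPERATION_TRANSPOSE path), and
# 2) gather-style SpMV on a precomputed CSR form of A^T (analogous to
#    CUSPARSE_OPERATION_NON_TRANSPOSE after a one-time csr2csc conversion).

def csr_spmv_transpose(ptr, idx, val, n_cols, x):
    """y = A^T x with A in CSR form (scatter into y)."""
    y = [0.0] * n_cols
    for row in range(len(ptr) - 1):
        for k in range(ptr[row], ptr[row + 1]):
            y[idx[k]] += val[k] * x[row]
    return y

def csr_transpose(ptr, idx, val, n_cols):
    """Build CSR arrays of A^T (what a csr2csc conversion yields as CSC)."""
    n_rows = len(ptr) - 1
    t_ptr = [0] * (n_cols + 1)
    for j in idx:                          # count entries per column of A
        t_ptr[j + 1] += 1
    for j in range(n_cols):                # prefix sum -> row pointers of A^T
        t_ptr[j + 1] += t_ptr[j]
    t_idx, t_val = [0] * len(idx), [0.0] * len(val)
    nxt = t_ptr[:]                         # next free slot per transposed row
    for row in range(n_rows):
        for k in range(ptr[row], ptr[row + 1]):
            pos = nxt[idx[k]]
            t_idx[pos], t_val[pos] = row, val[k]
            nxt[idx[k]] += 1
    return t_ptr, t_idx, t_val

def csr_spmv(ptr, idx, val, x):
    """y = A x with A in CSR form (gather per row)."""
    return [sum(val[k] * x[idx[k]] for k in range(ptr[r], ptr[r + 1]))
            for r in range(len(ptr) - 1)]

# 2x3 example: A = [[1, 0, 2], [0, 3, 4]], x has length 2 (rows of A)
ptr, idx, val = [0, 2, 4], [0, 2, 1, 2], [1.0, 2.0, 3.0, 4.0]
x = [1.0, 2.0]

slow = csr_spmv_transpose(ptr, idx, val, 3, x)
t_ptr, t_idx, t_val = csr_transpose(ptr, idx, val, 3)
fast = csr_spmv(t_ptr, t_idx, t_val, x)
assert slow == fast == [1.0, 6.0, 10.0]
```

The conversion costs extra memory and a one-time pass, so it only helps when the same matrix is reused across many SpMV calls.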