Matrix multiplication on GPU (1080 Ti): compilation/optimization?

pierrot91 · April 27, 2017, 9:15am

I have recently installed two GTX 1080 Ti cards on an X99 motherboard equipped with a 20-core i7… I want to dispatch matrix multiplication jobs to the CPU and the GPU, and I expected a very fast calculation on the GPU…
A little program in F90 showed that the computing time is nearly the same on the CPU (using OpenBLAS) and on the GPU (OpenACC or cuBLAS dgemm)… nearly 5 seconds for 10000x10000…
Running the same program on a Mac equipped with a Quadro 4000 is only 3 times slower… (surprising, as the 1080 Ti should be much more powerful).
I feel that I am not using the 1080 Ti in an optimized way!
Can someone help me?
Thanks a lot
Pierre

cbuchner1 · April 27, 2017, 12:45pm

Do your benchmark timings include the transfer times of the matrices over the PCIe bus?

Christian

tera · April 27, 2017, 1:10pm

Try sgemm instead of dgemm. Your 1080 Ti’s double-precision throughput is only 1/32 of its single-precision throughput, putting its double-precision throughput in the same ballpark as your 20-core i7’s.
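As a rough sanity check on those numbers, here is a small Python sketch. It assumes the usual 2·n³ flop count for a dense n×n matrix product and a single-precision peak of about 11.3 TFLOP/s for the 1080 Ti (an approximate spec-sheet figure, not something measured in this thread); the 5 s timing is the one reported in the first post.

```python
# Rough check: achieved DGEMM rate vs. the 1080 Ti's double-precision peak.
# Assumptions (not from the thread): 2*n^3 flop count, ~11.3 TFLOP/s SP peak.

def gemm_flops(n: int) -> float:
    """Floating-point operations for an (n x n) times (n x n) product."""
    return 2.0 * n ** 3

def achieved_gflops(n: int, seconds: float) -> float:
    """Sustained rate in GFLOP/s for one n x n GEMM taking `seconds`."""
    return gemm_flops(n) / seconds / 1e9

SP_PEAK_GFLOPS = 11_300.0   # assumed approximate spec-sheet value
DP_RATIO = 1.0 / 32.0       # DP:SP ratio quoted above

measured = achieved_gflops(10_000, 5.0)   # timing reported in the first post
dp_peak = SP_PEAK_GFLOPS * DP_RATIO

print(f"measured : {measured:.0f} GFLOP/s")
print(f"DP peak  : {dp_peak:.0f} GFLOP/s")
```

Under these assumptions the measured rate (about 400 GFLOP/s) is already at the card's nominal DP peak (about 350 GFLOP/s; the timing is approximate and boost clocks can exceed the rated figure), which suggests the dgemm is limited by the hardware rather than by how it was compiled.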

pierrot91 · April 27, 2017, 1:52pm

Thanks Christian, the problem doesn’t come from the data transfer (about 5% of the total computation time).
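That figure is plausible on paper. The sketch below estimates the transfer share for this problem size, assuming two input matrices are copied to the GPU and one result is copied back, and assuming an effective PCIe 3.0 x16 bandwidth of about 12 GB/s (a round illustrative number, not a measurement from this system).

```python
# Estimate the PCIe transfer share for a 10000 x 10000 DGEMM.
# Assumption: ~12 GB/s effective host<->device bandwidth (illustrative only).

N = 10_000
BYTES_PER_DOUBLE = 8
PCIE_GB_PER_S = 12.0        # assumed, not measured
TOTAL_SECONDS = 5.0         # timing reported in the first post

matrix_bytes = N * N * BYTES_PER_DOUBLE      # one double-precision matrix
moved_bytes = 3 * matrix_bytes               # A and B to the GPU, C back
transfer_seconds = moved_bytes / (PCIE_GB_PER_S * 1e9)
share = transfer_seconds / TOTAL_SECONDS

print(f"one matrix : {matrix_bytes / 1e6:.0f} MB")
print(f"transfer   : {transfer_seconds:.2f} s ({share:.0%} of the run)")
```

With these assumed values the copies take roughly 0.2 s, a few percent of the 5 s run, in line with the 5% reported.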

pierrot91 · April 27, 2017, 1:57pm

Thanks to you…
Using sgemm is OK and the GPU is faster than the CPU, but
unfortunately we need double precision.
Could it be a compilation problem, i.e. not targeting the correct Pascal architecture
in PGI Fortran?
Thanks

bha4395 · April 27, 2017, 2:05pm

If you need double precision, the 1080 Ti isn’t going to be where you find it. Tesla GPUs are going to be your best bet.

For example, the P100 or the KX0-series (K20, K40, K80) Tesla GPUs.

The difference is substantial: the 1080 Ti has roughly 350 GFlops of DP performance, while the K20, the worst of the Tesla GPUs for DP that I suggested, has close to 1200 GFlops of DP performance.

The P100 has over 4 TFlops of DP performance.

(I neglected the K10 as it does not have good DP performance)
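To put those DP figures in terms of the original benchmark, here is a small sketch of the best-case time for one 10000x10000 dgemm on each card. It uses only the rates quoted in this post (taking "over 4 TFlops" as 4000 GFlops), assumes a 2·n³ flop count, and optimistically assumes each card sustains its peak rate.

```python
# Best-case time for one 10000 x 10000 DGEMM at the DP rates quoted above.
# Assumptions: 2*n^3 flops, and each card sustains its quoted peak (optimistic).

N = 10_000
FLOPS = 2.0 * N ** 3

dp_gflops = {               # figures as quoted in this thread
    "GTX 1080 Ti": 350.0,
    "Tesla K20": 1200.0,
    "Tesla P100": 4000.0,   # "over 4 TFlops", taken as a lower bound
}

best_case_seconds = {
    gpu: FLOPS / (rate * 1e9) for gpu, rate in dp_gflops.items()
}

for gpu, secs in best_case_seconds.items():
    print(f"{gpu:12s} {secs:5.2f} s")
```

On these numbers the 1080 Ti needs a bit under 6 s at best, which matches the roughly 5 s that was measured, while the K20 and P100 would come in under 2 s and around 0.5 s respectively.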

cbuchner1 · April 27, 2017, 3:14pm

The older Kepler based GTX Titan and Titan Black 6GB models had unlocked DP throughput. You might find them quite cheap as used models.

However, the modern Pascal-based Tesla P100 would offer higher DP throughput at lower power consumption (but at high cost). The Maxwell-based Teslas do not have fast DP.

bha4395 · April 27, 2017, 4:14pm

That’s the series I was looking for! I knew older-gen Titans supported DP, but I only looked at Maxwell and forgot to look at Kepler.

pierrot91 · April 28, 2017, 8:16am

Thanks to all (bha4395, cbuchner1, tera, Christian) for your prompt and kind responses!
Apparently, if I understand you correctly, I was a little hasty in purchasing the two 1080 Ti cards.
For scientific computation in double precision (at low prices), it seems that Titan Black cards
would have been preferable, if I understand correctly…
Do you know whether the drivers for such cards are easy to configure to get the 1/3 FP64 rate, which is nearly equivalent to the K40c?
(ArrayFire site: explaining FP64 performance on GPUs).
Do you know whether the drivers used with ArrayFire or MAGMA can be set to make the most of the compute capability of
the cards (1080 Ti or Titan)…
Thanks a lot to all for your help to beginners in GPU programming (which seems to be a fabulous world)
Very Friendly greetings from Paris
Pierre

bha4395 · April 28, 2017, 2:00pm

I’m slightly confused by your question.

Are you looking for a GPU that has the FP performance of 1/3 of the K40?

[url]https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units[/url]

This wiki page has listed FP and DP performance for all NVIDIA GPUs.
Maybe this will be helpful for whatever you are looking for.

Robert_Crovella · April 28, 2017, 2:41pm

The 1/3 is referring to the idea that Kepler family products come in 2 categories:

Those whose DP throughput is 1/24 of the SP throughput. An example is the K10.
Those whose DP throughput is 1/3 of the SP throughput. Examples are K20, K40, K80. Certain Titan family members are in this category as well.
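To illustrate what the two ratios mean in practice, the sketch below applies them to approximate single-precision peak figures. The SP numbers are assumptions taken from public spec sheets (board total for the dual-GPU K10), not values given in this thread, so treat the results as ballpark only.

```python
# Apply the two Kepler DP:SP ratios to assumed SP peak figures.
# The SP numbers are approximate spec-sheet values, not from this thread.

from fractions import Fraction

kepler = {
    # name: (assumed SP peak in GFLOP/s, DP:SP ratio from the post above)
    "Tesla K10": (4580.0, Fraction(1, 24)),
    "Tesla K20": (3520.0, Fraction(1, 3)),
    "Tesla K40": (4290.0, Fraction(1, 3)),
}

dp_peak = {name: sp * float(ratio) for name, (sp, ratio) in kepler.items()}

for name, dp in dp_peak.items():
    print(f"{name}: about {dp:.0f} GFLOP/s DP")
```

With these assumed inputs the K20 lands a little under 1200 GFLOP/s, consistent with the figure quoted earlier in the thread, while the K10 stays below 200 GFLOP/s.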

bha4395 · April 28, 2017, 5:19pm

Thanks txbob. That makes a lot of sense. Would have never known it otherwise.

BulatZiganshin · April 29, 2017, 11:36am

Off the top of my head, only the first Titan had good DP performance.

Robert_Crovella · April 29, 2017, 3:50pm

The original Titan, Titan Black, and Titan Z all had the possibility of elevated DP perf.