I am trying to find a CUDA equivalent of the dgeev routine from LAPACK.
I compiled magma-1.1.0 on a Tesla C2070 and ran the dgeev benchmark, which covers matrix sizes from 1024 to 8064. Interestingly, for a 1024x1024 matrix the GPU takes more time than the CPU.
N CPU Time(s) GPU Time(s) ||R||_F / ||A||_F
==========================================================
<b>1024 31.66 51.06</b>
2048 251.49 138.11
3072 515.84 322.13
4032 738.23 578.76
5184 1429.96 793.89
6016 1634.60 1136.89
7040 2171.73 1432.91
8064 3345.07 1625.88
I would like to know whether I can run dgeev on a 10x10 matrix 100,000 times (i.e. in burst mode).
In this scenario, each GPU thread would solve one 10x10 matrix. So if 64 threads are launched, 64 10x10 matrices would be solved in parallel.
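To make the one-thread-per-matrix idea concrete, here is a minimal sketch of the launch pattern I have in mind. It is not a dgeev replacement: the per-thread work is plain power iteration, which only estimates the dominant eigenvalue and is used purely as a stand-in workload. The names `power_iteration_batched`, `dominant_eigenvalues`, `check` and the row-major back-to-back storage are my own assumptions, not part of MAGMA or CULA.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <stdexcept>
#include <vector>

constexpr int N = 10;  // matrix order, matching the 10x10 case

// Hypothetical helper: turn a CUDA error code into a C++ exception.
static void check(cudaError_t e)
{
    if (e != cudaSuccess) throw std::runtime_error(cudaGetErrorString(e));
}

// One thread handles one N x N matrix (row-major, matrices stored back to back).
// Power iteration is a stand-in workload only; it is NOT equivalent to dgeev.
__global__ void power_iteration_batched(const double* A, double* lambda,
                                        int batch, int iters)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= batch) return;                    // guard the last, partial block
    const double* a = A + (size_t)m * N * N;   // this thread's matrix

    double x[N], y[N];
    for (int i = 0; i < N; ++i) x[i] = 1.0 / sqrt((double)N);  // unit start vector

    double est = 0.0;
    for (int it = 0; it < iters; ++it) {
        double norm2 = 0.0, rayleigh = 0.0;
        for (int i = 0; i < N; ++i) {
            double s = 0.0;
            for (int j = 0; j < N; ++j) s += a[i * N + j] * x[j];  // y = a * x
            y[i] = s;
            norm2 += s * s;
            rayleigh += x[i] * s;              // x^T a x, with ||x|| = 1
        }
        est = rayleigh;
        double norm = sqrt(norm2);
        if (norm == 0.0) break;                // a * x vanished, nothing to iterate
        for (int i = 0; i < N; ++i) x[i] = y[i] / norm;
    }
    lambda[m] = est;
}

// Host wrapper: copy the batch to the GPU, launch 64 threads per block,
// and copy one eigenvalue estimate per matrix back to the host.
std::vector<double> dominant_eigenvalues(const std::vector<double>& hA,
                                         int batch, int iters = 500)
{
    if (batch <= 0 || hA.size() != (size_t)batch * N * N)
        throw std::invalid_argument("expected batch * N * N matrix entries");

    std::vector<double> hLambda(batch);
    double* dA = nullptr;
    double* dLambda = nullptr;
    check(cudaMalloc((void**)&dA, hA.size() * sizeof(double)));
    check(cudaMalloc((void**)&dLambda, batch * sizeof(double)));
    check(cudaMemcpy(dA, hA.data(), hA.size() * sizeof(double),
                     cudaMemcpyHostToDevice));

    const int threads = 64;
    const int blocks = (batch + threads - 1) / threads;
    power_iteration_batched<<<blocks, threads>>>(dA, dLambda, batch, iters);
    check(cudaGetLastError());

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(hLambda.data(), dLambda, batch * sizeof(double),
                     cudaMemcpyDeviceToHost));
    check(cudaFree(dA));
    check(cudaFree(dLambda));
    return hLambda;
}
```

Calling `dominant_eigenvalues(A, 100000)` with 100,000 matrices packed into `A` would launch 1563 blocks of 64 threads. With this layout neighbouring threads read addresses 100 doubles apart, so memory access is not coalesced; interleaving the matrices would probably help, but I have not measured it.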
Any suggestions for a CUDA library that can handle this?
PS: I have looked at CULA R12 and haven’t found anything on their forums that suggests a burst mode for small matrices.
Thanks in advance.