Hi,
I just started with CUDA and have been playing around with CUBLAS. For my work I need a generic, well-performing DGEMM implementation that I can modify, so I need the source code, which isn't available for CUBLAS…
Do you know of any “open” CUDA DGEMM implementation I could look into? If not, do you have any tips on how to implement an efficient DGEMM?
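For the "tips" part, the usual first step beyond a naive kernel is shared-memory tiling: each thread block loads a TILE x TILE sub-block of A and of B into shared memory, and every thread reuses those values TILE times instead of re-reading them from global memory. Below is a minimal sketch of that idea, not a tuned implementation. It assumes row-major storage (CUBLAS itself uses column-major), computes C = alpha*A*B + beta*C, and uses a tile width of 16 (an arbitrary choice). The names `dgemm_tiled`, `dgemm_tiled_kernel` and `dgemm_ref` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>

#define TILE 16  // tile width: an assumption, tune per GPU

// C = alpha * A * B + beta * C, all row-major.
// A is M x K, B is K x N, C is M x N.
// Each block computes one TILE x TILE tile of C; each thread computes one element.
__global__ void dgemm_tiled_kernel(int M, int N, int K, double alpha,
                                   const double* A, const double* B,
                                   double beta, double* C) {
    __shared__ double As[TILE][TILE];
    __shared__ double Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    double acc = 0.0;

    for (int t = 0; t < K; t += TILE) {
        int aCol = t + threadIdx.x;
        int bRow = t + threadIdx.y;
        // Load one tile of A and one of B; pad with zeros past the matrix edges.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0;
        __syncthreads();  // tiles fully loaded before anyone reads them
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // everyone done reading before the next load
    }
    if (row < M && col < N)
        C[row * N + col] = alpha * acc + beta * C[row * N + col];
}

// Host wrapper: copies host arrays to the device, launches the kernel,
// copies C back. Returns 0 on success, -1 on any CUDA error.
int dgemm_tiled(int M, int N, int K, double alpha, const double* A,
                const double* B, double beta, double* C) {
    assert(M >= 0 && N >= 0 && K >= 0);
    size_t szA = sizeof(double) * M * K;
    size_t szB = sizeof(double) * K * N;
    size_t szC = sizeof(double) * M * N;
    double *dA = nullptr, *dB = nullptr, *dC = nullptr;

    cudaError_t err = cudaMalloc(&dA, szA);
    if (err == cudaSuccess) err = cudaMalloc(&dB, szB);
    if (err == cudaSuccess) err = cudaMalloc(&dC, szC);
    if (err == cudaSuccess) err = cudaMemcpy(dA, A, szA, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, B, szB, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dC, C, szC, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);
        dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
        dgemm_tiled_kernel<<<grid, block>>>(M, N, K, alpha, dA, dB, beta, dC);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(C, dC, szC, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess ? 0 : -1;
}

// Plain CPU reference (same row-major convention) for checking results.
void dgemm_ref(int M, int N, int K, double alpha, const double* A,
               const double* B, double beta, double* C) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            double s = 0.0;
            for (int k = 0; k < K; ++k) s += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * s + beta * C[i * N + j];
        }
}
```

The bounds checks let M, N and K be arbitrary sizes, so comparing against `dgemm_ref` on sizes that are not multiples of 16 is a cheap way to catch indexing bugs before tuning. Getting closer to CUBLAS speed typically means having each thread compute several elements of C held in registers (register blocking), which raises the ratio of arithmetic to memory traffic further.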
:">
Best regards,
gemini