I was wondering if NVIDIA has any plans to make the source code of the CUBLAS library available. This would seem to be in everyone's interest, particularly given that not all BLAS functions have been implemented yet; optimization and interoperability with other BLAS libraries will also become important once they have. I realize that NVIDIA isn't particularly keen on open source, but in this case none of the usual reasons for that policy (e.g. protecting intellectual property) seem to apply. [I'm assuming here that the library is implemented exclusively in CUDA C, with no assembler tricks.]
I suppose the lack of a reply means NVIDIA is undecided.
I would like to add my voice to that of noegenesis! If CUBLAS is implemented in CUDA C, then it could only be to NVIDIA's advantage for a few people to start mucking about with it, trying to increase its performance or add features!
As a research student I believe that this move would be much appreciated all over academia.
Looks like I am reopening a very ancient thread! I was googling around for the CUBLAS source code and this was the only result that came up. The link given in this thread says that the CUBLAS source code has been removed. Does anyone know what happened to it? Is it possible to get the source code at all? It would be very, very helpful.
Even if you can locate the sources, consider that CUDA hardware and software have changed a lot over the years. If you are looking for source code since you need a feature not currently supported by CUBLAS, consider filing a feature request through the bug reporting form (simply prefix the synopsis with “RFE:” to mark it as a feature request rather than a bug).
That code generally outperforms CUBLAS across the board, and is sometimes 2-3x faster for some dimensions important for deep learning. The Python lib is also a pretty slick GPU-accelerated NumPy implementation. It's still a work in progress and I haven't had much chance to fully document it yet. But it should have lots of applications outside of deep learning.
Hello Josh, it's a few years later. Out of keen interest: did you somehow manage to get hold of the CUBLAS source code, or have a guy visit you who accidentally dropped a USB stick with some of it?
I ask because reverse engineering has never been my thing, of course.
A GPU burn test I ran, written by a cool guy and built around a matrix calculation function, reports how many double precision Tflops I'm getting (1.0 and 0.9 on the Titan Z here after a second or ten), apparently based on a profiler report. Yet that's a number of the 'we at Coca-Cola advise Coca-Cola' kind.
I want to count the instructions by hand and calculate my own numbers. For an O(n^3) matrix multiplication I would guess that it's simple C code and nothing lower level, as there is a world of optimizations possible before moving to lower echelon levels :)
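For what it's worth, counting by hand for a plain triple-loop matrix multiply can be sketched roughly like this. It is only an illustration in Python, not anything taken from CUBLAS or the burn test: the function names are made up for this post, and it assumes the usual convention of one multiply plus one add per inner-loop step (so 2*m*n*k operations in total). A real GPU kernel may use fused multiply-add and count differently.

```python
def matmul_flop_count(m, n, k):
    """Operations for C = A * B with A of size m x k and B of size k x n.

    Assumes one multiply and one add per inner-loop step (hypothetical
    convention for this sketch; FMA hardware may be counted differently).
    """
    return 2 * m * n * k


def naive_matmul(a, b):
    """Textbook O(n^3) multiply on lists of lists; also counts operations."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    flops = 0
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply, one add
                flops += 2
            c[i][j] = acc
    return c, flops


def achieved_tflops(flops, seconds):
    """Convert an operation count and a measured runtime into Tflops."""
    return flops / seconds / 1e12


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c, counted = naive_matmul(a, b)
    print(c, counted, matmul_flop_count(2, 2, 2))
```

With the operation count from the formula and a runtime you measure yourself, achieved_tflops gives a figure you can compare against whatever the tool prints.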