I wanted to use the cuFFT, NPP, and cuBLAS libraries from CUDA.
I have noticed that these library files are extremely large.
Why is this so? Currently I’m using MKL, and with everything
that is in MKL the file is only 20 MB (incl. FFT, BLAS, …).
All of these libraries contain so-called fat binaries, meaning there are multiple instances of each kernel: a machine code instance for each supported architecture, plus a PTX instance that can be JIT-compiled for future architectures for which no machine code exists yet.
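To make the selection between those instances a bit more concrete, here is a rough Python model of it. This is only a sketch, not the actual driver logic: `select_image` and its arguments are hypothetical names, and it assumes the usual compatibility rules (machine code runs only on the same major architecture with an equal or higher minor revision, while PTX can be JIT-compiled for any device of equal or higher compute capability).

```python
from typing import List, Optional, Tuple

# Compute capability as (major, minor), e.g. (2, 0) for sm_20.
CC = Tuple[int, int]


def select_image(device: CC, cubins: List[CC], ptx: Optional[CC]) -> str:
    """Simplified, hypothetical model of picking an image from a fat binary."""
    # Machine code is usable only within the same major architecture,
    # and only if the device's minor revision is at least the cubin's.
    usable = [c for c in cubins if c[0] == device[0] and c[1] <= device[1]]
    if usable:
        best = max(usable)  # prefer the closest (highest) matching build
        return f"cubin sm_{best[0]}{best[1]}"
    # No matching machine code: fall back to JIT-compiling the PTX,
    # which works for devices at or above the PTX's target capability.
    if ptx is not None and ptx <= device:
        return f"ptx compute_{ptx[0]}{ptx[1]} (JIT)"
    return "no compatible image"


# A library shipped with machine code for 1.3 and 2.0, plus PTX for 2.0.
embedded_cubins = [(1, 3), (2, 0)]
embedded_ptx = (2, 0)

print(select_image((2, 0), embedded_cubins, embedded_ptx))  # existing GPU
print(select_image((3, 0), embedded_cubins, embedded_ptx))  # newer GPU
```

In this model the existing GPU gets precompiled machine code, while the newer one has to go through the PTX path, which costs JIT time at load and cannot use code tuned for the new architecture. That trade-off is the reason both kinds of instance are carried in the binary.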
OK, this is what I understand, but why isn’t there PTX code only?
It seems to me that with every new hardware generation and every new compute capability, the size of the DLLs will keep increasing.
Isn’t this rather impractical?
As far as I understand, PTX code compiled for a particular compute capability would, on its own, work on every graphics card.
It would be much nicer to have the PTX code plus separate DLLs for each compute capability.