Every new release of CUBLAS and CUFFT seems to improve performance over the previous revision, mainly because the code is further optimized for a single device.
We wanted to get a conversation going regarding the following idea:
As developers, would you be willing to sacrifice optimal single-device performance for somewhat lower per-device performance that scales across multiple GPUs?
The MAGMA project aims to scale the functions commonly found in LAPACK to multiple GPUs, and I was curious to know how developers feel about this direction.
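To make the tradeoff concrete, here is a toy sketch (plain Python, no GPU, and not taken from MAGMA's actual implementation) of the kind of decomposition a multi-GPU GEMM relies on: A's rows are split into contiguous blocks, one per device, and each block of C = A × B is computed independently. The extra partitioning and data movement is exactly where single-device performance gets traded for scalability.

```python
def matmul(A, B):
    # naive single-"device" matrix multiply
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def matmul_partitioned(A, B, num_devices):
    # split A's rows into num_devices contiguous blocks; B is replicated
    # to every "device" (a common multi-GPU GEMM layout)
    n = len(A)
    block = (n + num_devices - 1) // num_devices
    C = []
    for d in range(num_devices):
        rows = A[d * block:(d + 1) * block]  # row block assigned to device d
        C.extend(matmul(rows, B))            # each block is independent work
    return C

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0], [0, 1]]
# the partitioned result must match the single-device result exactly
assert matmul_partitioned(A, B, 2) == matmul(A, B)
```

In a real multi-GPU library each block would live on a different device and the replication of B plus the final gather are exactly the costs that a single-device kernel never pays.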
*I have no affiliation with MAGMA, but we’re curious about your thoughts since we are working on a similar (though unrelated) project.