We at IBM Research have been working on a sparse matrix-vector multiplication (SpMV) library for NVIDIA GPUs for some time. Our approach differs from NVIDIA's in that we use the traditional CSR format to store and access the matrix data, and we use novel thread mapping and tiling strategies to optimize the computation. I have attached a technical report that describes the SpMV implementation and discusses the results.
The CUDA code is also available at: http://www.alphaworks.ibm.com/tech/spmv4gpu
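
For readers unfamiliar with CSR, here is a minimal sequential sketch in plain C++ of what y = A * x looks like over that layout. It is only a CPU reference for the data format, not the GPU kernel from the library, and it does not show the thread mapping or tiling described in the report. The names `CsrMatrix`, `row_ptr`, `col_idx`, `values`, and `spmv_csr` are illustrative assumptions, not identifiers from the released code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container; field names are illustrative only.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i occupies [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Sequential reference for y = A * x over a CSR matrix.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        // Walk only the nonzeros stored for row i.
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

Because row lengths vary, the inner loop does a different amount of work per row, and the reads of `x` are indirect through `col_idx`. Those are the two properties that make the choice of thread mapping matter on a GPU.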
Please let us know what you think!