cuSPARSE: Which is the "best" matrix storage format?

If I can freely choose the matrix storage format when working with cuSPARSE, which one should I choose for best performance?

I receive my matrix on the CPU from another library, in a format that cuSPARSE does not support. I then convert it to a cuSPARSE format and transfer it to the GPU, where I compute several matrix-vector products as part of a large distributed equation solver. I might also want to compute a preconditioner for this matrix.

The matrix originates from an FEM discretization and therefore has an irregularly banded structure. The sparsity pattern does not change between iterations of the process described above, so I can use a precomputed index map for the conversion between the CPU and GPU formats.
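To make the index map idea concrete, here is a minimal CPU-side sketch. It assumes the source library hands over one (row, col) pair per stored entry in arbitrary order with no duplicates, and it uses CSR as the target purely as an example. The names `CooPattern`, `CsrPattern`, `buildCsrPattern` and `scatterValues` are hypothetical and not part of cuSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of what the CPU library provides: one (row, col) pair
// per stored entry, in arbitrary order. Duplicates are assumed not to occur.
struct CooPattern {
    int numRows;
    std::vector<int> row;
    std::vector<int> col;
};

// CSR sparsity pattern plus the precomputed map from source order to CSR order.
struct CsrPattern {
    std::vector<int> rowPtr;  // size numRows + 1
    std::vector<int> colInd;  // size nnz
    std::vector<int> map;     // map[k] = CSR slot of source entry k
};

// Done once, because the sparsity pattern never changes. This is a counting
// sort by row; within a row the entries keep their source order, so the
// column indices are not sorted.
CsrPattern buildCsrPattern(const CooPattern& coo) {
    assert(coo.row.size() == coo.col.size());
    const std::size_t nnz = coo.row.size();

    CsrPattern csr;
    csr.rowPtr.assign(coo.numRows + 1, 0);
    csr.colInd.resize(nnz);
    csr.map.resize(nnz);

    // Count entries per row, then prefix-sum the counts into row offsets.
    for (std::size_t k = 0; k < nnz; ++k) ++csr.rowPtr[coo.row[k] + 1];
    for (int r = 0; r < coo.numRows; ++r) csr.rowPtr[r + 1] += csr.rowPtr[r];

    // next[r] is the next free CSR slot in row r.
    std::vector<int> next(csr.rowPtr.begin(), csr.rowPtr.end() - 1);
    for (std::size_t k = 0; k < nnz; ++k) {
        const int slot = next[coo.row[k]]++;
        csr.colInd[slot] = coo.col[k];
        csr.map[k] = slot;
    }
    return csr;
}

// Done every iteration: only the values change, so scatter them through the map.
void scatterValues(const std::vector<int>& map,
                   const std::vector<double>& srcVal,
                   std::vector<double>& csrVal) {
    assert(map.size() == srcVal.size());
    csrVal.resize(srcVal.size());
    for (std::size_t k = 0; k < srcVal.size(); ++k) csrVal[map[k]] = srcVal[k];
}
```

With this split, `rowPtr` and `colInd` would only need to be copied to the GPU once, and each iteration would transfer just the value array filled by `scatterValues`.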

Here is a link to the many sparse formats that can be used in cuSPARSE:

http://docs.nvidia.com/cuda/cusparse/index.html#cusparse-format-conversion-reference

The most ‘common’ is probably CSR, which is used by most other sparse libraries.
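For anyone unfamiliar with the layout, here is a minimal CPU sketch of CSR and of a matrix-vector product over it. The struct and function names are my own for illustration and are not cuSPARSE API. cuSPARSE works with the same three arrays (row offsets, column indices, values), only in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative CSR container with zero-based indices (names are hypothetical).
struct CsrMatrix {
    int numRows;
    std::vector<int> rowPtr;   // size numRows + 1; row r owns slots [rowPtr[r], rowPtr[r+1])
    std::vector<int> colInd;   // column index of each stored entry
    std::vector<double> val;   // value of each stored entry
};

// Reference y = A * x on the CPU, handy for checking GPU results.
std::vector<double> csrMultiply(const CsrMatrix& A, const std::vector<double>& x) {
    assert(A.rowPtr.size() == static_cast<std::size_t>(A.numRows) + 1);
    assert(A.colInd.size() == A.val.size());

    std::vector<double> y(A.numRows, 0.0);
    for (int r = 0; r < A.numRows; ++r) {
        double sum = 0.0;
        for (int k = A.rowPtr[r]; k < A.rowPtr[r + 1]; ++k) {
            sum += A.val[k] * x[A.colInd[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

For example, the 3x3 matrix with rows (1 0 2), (0 3 0), (5 0 6) is stored as `rowPtr = {0, 2, 3, 5}`, `colInd = {0, 2, 1, 0, 2}` and `val = {1, 2, 3, 5, 6}`.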

There are claims that the hybrid (HYB) format gives the best performance, but I have no experience with it. CSR is what I use.