I use the cuSPARSE, cuBLAS, and cuSOLVER libraries in my program. Prior kernels produce all matrix data and control information (matrix row/column sizes, nnz values, leading dimension values) on the device.
To use the above libraries, I need to copy the very small control information from the device to the host, losing precious time in the process.
To give a picture: the computation takes < 10 ms, while the copy of 50 bytes takes 150 ms (same with pinned memory).
Is there any way I can point these libraries to information already stored on the device?
Is there anything else I can try?