The problem is that elements of the matrices would be either 0 or 1 and using any library for spMM would be inefficient as they are built for matrices having elements that are larger than 1. Block Lanczos works on GF(2)[sup]n[/sup]. So I doubt that CUSP or CUSPARSE or CUBLAS would work efficiently for my problem. Any suggestions how I can resolve this.
My guess is that you’d have to build efficient routines yourself.
If you are working on integer factorization problems, you can start with the block Lanczos code in the msieve library.
Patrick Stach has done a fair amount of work on block Wiedemann solvers, and has a GPU-related talk here
Just to make myself more clear the code you have written is MPI code right not CUDA code.
That particular code is multithreaded C with optional MPI extensions, there’s no CUDA equivalent right now because efficient data structures in CUDA would take too much effort for my limited time budget. There’s also the side issue that the matrices that I need solved are large enough that even 4GB of memory in one GPU could not hold the matrix, so any solution would need to scale to multiple cards. Because it’s much harder to do that than throw non-CUDA code at a cluster of PCs, any CUDA solution would need to run maybe 10x faster on a card than PC code. Which is possible, but not in my spare time.
The conference slides referenced above give some code that uses CUDA. Those structures would be unwieldy for block Lanczos becase they don’t have to handle transpose matrix multiplies.
I was going through lots of code then finally I settled on COO matrix, and have only 1 gpu that too gt330m at my disposal. Can you tell me what size of matrix I can handle on this gpu. Also where can I find input matrices for the block lanczos algo. Are these matrices to be generated using msieve library? Or is there a source on the web from where I can download them.
That card can apparently have 1-2GB of memory. A matrix with 700k columns (70 nonzeros per column) and vectors of block width 64 takes up about 700MB of memory, and this is enough to handle the matrix from factoring a ~135-140 digit general number. For a 512-bit RSA key you would need close to 2GB of memory for the matrix; assume a matrix of size 4.5M with 70 nonzeros per column.
The block Lanczos code in msieve is only useful for finding vectors in the nullspace of these matrices, so you’d have to modify it if you need it to do anything else. There is no archive of factorization matrices that I know of, I have about 10 such matrices ranging in size from 100k to 2.3M columns. The 100k size takes 50MB and solves in 7 minutes on my older Core2. Usually someone will register at mersenneforum.org and ask the Factoring subforum to upload some matrix datasets somewhere, then they get buried with huge matrices :)
Msieve does do all the work of generating the data, converting to matrix format, packing and solving.
I was going through msieve and ggnfs execution, but couldnt makeout when did the block lanczos started. Is it the matrix solving step as indicated in factmsieve.py. Also the matrix generated for block lanczos is stored in which file and in what format? Please help me with this.
More details (along with a link to the binary on-disk format) here.