Hey all,

Just wanted to ask a simple question. The Readme.txt of the current AMGX states:

For the single GPU example:

./amgx_capi -mode dDDI -m ./matrix.mtx -c …/configs/FGMRES_AGGREGATION

For the MPI example:

mpirun -n 2 ./amgx_mpi_capi -mode dDDI -m ./matrix.mtx -c …/configs/FGMRES_AGGREGATION

If you run both examples and compare the results, the MPI run with the same solver/preconditioner (FGMRES_AGGREGATION) on the same matrix A (matrix.mtx) takes twice as long (or more) and uses double the memory compared with the single-GPU run.

Shouldn’t there be an easy-to-access partitioning example that actually shows the speedup (or at least the 50% memory usage per GPU) of the MPI example over the single-GPU example? Could someone include an easy-to-use partitioning vector and/or example for matrix.mtx, i.e. just split the matrix in half row by row or something similar (I know there are tons of partitioning schemes)?

PS: There is a simple example in the current AMGX reference for “AMGX_read_system_distributed” (par 1.1, page 25) with a 4x4 matrix and a 4x1 partitioning vector, but I don’t know how this would work for a big matrix (like matrix.mtx), where you cannot write the partitioning vector by hand.
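
To make the request concrete, here is a rough sketch of the kind of row-by-row split I have in mind. It assumes the partitioning vector has one entry per matrix row holding the rank that owns that row, as the 4x1 vector in the reference suggests. The helper names (block_partition, mtx_row_count) are my own and are not part of AMGX, and I have not verified how such a vector should be stored or passed to the examples.

```python
def mtx_row_count(lines):
    """Return the row count from the size line of a Matrix Market file.

    `lines` is any iterable of text lines, e.g. an open file object.
    Lines starting with '%' are header/comment lines and are skipped.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue
        # First non-comment line is "rows cols nonzeros".
        return int(stripped.split()[0])
    raise ValueError("no size line found")


def block_partition(n_rows, n_ranks):
    """Assign contiguous blocks of rows to ranks 0..n_ranks-1.

    Entry i of the result is the rank that owns row i. If n_rows is not
    divisible by n_ranks, the first ranks get one extra row each.
    """
    base, extra = divmod(n_rows, n_ranks)
    partition = []
    for rank in range(n_ranks):
        count = base + (1 if rank < extra else 0)
        partition.extend([rank] * count)
    return partition


if __name__ == "__main__":
    # 4 rows split over 2 ranks: rows 0-1 go to rank 0, rows 2-3 to rank 1.
    print(block_partition(4, 2))  # [0, 0, 1, 1]
```

For matrix.mtx this would be something like `block_partition(mtx_row_count(open("matrix.mtx")), 2)`. A contiguous split like this ignores the sparsity pattern, so it is probably not the best partition, but it would be enough to see whether memory per GPU actually drops.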