Partition Question about AmgX

Hi dear all:

The question is a bit involved.

Situation: running AmgX to solve Ax = b, where A is about 170000 x 170000, with initial guess [0, 0, …, 0].

mpirun -n 8  ./cuilongyin -mode dDDI -m ./matrix.mtx -c ../configs/config
       cuilongyin is the executable based on the example from the package; matrix.mtx is a MatrixMarket file containing the matrix (block size 1), the rhs and the solution; the config file is as below:
"config_version": 2,
    "solver": {
        "preconditioner": {
            "error_scaling": 0,
            "print_grid_stats": 1,
            "algorithm": "AGGREGATION",
            "solver": "AMG",
            "smoother": "BLOCK_JACOBI",
            "presweeps": 0,
            "selector": "SIZE_2",
            "coarse_solver": "NOSOLVER",
            "max_iters": 1,
            "min_coarse_rows": 32,
            "relaxation_factor": 0.75,
            "scope": "amg",
            "max_levels": 100,
            "postsweeps": 3,
            "cycle": "V"
        },
        "use_scalar_norm": 1,
        "solver": "FGMRES",
        "print_solve_stats": 1,
        "obtain_timings": 1,
        "max_iters": 1000,
        "monitor_residual": 1,
        "gmres_n_restart": 32,
        "convergence": "RELATIVE_INI",
        "scope": "main",
        "tolerance": 0.000000000001,
        "norm": "L2"
    }
}

I am able to display the solution after solving using this:

AMGX_vector_get_size(x, &n, &block_dimx);

and

void* result_host = malloc(n*block_dimx*sizeof_v_val);

and

AMGX_vector_download(x, result_host);

However, the size of the solution vector I get back is about 20000 larger than 170000 when I specify -np 8, while it is exactly 170000 when I use -np 1. The extra entries are filled with 0s.
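For reference, here is roughly how I check the sizes on each rank right after the solve (just the lines I added to the example; rank comes from MPI_Comm_rank):

int rank, n, block_dimx;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
AMGX_vector_get_size(x, &n, &block_dimx);
printf("rank %d: local n = %d, block_dimx = %d\n", rank, n, block_dimx);
/* with -np 8 the local sizes add up to roughly 20000 more than the 170000 global rows */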

Then I looked at partition_sizes, partition_vector_size and partition_vector, and realized that NULL is passed, meaning a trivial partitioning is performed. But that wouldn't explain why there are so many extra 0s.
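Just so it is clear what I tried to read up on: this is what I think an explicit partition would look like, going by the arguments of AMGX_read_system_distributed in the bundled MPI example (amgx_mpi_capi). partition_vector[i] is the rank that is supposed to own global row i; n_global, nranks, A, b, x and matrix_file are the same names as in the example. Please correct me if I have the call wrong:

int *partition_vector = (int *)malloc(n_global * sizeof(int));
int rows_per_rank = (n_global + nranks - 1) / nranks;   /* consecutive blocks of rows */
for (int i = 0; i < n_global; i++)
    partition_vector[i] = i / rows_per_rank;

AMGX_read_system_distributed(A, b, x, matrix_file,
                             1,                 /* allocated_halo_depth */
                             nranks,            /* number of partitions */
                             NULL,              /* partition_sizes */
                             n_global,          /* partition_vector_size */
                             partition_vector);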

My question is: do I need to set these parameters manually according to the number of ranks I use in order to avoid the extra 0s in my solution vector, or did I miss some key point and misunderstand something?

How does AmgX perform a trivial partition? Is there a function in AmgX like MatView in PETSc?

I thought that if I run something like

mpirun -n 4

to solve a 12 x 12 matrix, then each process would own a 3-row diagonal block, but there appear to be redundant rows in each process.
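To make my expectation concrete (this is just my assumption about what a trivial partition does, not something from the docs):

#include <stdio.h>

int main(void)
{
    /* 12 x 12 matrix on 4 ranks: the consecutive-row split I expected */
    int n = 12, nranks = 4, rows = n / nranks;   /* 3 rows per rank */
    for (int r = 0; r < nranks; r++)
        printf("rank %d owns rows %d..%d\n", r, r * rows, r * rows + rows - 1);
    return 0;
}

Instead, when I print the local sizes, each rank seems to hold a few extra rows on top of its own block.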

I guess what I want to say is: I need a synchronized output, i.e. a result_host that holds the complete solution vector gathered from all processes. I know the documentation says "This routine and the underlying memory transfers will run synchronously. In other words, when the call to AMGX_vector_download returns, the copy is guaranteed to have been completed." However, no matter how I print it, the solution seems to be scattered across the processes, each host storing only a part of it.
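Concretely, this is the kind of post-solve gather I am trying to get right (only a sketch: I am assuming block size 1 as in my matrix, that row ownership is contiguous in the trivial partition, and that the first owned rows of the downloaded buffer are the ones this rank actually owns; gather_solution and owned_n are my own names, not part of the AmgX API):

#include <stdlib.h>
#include <mpi.h>
#include <amgx_c.h>

/* Gather the distributed solution onto rank 0 after AMGX_solver_solve.
 * Returns the full vector on rank 0 and NULL on the other ranks. */
static double *gather_solution(AMGX_vector_handle x, int n_global, MPI_Comm comm)
{
    int rank, nranks;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    /* Local piece of the solution (this is the part that comes back too big). */
    int n_local, block_dimx;
    AMGX_vector_get_size(x, &n_local, &block_dimx);
    double *local = (double *)malloc((size_t)n_local * block_dimx * sizeof(double));
    AMGX_vector_download(x, local);

    /* My guess at the trivial partition: consecutive rows, remainder spread
     * over the first ranks; owned_n = how many downloaded entries I keep. */
    int owned_n = n_global / nranks + (rank < n_global % nranks ? 1 : 0);

    int *counts = NULL, *displs = NULL;
    double *global = NULL;
    if (rank == 0) {
        counts = (int *)malloc(nranks * sizeof(int));
        displs = (int *)malloc(nranks * sizeof(int));
        global = (double *)malloc((size_t)n_global * sizeof(double));
    }
    MPI_Gather(&owned_n, 1, MPI_INT, counts, 1, MPI_INT, 0, comm);
    if (rank == 0) {
        int off = 0;
        for (int r = 0; r < nranks; r++) { displs[r] = off; off += counts[r]; }
    }
    MPI_Gatherv(local, owned_n, MPI_DOUBLE,
                global, counts, displs, MPI_DOUBLE, 0, comm);

    free(local);
    if (rank == 0) { free(counts); free(displs); }
    return global;
}

Is something along these lines the intended way to get the whole solution on one host, or does AmgX already provide a call for that?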

Thanks…