Hi! I am conducting a systematic study of the optimization algorithm (specifically, the probability distribution of the contraction cost), and I don’t actually contract the TN. The goal is to understand how massive parallelization affects the quality of the best contraction order, without increasing the optimization cost for individual samples.
I would like to sample about a million contraction orders, but running on a GPU cluster would be completely wasteful. Is there a way to run the optimization algorithm on a CPU-only cluster while assuming A100 specs?
Does the optimization algorithm auto-detect the memory? Since I need the time and cost of each individual sample, I am running 1 sample per MPI rank (multiple ranks per GPU). Is this the best way? I don’t know if this affects the memory it thinks is available when slicing.
Thank you for contacting us and we apologize for the delay.
The path optimizer doesn’t run on the GPU, but it gets information from the GPU (memory, architecture, etc.) and thus requires GPU hardware to be available.
In cuQuantum Python, memory auto-detection is enabled by default to ensure that the path found can be executed on the corresponding GPU. Thus, if the GPU memory is split between ranks, it could affect the quality of the path. However, the user can change the memory limit through NetworkOptions.memory_limit and set their own limit, which can be larger than the GPU memory (it can even be set to a very large value, such as petabytes).
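As a rough illustration of overriding the limit, here is a minimal sketch that pins the memory budget to a fixed value for every rank. The 80 GiB figure, the helper names (network_options, optimizer_options, sample_path), and the choice of one sample per call are assumptions made for this example. Depending on the cuQuantum Python release, contract_path may live under cuquantum or cuquantum.tensornet, so the import tries both. Calling sample_path still requires a visible GPU.

```python
# Assumption: emulate an 80 GiB A100 regardless of how many ranks share the GPU.
A100_MEMORY_BYTES = 80 * 1024**3


def network_options(memory_bytes=A100_MEMORY_BYTES):
    """Return a dict that can be passed in place of a NetworkOptions object."""
    return {"memory_limit": memory_bytes}


def optimizer_options(seed):
    """Request a single hyperoptimizer sample so each call can be timed alone."""
    return {"samples": 1, "seed": seed}


def sample_path(subscripts, operands, seed):
    """Find one contraction path without contracting (hypothetical helper)."""
    # Imported lazily: needs cuQuantum Python and a visible GPU at call time.
    try:
        from cuquantum.tensornet import contract_path
    except ImportError:
        from cuquantum import contract_path

    path, info = contract_path(
        subscripts,
        *operands,
        options=network_options(),
        optimize=optimizer_options(seed),
    )
    # opt_cost is the estimated contraction cost; num_slices reflects slicing.
    return path, info.opt_cost, info.num_slices
```

With a fixed limit like this, the slicing decision should no longer depend on how much free memory each rank happens to see on a shared GPU.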
With the C API, the user provides the memory constraint as an argument, so there is no default; the user has to set it.
In summary, you need GPU hardware available to run the cuQuantum pathfinder, but you can change and configure any parameter of the pathfinder optimizer.