Splitting a huge dataset across multiple GPUs and communicating between them


I have a simulation with a huge 3D dataset that does not fit into the memory of a single GPU. I have a machine with 4 GPUs that I want to work together (using OpenCL). I split the dataset into 4 sub-cubes and want only one sub-cube to be allocated on each device. At every simulation step I have to exchange ghost layers between the devices. What is the best way to do this?

  • Create 1 context spanning all 4 GPUs? Will every GPU end up with all sub-cubes allocated, since buffers are created per context and not per device?
  • Create 4 contexts with 1 GPU each? Is it possible to synchronize between contexts?

What commands should I use to transfer data directly from one GPU to another?