Training multiple models on one GPU in Linux

My organization has a compute cluster running Linux. Each node has a single A100 GPU.

I want to know what issues arise when training multiple models on the same node. Basically, I follow these steps:

  1. Log in to the node and open two shells with screen or tmux.
  2. In each shell, I run a Python script that uses PyTorch with GPU support to train a model.
    The models are independent and the processes don't talk to each other at all. Each model uses about 20% of the GPU memory and 27% of GPU utilization, as reported by running `nvidia-smi` on the node.
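For reference, this is roughly the workflow sketched above: two detached tmux sessions, each running one training job, then watching utilization with `nvidia-smi`. The session names and script names (`train_a.py`, `train_b.py`) are placeholders, not my actual scripts.

```shell
#!/bin/sh
# Start two detached tmux sessions, each running an independent training script.
# (train_a.py / train_b.py are hypothetical placeholder names.)
tmux new-session -d -s train_a 'python train_a.py'
tmux new-session -d -s train_b 'python train_b.py'

# Poll GPU memory and utilization every 5 seconds while both jobs run.
nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv -l 5
```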


  1. Is there any way to do this more efficiently, or is this the best way of doing it?
  2. What can I read to understand how the GPU handles this concurrent processing?
  3. How does the GPU organize the tasks submitted by each shell? Fully in parallel, or sequentially?

Thanks!