What happens when two users send jobs to a V100 or A100 GPU?

How do V100 and A100 GPUs handle two users simultaneously sending jobs without any special partitioning?

Is resource sharing involved, and if so, which resources are shared: compute or memory?

The memory is shared. If user A allocates 30 GB on a 32 GB GPU, user B can allocate at most the remaining ~2 GB (likely less, since some memory is reserved for CUDA context overhead); any larger request fails with a CUDA out-of-memory error.
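The allocation arithmetic can be illustrated with a toy model of a single shared memory pool (a hypothetical sketch, not the CUDA API; `GpuMemoryPool` is an invented name):

```python
# Hypothetical model: one memory pool shared by every process on the GPU.
class GpuMemoryPool:
    def __init__(self, total_gb):
        self.total = total_gb
        self.used = 0.0

    def alloc(self, user, gb):
        # Mirrors the behavior of cudaMalloc: a request larger than the
        # free memory fails, regardless of which user made prior allocations.
        free = self.total - self.used
        if gb > free:
            raise MemoryError(
                f"{user}: out of memory (requested {gb} GB, {free} GB free)")
        self.used += gb
        return gb

pool = GpuMemoryPool(32)     # a 32 GB GPU
pool.alloc("user A", 30)     # succeeds
try:
    pool.alloc("user B", 4)  # fails: only 2 GB left
except MemoryError as e:
    print(e)
```

The point of the sketch: the pool has no notion of per-user quotas, so whoever allocates first wins, and the second user simply sees less free memory.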

The computational resources are time-sliced. The details of the time slicing are neither published nor controllable. On both V100 and A100, kernels from user A run for a period of time, are then halted (if not yet complete), and kernels from user B run for a period of time. This A/B alternation continues until all kernels have finished.
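The alternation described above can be sketched as a toy round-robin simulation (a simplified model only; the real scheduler's quantum and policy are undocumented, and the function and parameter names here are invented):

```python
from collections import deque

def time_slice(jobs, quantum):
    """Simulate round-robin sharing of one GPU between users' kernel queues.

    jobs: dict mapping user -> list of kernel durations (arbitrary units).
    quantum: how much work runs before a user's kernels are halted.
    Returns the order in which (user, units-run) slices executed.
    """
    queue = deque((user, sum(durations)) for user, durations in jobs.items())
    trace = []
    while queue:
        user, remaining = queue.popleft()
        ran = min(quantum, remaining)
        trace.append((user, ran))
        remaining -= ran
        if remaining > 0:          # halted mid-job, not finished: requeue
            queue.append((user, remaining))
    return trace

# User A submits 5 units of kernel work, user B submits 3; quantum of 2.
print(time_slice({"A": [5], "B": [3]}, quantum=2))
# [('A', 2), ('B', 2), ('A', 2), ('B', 1), ('A', 1)]
```

Note that each user's job takes longer wall-clock time than it would alone, since the GPU alternates between them rather than running both concurrently.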