What happens when two users send jobs to a V100 or A100 GPU?

How do V100 and A100 GPUs handle two users simultaneously sending jobs without any special partitioning?

Is resource sharing involved, and if so, which resources are shared: compute or memory?

The memory is shared. If user A allocates 30 GB on a 32 GB GPU, user B can allocate at most the remaining ~2 GB (likely less, since some memory is reserved for CUDA context overhead); any larger request fails with a CUDA out-of-memory error.
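The allocation arithmetic can be illustrated with a toy model of a single shared memory pool (a hypothetical sketch, not the CUDA API; `GpuMemoryPool` is an invented name):

```python
# Hypothetical model: one memory pool shared by every process on the GPU.
class GpuMemoryPool:
    def __init__(self, total_gb):
        self.total = total_gb
        self.used = 0.0

    def alloc(self, user, gb):
        # Mirrors the behavior of cudaMalloc: a request larger than the
        # free memory fails, regardless of which user made prior allocations.
        free = self.total - self.used
        if gb > free:
            raise MemoryError(
                f"{user}: out of memory (requested {gb} GB, {free} GB free)")
        self.used += gb
        return gb

pool = GpuMemoryPool(32)     # a 32 GB GPU
pool.alloc("user A", 30)     # succeeds
try:
    pool.alloc("user B", 4)  # fails: only 2 GB left
except MemoryError as e:
    print(e)
```

The point of the sketch: the pool has no notion of per-user quotas, so whoever allocates first wins, and the second user simply sees less free memory.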

The computational resources are time-sliced. The details of the time slicing are neither published nor controllable. On both V100 and A100, kernels from user A run for a period of time, are then halted (if not yet complete), and kernels from user B run for a period of time. This A/B alternation continues until all kernels have finished.
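The alternation described above can be sketched as a toy round-robin simulation (a simplified model only; the real scheduler's quantum and policy are undocumented, and the function and parameter names here are invented):

```python
from collections import deque

def time_slice(jobs, quantum):
    """Simulate round-robin sharing of one GPU between users' kernel queues.

    jobs: dict mapping user -> list of kernel durations (arbitrary units).
    quantum: how much work runs before a user's kernels are halted.
    Returns the order in which (user, units-run) slices executed.
    """
    queue = deque((user, sum(durations)) for user, durations in jobs.items())
    trace = []
    while queue:
        user, remaining = queue.popleft()
        ran = min(quantum, remaining)
        trace.append((user, ran))
        remaining -= ran
        if remaining > 0:          # halted mid-job, not finished: requeue
            queue.append((user, remaining))
    return trace

# User A submits 5 units of kernel work, user B submits 3; quantum of 2.
print(time_slice({"A": [5], "B": [3]}, quantum=2))
# [('A', 2), ('B', 2), ('A', 2), ('B', 1), ('A', 1)]
```

Note that each user's job takes longer wall-clock time than it would alone, since the GPU alternates between them rather than running both concurrently.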