There are several ways to share a GPU. I came across occupancy. Can I use it to slice the GPU among the processes (e.g. TensorFlow) that share it? By "slice" I mean that a set of GPU resources is always dedicated to one process. My idea is to use occupancy to get the GPU and SM details, and then launch each kernel so that its blocks are created only on those resources.
Please suggest. Thanks!
Your question was already answered correctly here:
No, occupancy has nothing to do with slicing a GPU between separate processes. Occupancy describes how many warps of a single kernel can be resident on an SM relative to the hardware maximum; it does not reserve SMs for a process. In general, GPUs don’t easily “slice” to support separate processes. Good advice is to use multiple GPUs. Alternatively, you could investigate CUDA MPS if you wish (just google: CUDA MPS). I wouldn’t recommend that for use with TensorFlow, however.
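
To make the point about occupancy concrete, here is a rough sketch of how theoretical occupancy is computed for one kernel launch. The `SMLimits` class and the `theoretical_occupancy` function are hypothetical names made up for this illustration, the per-SM limits are assumed example values (check the numbers for your own GPU), and the model is simplified: it ignores register and shared-memory allocation granularity, so the CUDA occupancy calculator may report slightly different figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM hardware limits. Assumed example values, not queried from a device."""
    max_warps: int = 64            # max resident warps per SM
    max_blocks: int = 32           # max resident blocks per SM
    registers: int = 65536         # 32-bit registers per SM
    shared_mem_bytes: int = 98304  # shared memory per SM
    warp_size: int = 32


def theoretical_occupancy(threads_per_block, regs_per_thread,
                          smem_per_block, sm=SMLimits()):
    """Return resident warps / max warps for one kernel configuration.

    Simplified model: allocation granularity is ignored.
    """
    # Warps needed by one block (rounded up).
    warps_per_block = -(-threads_per_block // sm.warp_size)

    # How many blocks fit on one SM under each resource limit.
    by_warps = sm.max_warps // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * sm.warp_size
    by_regs = sm.registers // regs_per_block if regs_per_block else sm.max_blocks
    by_smem = sm.shared_mem_bytes // smem_per_block if smem_per_block else sm.max_blocks

    # The tightest limit decides how many blocks are resident.
    blocks = min(sm.max_blocks, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / sm.max_warps


if __name__ == "__main__":
    print(theoretical_occupancy(256, 32, 0))      # limited by warps
    print(theoretical_occupancy(256, 64, 0))      # limited by registers
    print(theoretical_occupancy(256, 32, 49152))  # limited by shared memory
```

Note that every input here is a property of a single kernel (block size, registers, shared memory). Nothing in the calculation identifies a process or pins blocks to particular SMs, which is why occupancy cannot be used to dedicate part of the GPU to a process.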