Can I divide GPU memory among different processes (Python)?

The problem arises in this situation: several users share one GPU, and I have a process that allocates memory dynamically. When my process released its memory, another process occupied most of the GPU memory, so my process crashed when it tried to allocate again.

One workaround is to create some big variables in my process and use them as a buffer. For various reasons, I cannot allocate the memory statically.

What is the CUDA question, exactly?

Sorry for the confusion. The question is: is there any API that can reserve a portion of GPU memory for a process, so that even if the process has not yet allocated that much, the memory cannot be used by other processes?

If I understand the question correctly, then I’m afraid it is not really CUDA-specific:

1 - A program will only use the memory it requests, no more;
2 - A program may or may not be given the amount of memory it wants (it is up to the OS to decide);
3 - You can code your program to allocate more memory than it needs, denying resources to other programs (which is a dodgy way to do it);
4 - In CUDA, a program can allocate memory that is reported as free. See the cudaMemGetInfo function: trying to allocate more than what is free will fail, but allocating less is fine.
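Regarding point #4, a minimal sketch of checking free memory before allocating, assuming PyCUDA is installed and a CUDA device is present (the `fits_in_free_memory` helper and its 5% safety margin are my own illustrative choices, not part of any API):

```python
def fits_in_free_memory(requested_bytes, free_bytes, safety_margin=0.05):
    """Return True if the request fits in free memory, keeping a small
    margin for fragmentation and driver overhead. Pure helper, no GPU."""
    return requested_bytes <= free_bytes * (1.0 - safety_margin)

if __name__ == "__main__":
    try:
        import pycuda.autoinit            # creates a context on device 0
        import pycuda.driver as cuda
    except ImportError:
        print("PyCUDA not available; skipping the GPU part")
    else:
        free_bytes, total_bytes = cuda.mem_get_info()  # wraps cudaMemGetInfo
        request = 256 * 1024 * 1024                    # e.g. 256 MiB
        if fits_in_free_memory(request, free_bytes):
            buf = cuda.mem_alloc(request)  # can still fail if another
                                           # process races you to the memory
        else:
            print("Not enough free GPU memory for the request")
```

Note that the check-then-allocate sequence is inherently racy on a shared GPU: another process can grab the memory between the two calls, which is exactly the scenario described above.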

Now, regarding point #2, you can set limits at the administration level. In Linux, one can define the maximum amount of shared memory a program may request. The same can be done for interpreted languages such as Java by setting limits in the virtual machine. But this is Python territory, so I think you will get a better answer on the PyCUDA forums about whether/how such a limit can be enforced at the administrative level.

Thank you for your help. I had already taken the third approach.
I am now thinking about doing this outside the process, i.e., reserving a block of GPU memory for a single process.
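For completeness, the third approach above (grab one big buffer up front, then sub-allocate from it) can be sketched as a simple bump allocator. The bookkeeping below is plain Python and the `ReservedPool` name is hypothetical; on a real GPU the single backing reservation would come from one call such as PyCUDA's `cuda.mem_alloc`, and the returned values would be offsets into it:

```python
class ReservedPool:
    """Hand out offsets from one big up-front reservation, so other
    processes can never take that memory while this process holds it."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.offset = 0                  # simple bump allocation

    def alloc(self, nbytes, alignment=256):
        # Round the current offset up to a 256-byte boundary, a common
        # alignment for coalesced GPU access.
        start = (self.offset + alignment - 1) // alignment * alignment
        if start + nbytes > self.capacity:
            raise MemoryError("reserved pool exhausted")
        self.offset = start + nbytes
        return start                     # offset into the big reservation

    def reset(self):
        # Release all sub-allocations at once; the underlying
        # reservation itself stays held against other processes.
        self.offset = 0

pool = ReservedPool(1024)
a = pool.alloc(100)      # -> 0
b = pool.alloc(100)      # -> 256 (aligned past the first block)
```

A bump allocator cannot free individual blocks, only everything at once, which fits workloads that allocate per iteration and reset between iterations.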