cudaMalloc returns error 2 (cudaErrorMemoryAllocation), help

Environment: Windows 10, two A4000 GPUs with 16 GB each.
Two processes, A and B, are started one after the other, and both call cudaSetDevice(0).

1. Process A starts, calls cudaMalloc to allocate 12 GB of device memory, and then waits.
2. Process B starts and also requests 12 GB of device memory. At this point GPU0's memory is full, and host (system) memory usage rises at the same time.
3. Process B waits 3 minutes, then loops 100 times; each iteration does cudaMalloc(2 MB), a kernel launch, and cudaFree. After that, process B exits.
4. GPU0 now shows only 3.5 GB of device memory in use.
5. Process A then calls cudaMalloc several more times, and the calls return error 2, so process A stops working correctly.
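For anyone trying to reproduce this, here is a minimal sketch of the scenario above as a single program run twice: first `repro A`, then `repro B` in a second console. The file name `repro.cu`, the kernel `fillKernel`, the helper names, the Enter-key pause in A, and the size and count of A's follow-up allocations (five 2 MB calls) are assumptions; the post does not specify them. Each failure prints the numeric code together with `cudaGetErrorName`, so error 2 shows up as `cudaErrorMemoryAllocation`.

```cuda
// repro.cu: hypothetical reproduction of the two-process scenario.
// Build: nvcc -o repro repro.cu
// Run "repro A" first; once it is waiting, run "repro B" in another console.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>
#include <cstring>
#include <thread>

// Print a failed CUDA call with its numeric code and name (2 == cudaErrorMemoryAllocation).
static bool check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::printf("%s failed: error %d (%s: %s)\n", what, static_cast<int>(err),
                    cudaGetErrorName(err), cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Trivial kernel so each iteration in process B actually writes the 2 MB buffer.
__global__ void fillKernel(unsigned char* buf, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) buf[i] = 1;
}

// Report free/total device memory as seen by this process's context.
static void printFreeMemory(const char* tag) {
    size_t freeB = 0, totalB = 0;
    if (check(cudaMemGetInfo(&freeB, &totalB), "cudaMemGetInfo"))
        std::printf("[%s] free %zu MB / total %zu MB\n", tag, freeB >> 20, totalB >> 20);
}

int main(int argc, char** argv) {
    const bool isA = argc > 1 && std::strcmp(argv[1], "A") == 0;
    const char* tag = isA ? "A" : "B";
    if (!check(cudaSetDevice(0), "cudaSetDevice")) return 1;

    const size_t big = 12ull << 30;  // 12 GB, as in the post
    void* bigBuf = nullptr;
    check(cudaMalloc(&bigBuf, big), "cudaMalloc(12 GB)");
    printFreeMemory(tag);

    const size_t small = 2u << 20;  // 2 MB
    if (isA) {
        std::printf("[A] holding 12 GB; run B, wait for it to exit, then press Enter\n");
        std::getchar();
        for (int i = 0; i < 5; ++i) {  // "several" allocations: count and size are assumptions
            void* p = nullptr;
            if (check(cudaMalloc(&p, small), "cudaMalloc(2 MB) in A")) cudaFree(p);
        }
        printFreeMemory(tag);
    } else {
        std::this_thread::sleep_for(std::chrono::minutes(3));
        const unsigned threads = 256;
        const unsigned blocks = static_cast<unsigned>((small + threads - 1) / threads);
        for (int i = 0; i < 100; ++i) {
            unsigned char* p = nullptr;
            if (!check(cudaMalloc(&p, small), "cudaMalloc(2 MB) in B")) continue;
            fillKernel<<<blocks, threads>>>(p, small);
            check(cudaGetLastError(), "kernel launch");
            check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
            cudaFree(p);
        }
    }
    if (bigBuf) cudaFree(bigBuf);
    return 0;
}
```

Printing `cudaMemGetInfo` in both processes makes it easy to compare what each CUDA context reports against what the Windows GPU monitor shows.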
Has anyone encountered this? Looking forward to a reply.

WDDM or TCC driver?

WDDM mode. In TCC mode, device memory cannot be over-allocated.
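To confirm which driver model each card is actually running, a small sketch like the following can be used. It relies on the `tccDriver` field of `cudaDeviceProp`, which is 1 when the device uses the TCC driver; the file name `drivermodel.cu` is an assumption.

```cuda
// drivermodel.cu: print each GPU's driver model and current free/total memory.
// Build: nvcc -o drivermodel drivermodel.cu
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA devices found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // tccDriver is 1 for TCC, 0 otherwise (i.e. WDDM on Windows).
        std::printf("GPU %d (%s): %s driver\n", dev, prop.name, prop.tccDriver ? "TCC" : "WDDM");
        size_t freeB = 0, totalB = 0;
        if (cudaSetDevice(dev) == cudaSuccess && cudaMemGetInfo(&freeB, &totalB) == cudaSuccess)
            std::printf("  free %zu MB / total %zu MB\n", freeB >> 20, totalB >> 20);
    }
    return 0;
}
```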