Hi there,
I have an NVIDIA Tesla T4 GPU deployed on GCP with 2 vCPUs and 32 GB of system memory. The GPU is in DEFAULT compute mode (which I assumed enables MPS-style sharing). The GPU has 16 GB of memory, of which a gunicorn process is currently holding about 4500 MB.
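For reference, these are the nvidia-smi queries I am using to check the compute mode and the per-process memory (standard query fields, as far as I know):

```
# Show the current compute mode (DEFAULT / EXCLUSIVE_PROCESS / PROHIBITED)
nvidia-smi --query-gpu=compute_mode --format=csv

# List processes holding GPU memory (this is where the ~4500 MB gunicorn process shows up)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```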
I activated two separate conda environments and started a Django server in each: one with “python manage.py runserver 0.0.0.0:8000” and the other on port 8001.
Since each image-processing task takes almost 2 GB of GPU memory, and 4500 MB + 2 × 2 GB is well under the 16 GB total, I expected the GPU to handle requests from both servers in parallel. Instead, the GPU is processing them serially.
Is there anything I can do about this (for example, via an nvidia-smi option or a conda setting)?
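In case it matters: after reading a bit more, my understanding from the NVIDIA MPS documentation is that MPS is actually a separate control daemon that has to be started explicitly, and that DEFAULT compute mode alone does not start it. A rough sketch of what I believe the standard setup looks like (I have not run this yet, so please correct me if it is wrong):

```
# Assumption: single-GPU machine, GPU index 0
export CUDA_VISIBLE_DEVICES=0

# EXCLUSIVE_PROCESS mode is recommended so every client connects through MPS
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Start the MPS control daemon; CUDA processes launched afterwards attach to it
sudo nvidia-cuda-mps-control -d

# Later, to shut the daemon down:
# echo quit | sudo nvidia-cuda-mps-control
```

Would starting MPS this way and then restarting the two Django servers allow the two image-processing workloads to overlap on the T4?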