How to use the CUDA cores of a GPU efficiently for a computer vision application


I am running a customized object detection model (YOLOv5/YOLOv8), tracking, and some other post-processing as a single AI solution on a GPU machine (RTX 2060). It consumes about 1.5 GB of VRAM for one video stream (RTSP). Since the RTX 2060 has 6 GB of VRAM, a maximum of 3 to 4 video streams can be processed simultaneously.

My requirement is that the GPU machine should be able to handle more than 20 streams (object detection, tracking, and other business logic) using the available 1,920 CUDA cores. Is that possible? If a single AI solution contains YOLOv8, tracking, and other business logic, what is the maximum number of such AI solutions that can run, assuming each one takes one RTSP stream as its input feed? If the answer is more than 4, how can this be achieved? Is there any sample code available? If I am missing something, please correct my understanding.
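For context, each of my current AI solutions loads its own model copy, roughly like the sketch below (the names `fake_detect` and `run_shared_model` are mine, and `fake_detect` is only a stand-in for a real YOLOv8 forward pass). My understanding is that loading a separate model instance per stream is what multiplies the 1.5 GB cost, so I am wondering whether one shared model instance that batches frames from many streams, as sketched here, is the right direction:

```python
from collections import deque

def fake_detect(batch):
    # Placeholder for a real model call, e.g. model(batch) in ultralytics;
    # returns one result per frame in the batch.
    return [f"detections_for_{frame}" for frame in batch]

def run_shared_model(stream_queues, batch_size=4):
    """Round-robin frames from many stream queues through ONE model instance,
    instead of loading one 1.5 GB model copy per stream."""
    results = {sid: [] for sid in stream_queues}
    while any(stream_queues.values()):
        batch, owners = [], []
        # Take at most one frame per stream per pass, up to batch_size.
        for sid, q in stream_queues.items():
            if q and len(batch) < batch_size:
                batch.append(q.popleft())
                owners.append(sid)
        # One batched inference call serves frames from several streams.
        for sid, det in zip(owners, fake_detect(batch)):
            results[sid].append(det)
    return results

# Toy stand-ins for decoded RTSP frames from two streams.
streams = {0: deque(["f0a", "f0b"]), 1: deque(["f1a"])}
print(run_shared_model(streams))
```

Is this kind of shared-model batching the standard approach, or is there a better-supported way (e.g. a dedicated inference server) to multiplex streams on one GPU?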

Any expert opinion on this?