I am currently developing a PoC that analyzes several video streams on a single GPU.
Software: CUDA 10 with a compatible cuDNN, EmguCV 4.0.1, YOLOv3-Tiny (Darknet), and Windows 10 Enterprise (version 1903).
Hardware of the first PC: i7 9700, Quadro P2000
Hardware of the second PC: i7 9700, GTX 1660 Ti
The problem is that on the Quadro P2000 I can easily run 7-8 real-time streams at ~12 fps each with a network input size of 416x416, and each stream loads the GPU by only about 4%.
When I try to run the same streams on the GTX 1660 Ti, it can process only 2 streams in real time, and each one loads the GPU by 12-15%. This is a very disappointing result, because I bought this card as a cheaper and, judging by the specs of the two GPUs, more powerful alternative to the Quadro P2000.
Note: each stream runs as a separate instance, i.e. in its own process rather than all in one process.
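To put the numbers above side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes every stream runs at about 12 fps, that the per-stream GPU load figures simply add up, and it takes 7 streams for the Quadro (the lower end of 7-8) and 13.5% per stream for the GTX (the midpoint of 12-15%). The function name is made up for illustration.

```python
def per_frame_cost(streams, fps_per_stream, load_per_stream_pct):
    """Return (total fps, total GPU load in %, GPU load in % per frame/s)."""
    total_fps = streams * fps_per_stream
    total_load = streams * load_per_stream_pct  # assumption: loads add linearly
    return total_fps, total_load, total_load / total_fps

# Figures from the description above; 7 streams and 13.5 % are assumed midpoints.
p2000 = per_frame_cost(streams=7, fps_per_stream=12, load_per_stream_pct=4.0)
gtx = per_frame_cost(streams=2, fps_per_stream=12, load_per_stream_pct=13.5)

# How much more GPU load one frame per second costs on the GTX than on the Quadro.
ratio = gtx[2] / p2000[2]

for name, (fps, load, cost) in (("Quadro P2000", p2000), ("GTX 1660 Ti", gtx)):
    print(f"{name}: {fps} fps total, {load:.1f} % load, {cost:.3f} % per frame/s")
print(f"GTX / Quadro cost per frame: {ratio:.2f}x")
```

Under these assumptions the GTX 1660 Ti seems to spend about 3.4 times as much GPU load per frame as the Quadro P2000, while the reported total load stays well below 100% on both cards, so neither GPU looks saturated according to these numbers.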
Can anyone suggest what the problem could be?
Does the GTX series not support parallel computing, is it just a driver problem, or is it something else?
Should I perhaps change the OS or the CUDA version?
[Screenshots: GPU workload for Quadro P2000 and GTX 1660 Ti]
Thank you for any help.