Hello,
I am trying to run CUDA ORB keypoint detection on multiple GPUs. The idea is to split a list of video frames between the available GPU devices (loading them into GPU memory). However, when I run it with multiple threads using `threading`, I observe that each GPU slows down. I suppose this is caused by communication between the multiple GPUs and the single process in which all the threads run. Because of that, I tried the same thing using `multiprocessing` instead of `threading`, to exploit multiple CPU cores (assigning a different core to each GPU), but it gave me an error. Below I attach my test code:
```python
import cv2
from threading import Thread
from multiprocessing import Process
from tqdm import tqdm


def cuda_test(gpu_id, idx_start, idx_end, frames):
    cv2.cuda.setDevice(gpu_id)
    cuda_orb = cv2.cuda.ORB_create()
    for i in tqdm(range(idx_start, idx_end)):
        gray_frame = cv2.cuda.cvtColor(frames[i], cv2.COLOR_BGR2GRAY)
        kp, ds = cuda_orb.detectAndComputeAsync(gray_frame, None)


if __name__ == '__main__':
    img = cv2.imread('1.png')
    # 1500 copies of the same image, uploaded to the GPU by the parent process
    frames = [cv2.cuda_GpuMat(img) for x in range(1500)]

    print('\nMultihreading part: ')
    t1 = Thread(target=cuda_test, args=(0, 0, len(frames), frames))
    t1.start()
    t1.join()

    print('\nMultiprocessing part: ')
    p1 = Process(target=cuda_test, args=(0, 0, len(frames), frames))
    p1.start()
    p1.join()
```
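The `idx_start`/`idx_end` parameters of `cuda_test` suggest that each GPU gets one contiguous slice of `frames`. A minimal sketch of computing those slices for any number of GPUs, assuming the frames are independent so any GPU can process any slice; `split_frame_ranges` is a hypothetical helper, not part of OpenCV:

```python
from typing import List, Tuple


def split_frame_ranges(n_frames: int, n_gpus: int) -> List[Tuple[int, int, int]]:
    """Return one (gpu_id, idx_start, idx_end) tuple per GPU covering [0, n_frames).

    Ranges are contiguous and their sizes differ by at most one frame:
    the first n_frames % n_gpus GPUs each get one extra frame.
    """
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    base, extra = divmod(n_frames, n_gpus)
    ranges = []
    start = 0
    for gpu_id in range(n_gpus):
        end = start + base + (1 if gpu_id < extra else 0)
        ranges.append((gpu_id, start, end))
        start = end
    return ranges


# Each tuple matches the first three arguments of cuda_test.
print(split_frame_ranges(10, 3))  # [(0, 0, 4), (1, 4, 7), (2, 7, 10)]
```

In a real run, `n_gpus` could come from `cv2.cuda.getCudaEnabledDeviceCount()`.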
The sample code above runs on a single thread/process only, because I currently don't have access to a machine with multiple GPUs. While the `threading` part runs successfully, the `multiprocessing` part doesn't work and gives me this error:
```
Multihreading part:
100%|██████████████████████████████████████| 1500/1500 [00:08<00:00, 166.72it/s]

Multiprocessing part:
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "multi_test.py", line 419, in cuda_test
    cv2.cuda.setDevice(gpu_id)
cv2.error: OpenCV(4.5.1) /home/kaczor/opencv/modules/core/src/cuda_info.cpp:73: error: (-217:Gpu API call) initialization error in function 'setDevice'
```
I can see that there is a problem with OpenCV's CUDA support when multiple processes are created with `multiprocessing`. However, I cannot find out exactly why it happens or how to fix it… Does anyone have an idea how to efficiently split a task like this between multiple GPUs, while avoiding the situation seen with `threading`, where each GPU slows down?
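A likely cause of the `initialization error` is that on Linux `multiprocessing` starts children with `fork` by default, and the parent has already initialized CUDA by creating the `cv2.cuda_GpuMat` objects. CUDA does not support reusing that inherited state in a forked child. The `GpuMat` handles in `frames` also refer to device memory owned by the parent's CUDA context, so they are not usable from another process. A minimal sketch under those assumptions: use the `spawn` start method, pass only the image path, and let each child call `cv2.cuda.setDevice` before it allocates anything on the GPU. The names `frame_slice`, `orb_worker` and `run_on_gpus` are hypothetical, and the sketch has not been verified on a multi-GPU machine:

```python
import multiprocessing as mp


def frame_slice(gpu_id, n_frames, n_gpus):
    """Contiguous [start, end) slice of frame indices for one GPU (ceil-sized chunks)."""
    chunk = -(-n_frames // n_gpus)  # ceiling division
    start = min(gpu_id * chunk, n_frames)
    return start, min(start + chunk, n_frames)


def orb_worker(gpu_id, n_frames, n_gpus, image_path):
    """Runs in a spawned child: select the GPU first, then create every CUDA object."""
    import cv2  # imported in the child; no CUDA state is inherited from the parent

    cv2.cuda.setDevice(gpu_id)  # must run before any GpuMat exists in this process
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    idx_start, idx_end = frame_slice(gpu_id, n_frames, n_gpus)
    # Upload this worker's frames to the GPU selected above (mirrors the original test setup).
    frames = [cv2.cuda_GpuMat(img) for _ in range(idx_start, idx_end)]
    cuda_orb = cv2.cuda.ORB_create()
    for gpu_frame in frames:
        gray_frame = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY)
        # Keypoints and descriptors stay on the GPU as GpuMat objects.
        kp, ds = cuda_orb.detectAndComputeAsync(gray_frame, None)


def run_on_gpus(image_path, n_frames, n_gpus):
    """Start one process per GPU with the "spawn" start method and wait for all of them."""
    ctx = mp.get_context("spawn")  # avoid fork: a forked child cannot reuse the parent's CUDA state
    procs = [
        ctx.Process(target=orb_worker, args=(gpu_id, n_frames, n_gpus, image_path))
        for gpu_id in range(n_gpus)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

With `spawn`, each child re-imports the main module, so `run_on_gpus('1.png', 1500, cv2.cuda.getCudaEnabledDeviceCount())` has to be called from inside the script's `if __name__ == '__main__':` block, and no GPU work should run at import time.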