I am trying to run CUDA ORB keypoint detection on multiple GPUs. The idea is to split a list of video frames between the available GPU devices (loading them into GPU memory). However, when I run it with multiple threads using `threading`, I observe that each GPU slows down. I suppose this is caused by communication between the multiple GPUs and the single process on which all threads run. Because of that, I tried the same using `multiprocessing` to exploit multiple CPU cores (to assign different cores to different GPUs), but it gave me an error. Below is my test code:
```python
import cv2
from threading import Thread
from multiprocessing import Process
from tqdm import tqdm

def cuda_test(gpu_id, idx_start, idx_end, frames):
    cv2.cuda.setDevice(gpu_id)
    cuda_orb = cv2.cuda.ORB_create()
    for i in tqdm(range(idx_start,idx_end)):
        gray_frame = cv2.cuda.cvtColor(frames[i],cv2.COLOR_BGR2GRAY)
        kp,ds = cuda_orb.detectAndComputeAsync(gray_frame,None)

if __name__ == '__main__':
    img = cv2.imread('1.png')
    frames = [cv2.cuda_GpuMat(img) for x in range(1500)]

    print('\nMultihreading part: ')
    t1 = Thread(target=cuda_test,args=(0,0,len(frames),frames))
    t1.start()
    t1.join()

    print('\nMultiprocessing part: ')
    p1 = Process(target=cuda_test,args=(0,0,len(frames),frames))
    p1.start()
    p1.join()
```
The sample code above runs on a single thread/process only, because I currently don't have access to a machine with multiple GPUs. While the `threading` part runs successfully, the `multiprocessing` part fails with this error:
```
Multihreading part: 
100%|██████████████████████████████████████| 1500/1500 [00:08<00:00, 166.72it/s]

Multiprocessing part: 
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "multi_test.py", line 419, in cuda_test
    cv2.cuda.setDevice(gpu_id)
cv2.error: OpenCV(4.5.1) /home/kaczor/opencv/modules/core/src/cuda_info.cpp:73: error: (-217:Gpu API call) initialization error in function 'setDevice'
```
I can see that there is a problem with OpenCV's CUDA support for multiple processes created by `multiprocessing`. However, I cannot find out why exactly it happens or how to fix it. Does anyone have an idea how to efficiently split a task like this between multiple GPUs, avoiding the per-GPU slowdown I see with multiple threads from `threading`?
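A possible restructuring, sketched under two assumptions: that the error comes from the default `fork` start method on Linux (CUDA does not support use in a child forked from a parent that has already initialised CUDA, which the parent here does by creating the `cv2.cuda_GpuMat` frames), and that each process can read its own frames from disk instead of receiving `GpuMat` objects from the parent. `split_ranges`, `cuda_worker` and `run_on_gpus` are hypothetical names, not OpenCV API; the worker reuses one image as a stand-in for real frames, as the test code does.

```python
import multiprocessing as mp


def split_ranges(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous (part_id, start, end) chunks."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for part in range(n_parts):
        # The first `extra` parts get one additional item each.
        end = start + base + (1 if part < extra else 0)
        ranges.append((part, start, end))
        start = end
    return ranges


def cuda_worker(gpu_id, idx_start, idx_end, image_path):
    """Hypothetical worker: runs CUDA ORB on frames idx_start..idx_end-1 on one GPU."""
    # Imported here so this module can be imported without OpenCV installed;
    # under "spawn" the child is a fresh interpreter with no inherited CUDA state.
    import cv2

    cv2.cuda.setDevice(gpu_id)  # select the GPU before creating any GPU objects
    cuda_orb = cv2.cuda.ORB_create()
    # Assumption: data is loaded in the child; no GpuMat is sent from the parent.
    # A real version would read frames idx_start..idx_end-1 of the video here.
    img = cv2.imread(image_path)
    gpu_frame = cv2.cuda_GpuMat(img)
    for _ in range(idx_start, idx_end):
        gray_frame = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY)
        kp, ds = cuda_orb.detectAndComputeAsync(gray_frame, None)


def run_on_gpus(n_gpus, n_frames, image_path):
    """Start one "spawn" process per GPU and return their exit codes."""
    # "spawn" starts fresh interpreters instead of fork()-ing this process.
    ctx = mp.get_context("spawn")
    procs = [
        ctx.Process(target=cuda_worker, args=(gpu_id, start, end, image_path))
        for gpu_id, start, end in split_ranges(n_frames, n_gpus)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

With `spawn`, each child re-imports the main module, so the call would go under the guard, e.g. `if __name__ == '__main__': run_on_gpus(cv2.cuda.getCudaEnabledDeviceCount(), 1500, '1.png')`. Each GPU then gets its own process and CUDA context, which is the separation the question is after.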