Python OpenCV - multiprocessing doesn't work with CUDA


I am trying to run CUDA ORB key-point detection with multiple GPUs. The idea is to split the list of video frames between the available GPU devices (loading them into GPU memory). However, when I run it with multiple threads via threading, I observe that each GPU slows down - I suppose this is caused by communication between the multiple GPUs and the single process in which all the threads run. Because of that, I tried the same thing using multiprocessing instead of threading, to exploit multiple CPU cores (assigning a different core to each GPU), but it gave me an error. Below I attach my test code:

import cv2
from threading import Thread
from multiprocessing import Process
from tqdm import tqdm

def cuda_test(gpu_id, idx_start, idx_end, frames):
    cuda_orb = cv2.cuda.ORB_create()
    for i in tqdm(range(idx_start,idx_end)):
        gray_frame = cv2.cuda.cvtColor(frames[i],cv2.COLOR_BGR2GRAY)
        kp,ds = cuda_orb.detectAndComputeAsync(gray_frame,None)

if __name__ == '__main__':
    img = cv2.imread('1.png')
    frames = [cv2.cuda_GpuMat(img) for x in range(1500)]
    print('\nMultithreading part: ')
    t1 = Thread(target=cuda_test,args=(0,0,len(frames),frames))
    t1.start()
    t1.join()
    print('\nMultiprocessing part: ')
    p1 = Process(target=cuda_test,args=(0,0,len(frames),frames))
    p1.start()
    p1.join()

The sample code above runs only a single thread/process, because I currently don’t have access to a machine with multiple GPUs. While the threading part runs successfully, the multiprocessing part doesn’t work and gives me this error:

Multithreading part: 
100%|██████████████████████████████████████| 1500/1500 [00:08<00:00, 166.72it/s]

Multiprocessing part: 
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/", line 315, in _bootstrap
  File "/usr/lib/python3.8/multiprocessing/", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "", line 419, in cuda_test
cv2.error: OpenCV(4.5.1) /home/kaczor/opencv/modules/core/src/cuda_info.cpp:73: error: (-217:Gpu API call) initialization error in function 'setDevice'

I see that there is a problem with OpenCV CUDA support for multiple processes created by multiprocessing. However, I am not able to find the reason why exactly it happens, or how to fix it… Does anyone have an idea how to efficiently split a task like this between multiple GPUs, avoiding the situation with multiple threads from threading where each GPU slows down?
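For reference, the split I have in mind is just an even chunking of the frame list, one index range per GPU (the helper name here is illustrative, not part of my actual code):

```python
# Illustrative helper: divide n_frames evenly into one (start, end)
# index range per GPU; the last GPU also takes any remainder.
def split_ranges(n_frames, n_gpus):
    chunk = n_frames // n_gpus
    ranges = []
    for g in range(n_gpus):
        start = g * chunk
        end = n_frames if g == n_gpus - 1 else start + chunk
        ranges.append((start, end))
    return ranges

print(split_ranges(1500, 4))  # [(0, 375), (375, 750), (750, 1125), (1125, 1500)]
```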

The reason why exactly it happens is that it is not possible to use CUDA in a child process created by fork() if CUDA has already been initialized in the parent process.

So the first step in fixing the problem might be to move any usage of cv2.cuda out of main, before creating the child processes. However, that might not be sufficient if importing cv2.cuda (by itself) initializes CUDA. It is possible to use CUDA in Python multiprocessing, but I don’t happen to know whether it is possible with cv2.cuda (the previous link suggests to me it is possible with a non-CUDA build of OpenCV).

Note that β€œgetting rid of CUDA initialization in main” probably also includes the removal of your multithreading test, prior to the multiprocessing test. Here’s a simplistic example:

$ cat
from threading import Thread
from multiprocessing import Process
from numba import cuda

def cuda_test(gpu_id, idx_start, idx_end, frames):
    cuda.select_device(gpu_id)   # any call that touches the CUDA driver

if __name__ == '__main__':

#    print('\nMultithreading part: ')
#    t1 = Thread(target=cuda_test,args=(0,0,1,0))
#    t1.start()
#    t1.join()

    print('\nMultiprocessing part: ')
    p1 = Process(target=cuda_test,args=(0,0,1,0))
    p1.start()
    p1.join()
$ python

Multiprocessing part:

My version of numba is nicely explicit. If I uncomment the multithreading part of the test case above, I get an error message like this:

numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking

Thank you for the explanation.