Hello everyone,
I’m writing a face recognition program that runs on a Jetson Nano. In the first prototype, I used dlib directly to detect faces, then used its wrapper library, https://github.com/ageitgey/face_recognition, for the recognition part. Everything went smoothly.
But when I tried another detection method, LFFD (https://github.com/YonghaoHe/A-Light-and-Fast-Face-Detector-for-Edge-Devices/tree/master/face_detection/deploy_tensorrt), and passed the bounding boxes to face_recognition.face_encodings() in the format the function expects, a weird error occurred:
Traceback (most recent call last):
File "predict_tensorrt_video.py", line 673, in <module>
main()
File "predict_tensorrt_video.py", line 88, in inner
retval = fnc(*args, **kwargs)
File "predict_tensorrt_video.py", line 667, in main
run_inference(args.video_in, args.video_out, candidate_id, current_time)
File "predict_tensorrt_video.py", line 565, in run_inference
face_encoding = get_face_encodings(frame, css_type_face_location, 0)[0]
File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in face_encodings
return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in <listcomp>
return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
RuntimeError: Error while calling cudnnConvolutionForward( context(), &alpha, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &beta, descriptor(output), output.device()) in file /home/gate/dlib-19.17/dlib/cuda/cudnn_dlibapi.cpp:1007. code: 7, reason: A call to cuDNN failed
Segmentation fault (core dumped)
although the detection results are very good (I drew the bounding boxes on the frames, wrote them to the output video, and checked by eye).
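For reference, this is roughly how I turn the detector output into the (top, right, bottom, left) tuples that face_recognition expects — a simplified sketch, not my exact code; the lffd_bboxes_to_css name and the [x1, y1, x2, y2, score] box layout are just the conventions I'm using here. I also clamp to the frame bounds so dlib never receives an out-of-range box:

```python
def lffd_bboxes_to_css(bboxes, frame_shape):
    """Convert detector-style [x1, y1, x2, y2, score] boxes into the
    (top, right, bottom, left) tuples face_recognition expects,
    clamped to the frame so no coordinate falls outside the image."""
    h, w = frame_shape[:2]
    css = []
    for x1, y1, x2, y2 in (b[:4] for b in bboxes):
        top = max(0, int(round(y1)))
        left = max(0, int(round(x1)))
        bottom = min(h, int(round(y2)))
        right = min(w, int(round(x2)))
        if bottom > top and right > left:  # drop degenerate boxes
            css.append((top, right, bottom, left))
    return css
```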
That’s what I get when running detection on the original-size video frames; when I cropped them to a preset smaller region, the error was different:
Traceback (most recent call last):
File "predict_tensorrt_video.py", line 672, in <module>
main()
File "predict_tensorrt_video.py", line 88, in inner
retval = fnc(*args, **kwargs)
File "predict_tensorrt_video.py", line 666, in main
run_inference(args.video_in, args.video_out, candidate_id, current_time)
File "predict_tensorrt_video.py", line 564, in run_inference
face_encoding = get_face_encodings(frame, css_type_face_location, 0)[0]
File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in face_encodings
return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in <listcomp>
return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
RuntimeError: Error while calling cudnnConvolutionForward( context(), &alpha, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &beta, descriptor(output), output.device()) in file /home/gate/dlib-19.17/dlib/cuda/cudnn_dlibapi.cpp:1007. code: 7, reason: A call to cuDNN failed
cudaStreamDestroy() failed. Reason: invalid device ordinal
cudaFree() failed. Reason: invalid device pointer
cudaFreeHost() failed. Reason: invalid argument
cudaStreamDestroy() failed. Reason: invalid device ordinal
cudaFree() failed. Reason: invalid device pointer
cudaFreeHost() failed. Reason: invalid argument
cudaFree() failed. Reason: invalid device pointer
Segmentation fault (core dumped)
(note the extra “cuda” lines above “Segmentation fault (core dumped)”).
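Since the second failure only shows up with the cropped region, one thing I'm double-checking is whether the boxes are still in crop coordinates while the full frame gets passed to face_encodings(). A minimal sketch of the offset correction I mean (shift_css_to_full_frame and the (crop_x, crop_y) offset convention are hypothetical names for illustration, not from my actual code):

```python
def shift_css_to_full_frame(css_boxes, crop_x, crop_y, frame_shape):
    """Detection ran on a sub-region starting at (crop_x, crop_y) of the
    full frame, so each (top, right, bottom, left) box is in crop
    coordinates. Shift it back into full-frame coordinates and clamp,
    so the boxes match the image actually handed to face_encodings()."""
    h, w = frame_shape[:2]
    shifted = []
    for top, right, bottom, left in css_boxes:
        shifted.append((
            min(h, top + crop_y),
            min(w, right + crop_x),
            min(h, bottom + crop_y),
            min(w, left + crop_x),
        ))
    return shifted
```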
I monitored memory with jtop from the moment the program started until it ended. Memory never reached the maximum; the program only consumed around 2.5 GB of the 4 GB available. GPU utilization periodically spiked to 99% and dropped back to 0%.
I’ve re-checked my code dozens of times (here it is: https://paste.ofcode.org/3avUP9WVJ4HsdT96ndxD25p), built the newest dlib 19.18, and tried again, but nothing works. Could anyone please help me with this?
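One more thing I'm verifying is the frame array itself: as far as I know, dlib expects a C-contiguous uint8 RGB numpy array, and a TensorRT preprocessing pipeline can easily leave frames as float32 or as non-contiguous slices. A small normalization sketch I'm trying before the face_encodings() call (the as_dlib_image helper name is just illustrative):

```python
import numpy as np

def as_dlib_image(frame):
    """Normalize a frame to a C-contiguous uint8 array, the layout
    dlib's compute_face_descriptor expects. Clips float values into
    [0, 255] before the dtype cast to avoid wrap-around artifacts."""
    img = np.asarray(frame)
    if img.dtype != np.uint8:
        img = np.clip(img, 0, 255).astype(np.uint8)
    return np.ascontiguousarray(img)
```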
Here are my specs:
- NVIDIA Jetson Nano/TX1
- JetPack 4.2.1 [L4T 32.2.0]
- CUDA GPU architecture 5.3

Libraries:
- CUDA 10.0.326
- cuDNN 7.5.0.56-1+cuda10.0
- TensorRT 5.1.6.1-1+cuda10.0
- VisionWorks 1.6.0.500n
- OpenCV 4.1.0 (compiled with CUDA: YES)

Jetson Performance: inactive
dlib versions used: 19.17 and 19.18 (manually compiled to avoid the problem described at https://devtalk.nvidia.com/default/topic/1049660/jetson-nano/issues-with-dlib-library/2#reply)