I am trying to run dlib's CNN face detector on Google Kubernetes Engine (GKE), but I keep running into the following error:
detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
RuntimeError: Error while calling cudaMallocHost(&data, new_size*sizeof(float)) in file /dlib/dlib/cuda/gpu_data.cpp:211. code: 222, reason: the provided PTX was compiled with an unsupported toolchain.
This suggests a mismatch between the driver and the toolchain used for compilation. However, I am reasonably certain that the two are compatible. The GKE pod runs on an NVIDIA Tesla T4 GPU with an R470 driver, which I verified by running nvidia-smi inside the pod itself:
root@worker:/usr/app# nvidia-smi
Sat Nov 11 18:17:19 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03 Driver Version: 470.182.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 37C P8 8W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@resize-workers-statefulset-0:/usr/app#
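The `CUDA Version` field in the `nvidia-smi` header is the newest CUDA version the installed driver supports natively (11.4 here). It says nothing about the toolkit inside the container. A minimal sketch that extracts both numbers from that header line; the helper name `parse_smi_header` is hypothetical, and the regex assumes the header format shown in the output above.

```python
import re
from typing import Tuple

# Hypothetical helper: matches the "Driver Version: X CUDA Version: Y.Z" part
# of the nvidia-smi banner, as printed in the output above.
_HEADER_RE = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<major>\d+)\.(?P<minor>\d+)"
)


def parse_smi_header(line: str) -> Tuple[str, Tuple[int, int]]:
    """Return (driver_version, (cuda_major, cuda_minor)) from an nvidia-smi header line."""
    match = _HEADER_RE.search(line)
    if match is None:
        raise ValueError(f"not an nvidia-smi header line: {line!r}")
    return match.group("driver"), (int(match.group("major")), int(match.group("minor")))


if __name__ == "__main__":
    header = "| NVIDIA-SMI 470.182.03 Driver Version: 470.182.03 CUDA Version: 11.4 |"
    driver, cuda = parse_smi_header(header)
    print(driver, cuda)  # 470.182.03 (11, 4)
```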
To compile and run dlib, I am using an official NVIDIA Docker image with CUDA 11.8. According to NVIDIA's documentation and the CUDA 12.3 Release Notes, CUDA 11.8 is compatible with driver version 470.182.03, since that driver is newer than the 450.80.02 minimum listed for CUDA 11.x.
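As I understand NVIDIA's CUDA Compatibility guide, two separate rules are involved here. Minor version compatibility lets CUDA 11.x applications run on any Linux driver from 450.80.02 onward, but it does not cover PTX JIT compilation. PTX produced by a toolkit newer than the driver natively supports is rejected with `cudaErrorUnsupportedPtxVersion`, which is error 222. A sketch of those two checks, encoding only the CUDA 11.x case; the function names are hypothetical, and comparing the driver's native CUDA version against the toolkit version is an approximation of the PTX ISA check the driver actually performs.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]

# Minimum Linux driver for CUDA 11.x minor version compatibility.
CUDA11_MIN_DRIVER: Version = (450, 80, 2)


def parse_version(text: str) -> Version:
    """Turn '470.182.03' into (470, 182, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_compat(driver: str, driver_cuda: Tuple[int, int],
                 toolkit: Tuple[int, int]) -> Dict[str, bool]:
    """Classify a driver/toolkit pair under the two rules (CUDA 11.x only)."""
    if toolkit[0] != 11:
        raise ValueError("this sketch only encodes the CUDA 11.x rule")
    return {
        # Precompiled device code built with 11.x can run on an older 11.x-era driver.
        "minor_version_compat": parse_version(driver) >= CUDA11_MIN_DRIVER,
        # JIT-compiling PTX needs a driver that natively supports the toolkit version.
        "ptx_jit_supported": driver_cuda >= toolkit,
    }


if __name__ == "__main__":
    print(check_compat("470.182.03", (11, 4), (11, 8)))
    # {'minor_version_compat': True, 'ptx_jit_supported': False}
```

With the numbers from this pod, the first check passes and the second fails. That split is consistent with a build that only breaks when the driver has to JIT-compile PTX.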
I also tried to verify this with a minimal test Dockerfile:
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
COPY simple_cuda_test.cu /simple_cuda_test.cu
RUN nvcc -o simple_cuda_test /simple_cuda_test.cu
CMD ["./simple_cuda_test"]
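The nvcc line above passes no `-arch`/`-gencode` option, so nvcc uses its default target. As far as I know from the nvcc documentation, that default is sm_52 for CUDA 11.x. A Tesla T4 is compute capability 7.5, so it can only run such a kernel by JIT-compiling the embedded PTX. A sketch that builds an explicit command emitting native code for the T4; the helper `nvcc_command` is hypothetical, while `-gencode=arch=compute_XY,code=sm_XY` is standard nvcc syntax.

```python
import shlex
from typing import List, Tuple


def nvcc_command(source: str, output: str, cc: Tuple[int, int],
                 embed_ptx: bool = True) -> List[str]:
    """Build an nvcc command line that emits native SASS for compute capability `cc`."""
    arch = f"{cc[0]}{cc[1]}"
    cmd = ["nvcc", "-o", output, source,
           # Native machine code for this GPU, so the driver needs no PTX JIT.
           f"-gencode=arch=compute_{arch},code=sm_{arch}"]
    if embed_ptx:
        # Optionally also embed PTX so newer GPUs can still JIT-compile the kernel.
        cmd.append(f"-gencode=arch=compute_{arch},code=compute_{arch}")
    return cmd


if __name__ == "__main__":
    # Tesla T4 = compute capability 7.5
    print(shlex.join(nvcc_command("/simple_cuda_test.cu", "simple_cuda_test", (7, 5))))
```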
where simple_cuda_test.cu is as follows:
#include <stdio.h>

__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int main() {
    int c;
    int *dev_c;
    // Allocate memory on the GPU
    cudaMalloc((void**)&dev_c, sizeof(int));
    // Launch the add() kernel on the GPU
    add<<<1,1>>>(2, 7, dev_c);
    // Copy the result back to the host
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("2 + 7 = %d\n", c);
    // Cleanup
    cudaFree(dev_c);
    return 0;
}
This yields the following output:
==========
== CUDA ==
==========
CUDA Version 11.8.0
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2 + 7 = 1
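The output reads 2 + 7 = 1 rather than 9. The program never checks the return codes of `cudaMalloc`, the kernel launch, or `cudaMemcpy`, and `c` starts uninitialized. So this result suggests the kernel may not have run at all, possibly failing with the same error 222, rather than confirming compatibility. The sketch below models how the driver picks code from a fat binary, as described in the CUDA programming guide:

- SASS built for X.y runs on devices X.z with z >= y, within the same major version only.
- Otherwise, PTX for a virtual architecture at or below the device must be JIT-compiled.

The function name is hypothetical. The architecture lists are assumed to come from something like `cuobjdump --list-elf` and `cuobjdump --list-ptx`.

```python
from typing import Iterable, Tuple

CC = Tuple[int, int]


def _cc(arch: str) -> CC:
    """'sm_75' or 'compute_75' -> (7, 5)."""
    digits = arch.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def launch_path(sass: Iterable[str], ptx: Iterable[str], device: CC) -> str:
    """How a fat binary would run on `device`: 'native', 'jit', or 'unsupported'."""
    # SASS (cubin) for X.y runs on devices X.z with z >= y (same major version only).
    if any(_cc(a)[0] == device[0] and _cc(a)[1] <= device[1] for a in sass):
        return "native"
    # PTX for a virtual architecture <= the device can be JIT-compiled by the driver.
    if any(_cc(a) <= device for a in ptx):
        return "jit"
    return "unsupported"


if __name__ == "__main__":
    t4 = (7, 5)
    print(launch_path(["sm_52"], ["compute_52"], t4))  # jit
    print(launch_path(["sm_75"], [], t4))              # native
```

Under the assumption that the test binary was built for nvcc's default sm_52, the T4 lands on the "jit" path. According to the compatibility rules, that is the path a 470 driver cannot take for CUDA 11.8 PTX.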
I then created the following Dockerfile to test dlib’s cnn_face_detection_model_v1 model:
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive
# dependencies
RUN apt-get update && \
    apt-get install -y \
        --no-install-recommends --no-install-suggests \
        gcc-11 g++-11 \
        git \
        build-essential \
        cmake \
        libboost-all-dev \
        libopenblas-dev \
        liblapack-dev \
        libavdevice-dev \
        libavfilter-dev \
        libavformat-dev \
        libavcodec-dev \
        libswresample-dev \
        libswscale-dev \
        libavutil-dev \
        python3 \
        python3-venv \
        python3-dev \
        python3-distutils \
        python3-pip \
        libmagic1 \
        pkg-config && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# virtual environment
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# install dlib
RUN git clone https://github.com/davisking/dlib.git /dlib && \
    cd /dlib && \
    python3 setup.py install --clean
ENV PYTHONPATH=/usr/app \
    DEBIAN_FRONTEND=noninteractive \
    PATH="/usr/local/cuda-11.8/bin:$PATH" \
    CUDA_HOME="/usr/local/cuda-11.8" \
    LD_LIBRARY_PATH="/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH"
# simple test files for dlib
COPY mmod_human_face_detector.dat mmod_human_face_detector.dat
COPY test_dlib.py test_dlib.py
COPY test_image.jpg test_image.jpg
CMD ["python3", "test_dlib.py"]
where the test_dlib.py file is as follows:
import dlib
import time
print("dlib version: {}".format(dlib.__version__))
# Check if Dlib was compiled with CUDA support
if dlib.DLIB_USE_CUDA:
    print("Dlib was compiled with CUDA support.")
else:
    print("Dlib was NOT compiled with CUDA support.")
# Check if CUDA is currently available
if dlib.cuda.get_num_devices() > 0:
    print("CUDA is available. Number of CUDA devices:", dlib.cuda.get_num_devices())
else:
    print("CUDA is not available.")
detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
# Load the image
image_path = "test_image.jpg"
image = dlib.load_rgb_image(image_path)
start = time.time()
dets = detector(image, 1)
end = time.time()
print("detection time: {}".format(end - start))
print("Number of faces detected: {}".format(len(dets)))
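On the timing in test_dlib.py, `time.perf_counter()` is the clock Python intends for measuring short intervals, whereas `time.time()` follows the wall clock and can jump. The first call into a GPU-backed model may also include one-time setup, so timing several calls separately can help; this is an assumption, not something measured here. A minimal dlib-independent sketch: `timed` is a hypothetical helper, and the lambda stands in for `detector(image, 1)`.

```python
import time
from typing import Any, Callable, List, Tuple


def timed(fn: Callable[[], Any], repeats: int = 3) -> Tuple[Any, List[float]]:
    """Call fn() `repeats` times; return the last result and each call's duration in seconds."""
    durations: List[float] = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        durations.append(time.perf_counter() - start)
    return result, durations


if __name__ == "__main__":
    # Stand-in for `lambda: detector(image, 1)` from test_dlib.py.
    result, durations = timed(lambda: sum(range(100_000)))
    print("first call: {:.6f}s, later calls: {}".format(durations[0], durations[1:]))
```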
Running this image on the pod yields the following output:
==========
== CUDA ==
==========
CUDA Version 11.8.0
dlib version: 19.24.99
Dlib was compiled with CUDA support.
CUDA is available. Number of CUDA devices: 1
Traceback (most recent call last):
  File "//test_dlib.py", line 18, in <module>
    detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
RuntimeError: Error while calling cudaMallocHost(&data, new_size*sizeof(float)) in file /dlib/dlib/cuda/gpu_data.cpp:211. code: 222, reason: the provided PTX was compiled with an unsupported toolchain.
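The number after `code:` in dlib's message is the raw `cudaError_t` value from the CUDA runtime. A small sketch that extracts it and maps a few driver/toolkit-related codes to their enum names. The table is a deliberately small subset of the CUDA runtime's `cudaError` enum, and the helper name is hypothetical.

```python
import re
from typing import Optional

# Small subset of the CUDA runtime's cudaError_t enum values.
CUDA_ERRORS = {
    35: "cudaErrorInsufficientDriver",
    100: "cudaErrorNoDevice",
    209: "cudaErrorNoKernelImageForDevice",
    222: "cudaErrorUnsupportedPtxVersion",
}


def cuda_error_name(message: str) -> Optional[str]:
    """Extract 'code: N' from a dlib CUDA error message and map it to an enum name."""
    match = re.search(r"code:\s*(\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return CUDA_ERRORS.get(code, f"unknown cudaError_t {code}")


if __name__ == "__main__":
    msg = ("Error while calling cudaMallocHost(&data, new_size*sizeof(float)) in file "
           "/dlib/dlib/cuda/gpu_data.cpp:211. code: 222, reason: the provided PTX was "
           "compiled with an unsupported toolchain.")
    print(cuda_error_name(msg))  # cudaErrorUnsupportedPtxVersion
```

In test_dlib.py this could wrap the detector construction, e.g. `except RuntimeError as e: print(cuda_error_name(str(e)))`.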
Any ideas on what the issue could be?
(As a side note, I strongly prefer to stay on CUDA 11.8. I've tried downgrading to CUDA 11.4, but this introduces a host of other dependency issues and complications with the Python application I'm running.)