PTX compiled with an unsupported toolchain error Running DLIB on Google Kubernetes with CUDA

johnissad · November 11, 2023, 11:01pm

I am trying to run DLib for face detection on Google Kubernetes Engine. However, I continually run into the following error.

detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
RuntimeError: Error while calling cudaMallocHost(&data, new_size*sizeof(float)) in file /dlib/dlib/cuda/gpu_data.cpp:211. code: 222, reason: the provided PTX was compiled with an unsupported toolchain.

This would suggest that there is a mismatch between the driver and compilation toolchain. However, I am reasonably certain that the compilation toolchain and driver are indeed compatible. The Google Kubernetes Engine pod is running an NVIDIA Tesla T4 GPU with an R470 driver. I verified this is the case by checking the pod itself (ssh into the cluster).

root@worker:/usr/app# nvidia-smi
Sat Nov 11 18:17:19 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P8     8W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@resize-workers-statefulset-0:/usr/app#
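
The two version fields in the banner above can be pulled out programmatically, which may be handy for logging them when the pod starts. A minimal Python sketch, assuming the banner layout shown above; `parse_smi_banner` is a hypothetical helper name, and the regular expression has only been matched against this one line. The "CUDA Version" field is the highest CUDA version the driver supports, not the toolkit installed in the container.

```python
import re

# Banner line copied from the nvidia-smi output above.
BANNER = "| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |"


def parse_smi_banner(line):
    """Return (driver_version, cuda_version) as tuples of ints."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line
    )
    if match is None:
        raise ValueError("unrecognized nvidia-smi banner: %r" % line)
    driver, cuda = match.groups()
    return (
        tuple(int(part) for part in driver.split(".")),
        tuple(int(part) for part in cuda.split(".")),
    )


driver_version, cuda_version = parse_smi_banner(BANNER)
print(driver_version, cuda_version)
```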

To compile and run DLib, I am using an official NVIDIA docker image with CUDA 11.8. According to NVIDIA’s documentation and CUDA 12.3 Release Notes, CUDA 11.8 is indeed compatible with the 470.182.03 driver version (since it exceeds 450.80.02).
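
The comparison being made here can be written out. This is a sketch in Python that uses only the two version numbers quoted in this post (450.80.02 as the minimum, 470.182.03 as the installed driver); `meets_minimum` is a hypothetical helper, not an NVIDIA API. Passing this check only says the driver clears the stated floor. It does not by itself show that PTX produced by a newer toolkit can be JIT-compiled by that driver.

```python
def parse_version(text):
    """Turn a dotted version string such as '470.182.03' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """True when the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Driver on the GKE node versus the minimum quoted above.
print(meets_minimum("470.182.03", "450.80.02"))
```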

I further verified this with a super simple test Dockerfile:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

COPY simple_cuda_test.cu /simple_cuda_test.cu
RUN nvcc -o simple_cuda_test /simple_cuda_test.cu

CMD ["./simple_cuda_test"]

where the simple_cuda_test.cu file is as follows:

#include <stdio.h>

__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int main() {
    int c;
    int *dev_c;

    // Allocate memory on the GPU
    cudaMalloc((void**)&dev_c, sizeof(int));

    // Launch the add() kernel on the GPU
    add<<<1,1>>>(2, 7, dev_c);

    // Copy the result back to the host
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);

    printf("2 + 7 = %d\n", c);

    // Cleanup
    cudaFree(dev_c);

    return 0;
}

This yields the following output:

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

2 + 7 = 9

I then created the following Dockerfile to test dlib’s cnn_face_detection_model_v1 model:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive

# dependencies
RUN apt-get update && \
    apt-get install -y \
    --no-install-recommends --no-install-suggests \
    gcc-11 g++-11 \
    git \
    build-essential \
    cmake \
    libboost-all-dev \
    libopenblas-dev \
    liblapack-dev \
    libavdevice-dev \
    libavfilter-dev \
    libavformat-dev \
    libavcodec-dev \
    libswresample-dev \
    libswscale-dev \
    libavutil-dev \
    python3 \
    python3-venv \
    python3-dev \
    python3-distutils \
    python3-pip \
    libmagic1 \
    pkg-config && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# virtual environment
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# install dlib
RUN git clone https://github.com/davisking/dlib.git /dlib && \
    cd /dlib && \
    python3 setup.py install --clean

ENV PYTHONPATH=/usr/app \
    DEBIAN_FRONTEND=noninteractive \
    PATH="/usr/local/cuda-11.8/lib64:$PATH" \
    CUDA_HOME="/usr/local/cuda-11.8" \
    LD_LIBRARY_PATH="/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH"

# simple test files for dlib
COPY mmod_human_face_detector.dat mmod_human_face_detector.dat
COPY test_dlib.py test_dlib.py
COPY test_image.jpg test_image.jpg

CMD ["python3", "test_dlib.py"]

where the test_dlib.py file is as follows:

import dlib
import time

print("dlib version: {}".format(dlib.__version__))

# Check if Dlib was compiled with CUDA support
if dlib.DLIB_USE_CUDA:
    print("Dlib was compiled with CUDA support.")
else:
    print("Dlib was NOT compiled with CUDA support.")

# Check if CUDA is currently available
if dlib.cuda.get_num_devices() > 0:
    print("CUDA is available. Number of CUDA devices:", dlib.cuda.get_num_devices())
else:
    print("CUDA is not available.")

detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")

# Load the image
image_path = "test_image.jpg"
image = dlib.load_rgb_image(image_path)

start = time.time()
dets = detector(image, 1)
end = time.time()
print("detection time: {}".format(end - start))

print("Number of faces detected: {}".format(len(dets)))

Running this Dockerfile on the pod yields the following output:

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Traceback (most recent call last):
  File "//test_dlib.py", line 18, in <module>
dlib version: 19.24.99
Dlib was compiled with CUDA support.
CUDA is available. Number of CUDA devices: 1
    detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
RuntimeError: Error while calling cudaMallocHost(&data, new_size*sizeof(float)) in file /dlib/dlib/cuda/gpu_data.cpp:211. code: 222, reason: the provided PTX was compiled with an unsupported toolchain.

Any ideas on what the issue could be?

(As a side note, I heavily prefer using CUDA 11.8. I’ve tried downgrading to CUDA 11.4 but this introduces a host of other dependency issues and complications with the python application I’m running.)

Robert_Crovella · November 13, 2023, 8:50pm

about 99.99999999999999999999999999% of the time, this error means that you should update your GPU driver to the latest available for your GPU.

attempting to use a driver that advertises CUDA 11.4 support with a CUDA 11.8 toolchain is another good indication that you should update your GPU driver to the latest available for your GPU.
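
The rule of thumb in this reply can be sketched as a check. This is a hedged Python illustration, not an official compatibility test: `toolkit_newer_than_driver` is a hypothetical name, and the inputs are the toolkit version from the Docker image (11.8) and the CUDA version reported by nvidia-smi (11.4).

```python
def toolkit_newer_than_driver(toolkit, driver_cuda):
    """True when the build toolkit is newer than the CUDA version the driver advertises."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))

    return as_tuple(toolkit) > as_tuple(driver_cuda)


# Values taken from the original post: CUDA 11.8 image, driver reporting CUDA 11.4.
if toolkit_newer_than_driver("11.8", "11.4"):
    print("toolkit is newer than the driver's CUDA version: consider updating the driver")
```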

I won’t be able to comment on whether your R470 driver should work with CUDA 11.8. Do as you wish, of course.

takeofuture · July 7, 2024, 5:24pm

I have the same error with the latest driver.
Do you mean downgrade instead of upgrade the GPU driver, CUDA toolkit, etc.?
It might be an issue with dlib's support for the latest GPU driver.


Robert_Crovella · July 7, 2024, 11:24pm

no, I mean move to the latest driver. upgrade the GPU driver, not downgrade.

This advice generally works for a “recent” GPU. Older GPUs, such as Kepler family GPUs (currently) are “stuck” on an older driver, due to the “sunsetting” of these GPUs (dropping of support), and so “the latest driver” will indeed not solve the problem. In that case the only solution I can think of would be to rearchitect the toolchain, or switch to a newer GPU.
