Panic: Could not run 'torchvision::roi_pool' with arguments from the 'CUDA' backend

Hi, Team!

I get the following error when I run my code inside a Docker container:

root@de87551d73cf:/app/src# ./main 
WARN[0032] CUDA is valid                                
WARN[0044] CUDA is valid                                
INFO[0044] Forwarding...                                
[W TensorImpl.h:1156] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
panic: Could not run 'torchvision::roi_pool' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::roi_pool' is only available for these backends: [CPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /app/vision/torchvision/csrc/ops/cpu/roi_pool_kernel.cpp:239 [kernel]
BackendSelect: fallthrough registered at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
ADInplaceOrView: fallthrough registered at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradCPU: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradCUDA: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradXLA: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradMLC: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradHPU: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradNestedTensor: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradPrivateUse1: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradPrivateUse2: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
AutogradPrivateUse3: registered at /app/vision/torchvision/csrc/ops/autograd/roi_pool_kernel.cpp:142 [autograd kernel]
Tracer: fallthrough registered at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/torch/csrc/jit/frontend/tracer.cpp:1036 [backend fallback]
Autocast: fallthrough registered at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/autocast_mode.cpp:255 [backend fallback]
Batched: registered at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/BatchingRegistrations.cpp:1019 [backend fallback]
VmapMode: fallthrough registered at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

Exception raised from reportError at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/core/dispatch/OperatorEntry.cpp:399 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xa0 (0x7fa3d3f300 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: c10::impl::OperatorEntry::reportError(c10::DispatchKey) const + 0x658 (0x7f9fb77098 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10::impl::OperatorEntry::lookup(c10::DispatchKey) const + 0x78 (0x7f8913eee8 in /usr/local/lib/libtorchvision.so)
frame #3: vision::ops::roi_pool(at::Tensor const&, at::Tensor const&, double, long, long) + 0x424 (0x7f89158dd8 in /usr/local/lib/libtorchvision.so)
frame #4: <unknown function> + 0xf1acc (0x7f89108acc in /usr/local/lib/libtorchvision.so)
frame #5: <unknown function> + 0xf2fd4 (0x7f89109fd4 in /usr/local/lib/libtorchvision.so)
frame #6: <unknown function> + 0xf2574 (0x7f89109574 in /usr/local/lib/libtorchvision.so)
frame #7: <unknown function> + 0xf5064 (0x7f8910c064 in /usr/local/lib/libtorchvision.so)
frame #8: std::tuple<at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, double, long, long>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, double&&, long&&, long&&) + 0xb8 (0x7f891517b8 in /usr/local/lib/libtorchvision.so)
frame #9: vision::ops::roi_pool(at::Tensor const&, at::Tensor const&, double, long, long) + 0x5bc (0x7f89158f70 in /usr/local/lib/libtorchvision.so)
frame #10: roi_pool + 0x5c (0x7fb0e36c9c in /app/src/internal/vision/cvision/libcvision.so)
frame #11: _cgo_a7807f38a86b_Cfunc_roi_pool + 0x30 (0x641d00 in ./main)


goroutine 1 [running, locked to thread]:
github.com/wangkuiyi/gotorch.MustNil(0x7cda0780)
	/root/go/pkg/mod/github.com/mishabeliy15/gotorch@v0.0.0-20230905115441-e54e873665cb/tensor.go:26 +0x64
RnD_Jetson_optimization/internal/vision.RoiPool({0x6a951b?}, {0x3?}, 0x0?, 0x0?, 0x400006d718?)
	/app/src/internal/vision/ops.go:17 +0x5c
RnD_Jetson_optimization/internal/matching.roiPool(...)
	/app/src/internal/matching/loses.go:55
RnD_Jetson_optimization/internal/matching.extractPatches({0x0?}, {0x1?}, 0x4?)
	/app/src/internal/matching/loses.go:49 +0xd4
RnD_Jetson_optimization/internal/matching.predSoftArgmax({0x400006d7b8?}, {0x5f2570?}, 0x8, {0x400006d7c8?})
	/app/src/internal/matching/superPointNet.go:104 +0x44
RnD_Jetson_optimization/internal/matching.(*SuperPointNet).PostProcess(0x40007f60f0, {0x4000124b68?}, {0x4002a020a0})
	/app/src/internal/matching/superPointNet.go:56 +0xe4
RnD_Jetson_optimization/internal/matching.(*SuperPointNet).Predict(0x66a300?, {0x40007f61b0?})
	/app/src/internal/matching/superPointNet.go:64 +0x30
RnD_Jetson_optimization/internal/matching.(*BruteForceMatching).Forward(0x40007f6120, 0x40007f61b0?)
	/app/src/internal/matching/bruteForceMatching.go:58 +0x15c
RnD_Jetson_optimization/internal/modelLoader.Warmup({0x6fa4f8, 0x400033d570}, {0x6faa28, 0x40007f6120})
	/app/src/internal/modelLoader/modelLoader.go:62 +0x3c0
RnD_Jetson_optimization/internal/location.(*Location).WarmupModel(...)
	/app/src/internal/location/location.go:74
RnD_Jetson_optimization/internal/location.(*Location).LoadArtifacts(0x40002f2370)
	/app/src/internal/location/location.go:104 +0x94
RnD_Jetson_optimization/app.Run()
	/app/src/app/app.go:31 +0x148
main.main()
	/app/src/cmd/main.go:10 +0x1c
root@de87551d73cf:/app/src# 
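
The backend list in the panic shows that only a CPU kernel is registered for torchvision::roi_pool, with no CUDA kernel, so I suspect the libtorchvision.so inside the container was built without CUDA support. As a quick sanity check (just a heuristic, assuming a CUDA-enabled build links against libtorch_cuda / libcudart while a CPU-only build does not), the library's dynamic dependencies can be inspected from inside the container:

# Heuristic check: a CUDA-enabled libtorchvision should pull in
# libtorch_cuda / libcudart; a CPU-only build will not list them.
ldd /usr/local/lib/libtorchvision.so | grep -i cuda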

Here is the Dockerfile I am currently using:

ARG BASE_IMAGE=nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3

FROM $BASE_IMAGE as base

RUN apt-get update && apt-get upgrade -yy && apt-get install -yy \
    sudo \
    wget \
    unzip \
    git \
    build-essential \
    git \
    clang-3.8 \
    libpng-dev \
    libopencv-dev \
    libgeos-dev \
    libproj-dev && \
    apt remove cmake -y && \
    pip3 install --upgrade pip && \
    pip3 install cmake --upgrade && \
    apt autoremove -y

# Download and install sqlite3
RUN apt-get update && apt-get install -y sqlite3 libsqlite3-dev && \
    apt-get install libcurl4-openssl-dev -y && \
    sqlite3 --version && \
 #   mkdir build && cd build && \
 #   cmake .. && \
    # Check sqlite3
    which sqlite3  && \
    apt autoremove -y

# Download and install PROJ
RUN apt-get update && apt-get install -y && \
    mkdir proj && cd proj && \
    wget https://download.osgeo.org/proj/proj-9.3.0.tar.gz && \
    tar -xzf proj-9.3.0.tar.gz && \
    cd proj-9.3.0 && \
    mkdir build && cd build && \
    cmake .. && \
    make && \
    make install && \
    which proj

# Download and install GDAL
RUN apt-get update && apt-get install -y && \
    mkdir downloads && cd downloads && wget https://download.osgeo.org/gdal/CURRENT/gdal-3.8.1.tar.gz && \
    tar -xzf gdal-3.8.1.tar.gz && \
    cd gdal-3.8.1 && \
    mkdir build && cd build && \
    cmake .. && \
    make && \
    make install && \
    which gdalinfo && \
    apt autoremove -y


# Download and install GDAL, GEOS, PROJ, SpatiaLite
# RUN	./scripts/install/install_deps.sh -y

# Download and install Go
ENV GOLANG_VERSION=1.21.0 \
    GOROOT=/usr/local/go \
    PATH=$PATH:/usr/local/go/bin

RUN wget https://golang.org/dl/go${GOLANG_VERSION}.linux-arm64.tar.gz && \
    tar -C /usr/local -xzf go${GOLANG_VERSION}.linux-arm64.tar.gz && \
    rm go${GOLANG_VERSION}.linux-arm64.tar.gz

# Set ENV var for libtorch
ARG PYTHON_VERSION=3.6
ENV LIBTORCH_PATH=/usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$LIBTORCH_PATH/lib:/usr/local/lib:/app/src/libs \
    Torch_DIR=$LIBTORCH_PATH

# Create symlinks libtorch to system
RUN ln -s ${LIBTORCH_PATH}/include/* /usr/local/include/ && \
    ln -s ${LIBTORCH_PATH}/lib/* /usr/local/lib/


WORKDIR /app

ARG BUILD_TORCH
ARG BUILD_CORES=2

# Install OpenCV 4.7.0
COPY scripts scripts
COPY openCV openCV
RUN chmod +x ./scripts/install/install-openCV.sh && \
    ./scripts/install/install-openCV.sh

# Download and build libtorchvision
RUN git clone https://github.com/pytorch/vision.git --branch v0.15.2 &&  \
    if [[ -z "$BUILD_TORCH" ]] ; then cd vision ; mkdir -p build ; cd build; cmake .. ; make -j${BUILD_CORES} ; make install ; cd .. ; rm -rf build; else echo "Build torch is skipped" ; fi

COPY go.mod go.sum ./
RUN go mod download

# Build gotorch
RUN cd /root/go/pkg/mod/github.com/mishabeliy15/gotorch*/cgotorch && \
    chmod +x build.sh && \
    if [[ -z "$BUILD_TORCH" ]] ; then ./build.sh -j${BUILD_CORES} ; else echo "Build torch is skipped" ; fi


FROM base as dev

WORKDIR /app/src
CMD ["bash"]

Version of CUDA: 10.2

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0

Version of torch:

pip show torch
Name: torch
Version: 1.9.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib/python3.6/dist-packages
Requires: dataclasses, typing-extensions
Required-by: torchaudio, torchvision

Board: Jetson Nano

 python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (aarch64)
GCC version: (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.27.7
Libc version: glibc-2.25

Python version: 3.6 (64-bit runtime)
Python platform: Linux-4.9.253-tegra-aarch64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 10.2.300
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0
[pip3] torchaudio==0.9.0a0+33b2469
[pip3] torchvision==0.10.0a0+300a8a4
[conda] Could not collect

Hi,

This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build).

Based on this, roi_pool doesn't support the CUDA backend.
Could you check with the PyTorch team to see if they have added support in a recent release?
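
If the cause is that the libtorchvision.so in your container was compiled CPU-only, one possible direction (just a sketch from my side, assuming torchvision's WITH_CUDA CMake option, which defaults to OFF, and that nvcc from CUDA 10.2 is on the PATH) would be to reconfigure the libtorchvision build with CUDA enabled, e.g.:

# Sketch only, not verified on Jetson Nano: rebuild libtorchvision with CUDA kernels.
cd /app/vision && rm -rf build && mkdir -p build && cd build
cmake -DWITH_CUDA=on ..
make -j2 && make install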

Thanks.
