PyTorch 1.0, CUDA 9.0, cuDNN 7.4: fails with 'cublas runtime error'

Ran.00 · October 8, 2022, 7:10am

Hi, I ran into a problem when trying to run DenseFusion-Pytorch-1.0.

It fails with:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
/home/lr/anaconda3/envs/df2/lib/python3.6/site-packages/torch/nn/functional.py:2351: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/lr/anaconda3/envs/df2/lib/python3.6/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/lr/anaconda3/envs/df2/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:129: UserWarning: nn.Upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.{} is deprecated. Use nn.functional.interpolate instead.".format(self.name))
/home/lr/anaconda3/envs/df2/lib/python3.6/site-packages/torch/nn/modules/container.py:92: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  input = module(input)
Traceback (most recent call last):
  File "./tools/train.py", line 237, in <module>
    main()
  File "./tools/train.py", line 140, in main
    loss, dis, new_points, new_target = criterion(pred_r, pred_t, pred_c, target, model_points, idx, points, opt.w, opt.refine_start)
  File "/home/lr/anaconda3/envs/df2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/lr/OLDFACE BRO/DenseFusion-Pytorch-1.0/lib/loss.py", line 83, in forward
    return loss_calculation(pred_r, pred_t, pred_c, target, model_points, idx, points, w, refine, self.num_pt_mesh, self.sym_list)
  File "/media/lr/OLDFACE BRO/DenseFusion-Pytorch-1.0/lib/loss.py", line 38, in loss_calculation
    pred = torch.add(torch.bmm(model_points, base), points + pred_t)
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/THCBlas.cu:441

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
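Note that `nvcc -V` reports the system-wide toolkit (here under /usr/local/cuda). Prebuilt PyTorch binaries typically ship their own CUDA runtime, so this number is not necessarily the one PyTorch uses; the value to compare against is `torch.version.cuda`. A minimal sketch for pulling the release out of the nvcc output, where `parse_nvcc_release` is a hypothetical helper name:

```python
import re


def parse_nvcc_release(nvcc_output: str) -> str:
    """Return the toolkit release (e.g. '9.0') from `nvcc -V` output.

    Hypothetical helper: it only looks for the 'release X.Y' text that
    nvcc prints on its last line.
    """
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' string found in nvcc output")
    return match.group(1)


# The output pasted above.
nvcc_output = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176"""

print(parse_nvcc_release(nvcc_output))  # prints 9.0
```

If this disagrees with `torch.version.cuda`, the toolkit on the system is not what the failing cuBLAS call is using.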

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 1
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
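The `CUDNN_VERSION` macro above packs the version as MAJOR * 1000 + MINOR * 100 + PATCHLEVEL, so cuDNN 7.4.1 encodes to 7401. For cuDNN 7.x, `torch.backends.cudnn.version()` returns an integer in the same encoding, which makes the two directly comparable (PyTorch binaries usually bundle their own cuDNN, so they may differ). A sketch that applies the header's formula to the grep output; `cudnn_version_from_header` is a hypothetical name:

```python
import re


def cudnn_version_from_header(header_text: str) -> int:
    """Compute CUDNN_VERSION the way the cuDNN 7.x cudnn.h macro does."""
    fields = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match e.g. '#define CUDNN_MAJOR 7' and capture the number.
        match = re.search(rf"#define {name} (\d+)", header_text)
        if match is None:
            raise ValueError(f"{name} not found in header text")
        fields[name] = int(match.group(1))
    return (fields["CUDNN_MAJOR"] * 1000
            + fields["CUDNN_MINOR"] * 100
            + fields["CUDNN_PATCHLEVEL"])


# The grep output pasted above.
header = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 1
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)"""

print(cudnn_version_from_header(header))  # prints 7401
```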

nvidia-smi
Sat Oct  8 15:05:50 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 36%   32C    P8    11W / 125W |      5MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1299      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
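The "CUDA Version: 11.7" in the nvidia-smi header is the newest CUDA runtime that driver 515.65.01 supports, not an installed toolkit. NVIDIA drivers are backward compatible with older runtimes, so a CUDA 9.0 runtime should still load on this driver, which suggests the driver itself is probably not the culprit. A small sketch of that comparison, with hypothetical helper names:

```python
import re


def driver_cuda_version(smi_output: str) -> tuple:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r"CUDA Version: (\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def runtime_supported_by_driver(runtime: str, smi_output: str) -> bool:
    """True if a CUDA runtime 'X.Y[.Z]' is not newer than what the driver reports."""
    major, minor = (int(part) for part in runtime.split(".")[:2])
    return (major, minor) <= driver_cuda_version(smi_output)


# Header line from the nvidia-smi output above.
smi_line = "| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |"

print(driver_cuda_version(smi_line))                 # prints (11, 7)
print(runtime_supported_by_driver("9.0", smi_line))  # prints True
```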

How can I solve ‘RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/THCBlas.cu:441’?

Any help would be appreciated.
Thank you!

Ran.00 · October 8, 2022, 7:23am

GPU: GeForce GTX 1660 Super, CPU: Intel Core i9-10900K
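The GTX 1660 Super is a Turing GPU with compute capability 7.5, and Turing support first arrived in CUDA 10.0. One plausible explanation for the cuBLAS failure is therefore that this PyTorch 1.0 build targets CUDA 9.0, which predates the card; if so, a build compiled against CUDA 10.0 or later would be the thing to try. A sketch of a compatibility check under that assumption; the lookup table lists only a few well-known architectures and, like the helper names, is illustrative rather than exhaustive:

```python
# Minimum CUDA toolkit that added support for each compute capability.
# Assumption: only a handful of well-known entries, not a complete table.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),   # Pascal (P100)
    (6, 1): (8, 0),   # Pascal (GeForce 10 series)
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (GeForce 16 and 20 series)
    (8, 0): (11, 0),  # Ampere (A100)
}


def parse_version(text: str) -> tuple:
    """Turn '9.0' or '9.0.176' into (9, 0)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_supports_gpu(cuda_version: str, capability: tuple) -> bool:
    """True if the CUDA version is at least the minimum listed for the GPU."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise KeyError(f"capability {capability} is not in the lookup table")
    return parse_version(cuda_version) >= required


def report_from_torch() -> None:
    """Print what the installed PyTorch actually sees (requires PyTorch and a GPU)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    cuda_version = torch.version.cuda  # None for CPU-only builds
    capability = torch.cuda.get_device_capability(0)
    print("PyTorch", torch.__version__, "built with CUDA", cuda_version)
    print("cuDNN", torch.backends.cudnn.version())
    print("GPU", torch.cuda.get_device_name(0), "capability", capability)
    if cuda_version is not None:
        print("compatible:", cuda_supports_gpu(cuda_version, capability))


print(cuda_supports_gpu("9.0", (7, 5)))   # prints False
print(cuda_supports_gpu("10.0", (7, 5)))  # prints True
```

Calling `report_from_torch()` inside the `df2` environment shows the CUDA and cuDNN versions the PyTorch binary was built with, which matter more here than the system toolkit.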
