CUDA error: no kernel image is available for execution on the device (Error from operator)

How can I fix this "no kernel image is available for execution on the device" error from an operator?

Hi all,
I am having problems using CUDA for deep learning development.

I compiled the whole of PyTorch with GPU support, and the console reported a successful build, so I obtained both caffe2_pybind11_state.pyd and caffe2_pybind11_state_gpu.pyd.

When I run the following command without GPU support, it succeeds:
python char_rnn.py --train_data shakespeare.txt

However, when I run it with the GPU flag, I get a CUDA error:
python char_rnn.py --train_data shakespeare.txt --gpu

My configuration for each try is below:
FIRST TRY
OS: Windows 10
PyTorch version: current version
Python version: 2.7
CUDA/cuDNN version: 9.2/7.1
GPU models and configuration: NVIDIA GeForce GTX 1050
Versions of any other relevant libraries: Visual Studio 2017

SECOND TRY
OS: Windows 10
PyTorch version: current version
Python version: 2.7
CUDA/cuDNN version: 8.0/7.0.5
GPU models and configuration: NVIDIA GeForce GTX 1050
Versions of any other relevant libraries: Visual Studio 2015

Both tries produce the same console output:
D:\yev\git_projects\pytorch\caffe2\python\examples>python char_rnn.py --train_data shakespeare.txt --gpu
[E D:\yev\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E D:\yev\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E D:\yev\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Input has 62 characters. Total input size: 99993
DEBUG:char_rnn:Start training
DEBUG:char_rnn:Training model
WARNING:caffe2.python.workspace:Original python traceback for operator 0 in network char_rnn_init in exception above (most recent call last):
WARNING:caffe2.python.workspace: File "char_rnn.py", line 276, in
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\utils.py", line 329, in wrapper
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\utils.py", line 291, in run
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\utils.py", line 328, in func
WARNING:caffe2.python.workspace: File "char_rnn.py", line 270, in main
WARNING:caffe2.python.workspace: File "char_rnn.py", line 71, in CreateModel
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\rnn_cell.py", line 1571, in _LSTM
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\rnn_cell.py", line 93, in apply_over_sequence
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\rnn_cell.py", line 491, in prepare_input
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\brew.py", line 107, in scope_wrapper
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\helpers\fc.py", line 58, in fc
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\helpers\fc.py", line 37, in _FC_or_packed_FC
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\model_helper.py", line 214, in create_param
WARNING:caffe2.python.workspace: File "D:\yev\git_projects\pytorch\build\caffe2\python\modeling\initializers.py", line 30, in create_param
Entering interactive debugger. Type "bt" to print the full stacktrace. Type "help" to see command listing.
[enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
output: "LSTM/i2h_w" name: "" type: "XavierFill" arg { name: "shape" ints: 400 ints: 62 } device_option { device_type: 1 cuda_gpu_id: 0 }

d:\yev\git_projects\pytorch\build\caffe2\python\workspace.py(178)CallWithExceptionIntercept()
-> return func(*args, **kwargs)
(Pdb)

Thanks in advance.

When you compiled PyTorch for GPU, you needed to specify the arch settings for your GPU.

Set TORCH_CUDA_ARCH_LIST to "6.1" to match your GPU (the GTX 1050 is a compute capability 6.1 device).

https://github.com/pytorch/pytorch/issues/6321
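For background, TORCH_CUDA_ARCH_LIST controls which -gencode flags the build passes to nvcc, and therefore which GPU architectures get a compiled kernel image baked into the binary. If the list omits your GPU's compute capability (6.1 for a GTX 1050), every CUDA operator fails at runtime with exactly the "no kernel image is available" error above. The mapping works roughly like this (an illustrative sketch only, not the actual build code; the real logic lives in select_compute_arch.cmake):

```python
def gencode_flags(arch_list):
    """Translate a TORCH_CUDA_ARCH_LIST-style string (e.g. "6.1" or
    "6.1;7.0") into the kind of -gencode flags nvcc would receive."""
    flags = []
    for arch in arch_list.split(";"):
        sm = arch.strip().replace(".", "")  # "6.1" -> "61"
        flags.append("-gencode arch=compute_{0},code=sm_{0}".format(sm))
    return " ".join(flags)

print(gencode_flags("6.1"))
# -gencode arch=compute_61,code=sm_61
```

A binary built with only, say, "5.2" in the list simply contains no sm_61 code, which is why the error appears only when the --gpu flag forces kernels onto the device.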

Hi txbob,

I did what you mentioned, but I got a different error: "Caffe2 building failed". I set the following variables:
D:\Yeverino\git_projects\pytorch\scripts>set CMAKE_GENERATOR="Visual Studio 14 2015 Win64"

D:\Yeverino\git_projects\pytorch\scripts>set USE_CUDA=ON

D:\Yeverino\git_projects\pytorch\scripts>set TORCH_CUDA_ARCH_LIST="6.1"

D:\Yeverino\git_projects\pytorch\scripts>build_windows.bat

Below the Console Output:
Requirement already satisfied: pyyaml in c:\python27\lib\site-packages (3.13)
CAFFE2_ROOT=D:\Yeverino\git_projects\pytorch\scripts...
CMAKE_GENERATOR="Visual Studio 14 2015 Win64"
CMAKE_BUILD_TYPE=Release
-- Selecting Windows SDK version 10.0.14393.0 to target Windows 10.0.17134.
-- The CXX compiler identification is MSVC 19.0.24215.1
-- The C compiler identification is MSVC 19.0.24215.1
-- Check for working CXX compiler: D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working CXX compiler: D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working C compiler: D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Not forcing any particular BLAS to be found
-- Performing Test CAFFE2_LONG_IS_INT32_OR_64
-- Performing Test CAFFE2_LONG_IS_INT32_OR_64 - Failed
-- Need to define long as a separate typeid.
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED - Success
-- std::exception_ptr is supported.
-- Performing Test CAFFE2_IS_NUMA_AVAILABLE
-- Performing Test CAFFE2_IS_NUMA_AVAILABLE - Failed
-- NUMA is not available
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS - Success
-- Current compiler supports avx2 extention. Will build perfkernels.
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Failed
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Failed
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:D:/Yeverino/git_projects/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
-- Found Git: D:/Program Files/Git/Git/cmd/git.exe (found version "2.18.0.windows.1")
-- The BLAS backend of choice:Eigen
CMake Warning at cmake/Dependencies.cmake:257 (message):
NUMA is currently only supported under Linux.
Call Stack (most recent call first):
CMakeLists.txt:181 (include)

CMake Warning at cmake/Dependencies.cmake:330 (find_package):
By not providing "FindEigen3.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Eigen3", but
CMake did not find one.

Could not find a package configuration file provided by "Eigen3" with any
of the following names:

Eigen3Config.cmake
eigen3-config.cmake

Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set
"Eigen3_DIR" to a directory containing one of the above files. If "Eigen3"
provides a separate development package or SDK, be sure it has been
installed.
Call Stack (most recent call first):
CMakeLists.txt:181 (include)

-- Did not find system Eigen. Using third party subdirectory.
-- Found PythonInterp: C:/Python27/python.exe (found suitable version "2.7.14", minimum required is "2.7")
-- Found PythonLibs: C:/Python27/libs/python27.lib (found suitable version "2.7.14", minimum required is "2.7")
-- Found NumPy: C:/Python27/lib/site-packages/numpy/core/include (found version "1.14.5")
-- NumPy ver. 1.14.5 found (include: C:/Python27/lib/site-packages/numpy/core/include)
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR)
-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
-- Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND)
CMake Warning at cmake/Dependencies.cmake:401 (message):
Not compiling with MPI. Suppress this warning with -DUSE_MPI=OFF
Call Stack (most recent call first):
CMakeLists.txt:181 (include)

-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0 (found suitable version "8.0", minimum required is "7.0")
-- Caffe2: CUDA detected: 8.0
-- Caffe2: CUDA nvcc is: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/bin/nvcc.exe
-- Caffe2: CUDA toolkit directory: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0
-- Caffe2: Header version is: 8.0
-- Found CUDNN: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include
-- Found cuDNN: v7.0.5 (include: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include, library: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudnn.lib)
CMake Warning at cmake/public/utils.cmake:148 (message):
In the future we will require one to explicitly pass TORCH_CUDA_ARCH_LIST
to cmake instead of implicitly setting it as an env variable. This will
become a FATAL_ERROR in future version of pytorch.
Call Stack (most recent call first):
cmake/public/cuda.cmake:332 (torch_cuda_get_nvcc_gencode_flag)
cmake/Dependencies.cmake:433 (include)
CMakeLists.txt:181 (include)

CMake Error at cmake/Modules_CUDA_fix/upstream/FindCUDA/select_compute_arch.cmake:168 (message):
Unknown CUDA Architecture Name "6.1" in CUDA_SELECT_NVCC_ARCH_FLAGS
Call Stack (most recent call first):
cmake/public/utils.cmake:164 (cuda_select_nvcc_arch_flags)
cmake/public/cuda.cmake:332 (torch_cuda_get_nvcc_gencode_flag)
cmake/Dependencies.cmake:433 (include)
CMakeLists.txt:181 (include)

CMake Error at cmake/Modules_CUDA_fix/upstream/FindCUDA/select_compute_arch.cmake:172 (message):
arch_bin wasn't set for some reason
Call Stack (most recent call first):
cmake/public/utils.cmake:164 (cuda_select_nvcc_arch_flags)
cmake/public/cuda.cmake:332 (torch_cuda_get_nvcc_gencode_flag)
cmake/Dependencies.cmake:433 (include)
CMakeLists.txt:181 (include)

-- Added CUDA NVCC flags for:
CMake Warning at cmake/Dependencies.cmake:543 (message):
NCCL is currently only supported under Linux.
Call Stack (most recent call first):
CMakeLists.txt:181 (include)

-- Could NOT find CUB (missing: CUB_INCLUDE_DIR)
CMake Warning at cmake/Dependencies.cmake:563 (message):
Gloo can only be used on Linux.
Call Stack (most recent call first):
CMakeLists.txt:181 (include)

CMake Warning at cmake/Dependencies.cmake:623 (message):
mobile opengl is only used in android or ios builds.
Call Stack (most recent call first):
CMakeLists.txt:181 (include)

CMake Warning at cmake/Dependencies.cmake:699 (message):
Metal is only used in ios builds.
Call Stack (most recent call first):
CMakeLists.txt:181 (include)

-- NCCL operators skipped due to no CUDA support
-- Excluding ideep operators as we are not using ideep
-- Excluding image processing operators due to no opencv
-- Excluding video processing operators due to no opencv
-- Excluding mkl operators as we are not using mkl
-- MPI operators skipped due to no MPI support
-- Include Observer library
-- Using Lib\site-packages as python relative installation path
-- Automatically generating missing init.py files.
CMake Warning at CMakeLists.txt:341 (message):
Generated cmake files are only fully tested if one builds with system glog,
gflags, and protobuf. Other settings may generate files that are not well
tested.

CMake Warning at CMakeLists.txt:390 (message):
Generated cmake files are only available when building shared libs.


-- ******** Summary ********
-- General:
-- CMake version : 3.12.0-rc2
-- CMake command : C:/Program Files/CMake/bin/cmake.exe
-- Git version : v0.1.11-9211-gf87499a8f-dirty
-- System : Windows
-- C++ compiler : D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- C++ compiler version : 19.0.24215.1
-- BLAS : Eigen
-- CXX flags : /DWIN32 /D_WINDOWS /W3 /GR /EHsc -DONNX_NAMESPACE=onnx_c2 /MP /bigobj
-- Build type : Release
-- Compile definitions :
-- CMAKE_PREFIX_PATH :
-- CMAKE_INSTALL_PREFIX : C:/Program Files/Caffe2

-- BUILD_CAFFE2 : ON
-- BUILD_ATEN : OFF
-- BUILD_BINARY : ON
-- BUILD_CUSTOM_PROTOBUF : ON
-- Protobuf compiler :
-- Protobuf includes :
-- Protobuf libraries :
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 2.7.14
-- Python includes : C:/Python27/include
-- BUILD_SHARED_LIBS : OFF
-- BUILD_TEST : OFF
-- USE_ASAN : OFF
-- USE_ATEN : OFF
-- USE_CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 8.0
-- cuDNN version : 7.0.5
-- CUDA root directory : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0
-- CUDA library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cuda.lib
-- cudart library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudart_static.lib
-- cublas library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cublas.lib;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cublas_device.lib
-- cufft library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cufft.lib
-- curand library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/curand.lib
-- cuDNN library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudnn.lib
-- nvrtc : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/nvrtc.lib
-- CUDA include path : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include
-- NVCC executable : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/bin/nvcc.exe
-- CUDA host compiler : $(VCInstallDir)bin
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_GLOO : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_MKL :
-- USE_MOBILE_OPENGL : OFF
-- USE_MPI : OFF
-- USE_NCCL : OFF
-- USE_NERVANA_GPU : OFF
-- USE_NNPACK : OFF
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : OFF
-- USE_PROF : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- Public Dependencies : Threads::Threads
-- Private Dependencies : cpuinfo;onnxifi_loader
-- Configuring incomplete, errors occurred!
See also "D:/Yeverino/git_projects/pytorch/build/CMakeFiles/CMakeOutput.log".
See also "D:/Yeverino/git_projects/pytorch/build/CMakeFiles/CMakeError.log".
"Caffe2 building failed"

Hi all,

How do I set TORCH_CUDA_ARCH_LIST so that it matches "6.1"?
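One likely cause, judging from the CMake error above (note that the quotation marks appear inside the reported architecture name, "6.1"): in Windows cmd, `set VAR="value"` stores the quotes as part of the value, unlike quoting in Unix shells, so CMake receives the literal string `"6.1"` rather than `6.1`. Assuming that is what happened here, setting the variable without quotes should let CMake recognize the architecture:

```bat
REM In cmd.exe, everything after the "=" becomes part of the value,
REM so do not quote it.
set TORCH_CUDA_ARCH_LIST=6.1
set USE_CUDA=ON
build_windows.bat
```

(CMAKE_GENERATOR is less sensitive because build_windows.bat passes it through to cmake, but the same rule applies if it misbehaves.)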