PyTorch not recognizing CUDA on AGX

iandanielsooknanan · May 17, 2021, 11:09pm

Hello everyone, I have set up a new AGX with the latest JetPack and I have installed PyTorch 1.8.1 following these instructions: link

I confirmed that CUDA 10.2 is installed using nvcc -V; however, when I check in Python whether CUDA is available, by importing PyTorch and then calling “torch.cuda.is_available()”, it returns False. Why does PyTorch not recognize CUDA and, more importantly, how do I fix this? Thank you.
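A quick way to narrow this down is to ask the installed PyTorch what it was built with, not just whether it sees a GPU. The minimal sketch below uses only standard `torch` attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()`, `torch.backends.cudnn.version()`); the helper name `cuda_report` is hypothetical. If `torch.version.cuda` comes back as `None`, the installed build was compiled without CUDA, so `is_available()` will return False no matter what `nvcc -V` reports.

```python
"""Report how the installed PyTorch was built and whether it sees the GPU."""
import importlib.util


def cuda_report():
    """Return a dict describing PyTorch's CUDA support (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not importable from this interpreter at all.
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None means this build of PyTorch was compiled without CUDA support.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Only query devices when the CUDA runtime actually initialized.
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
        report["cudnn_version"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key:16}: {value}")
```

Run it with the same `python3` interpreter you installed the wheel into; a different interpreter may import a different copy of `torch`.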

dusty_nv · May 18, 2021, 12:07am

Hi @iandanielsooknanan, if you run the CUDA deviceQuery sample, does it report the GPU? Does PyTorch detect the GPU after running deviceQuery?
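If you want to script this check, the sketch below runs an already-built deviceQuery binary and looks for the `Result = PASS` line it prints when it finds a usable device. The default path is an assumption: it presumes the samples under `/usr/local/cuda/samples` were built in place with `make`, so adjust `DEVICE_QUERY` to wherever you built the sample. The helper names are hypothetical.

```python
"""Run the CUDA deviceQuery sample and report whether it found a GPU."""
import os
import subprocess

# Assumed location of a deviceQuery binary built from the CUDA samples;
# change this to match where you built it.
DEVICE_QUERY = "/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery"


def device_query_passed(output):
    """deviceQuery prints 'Result = PASS' when it finds a usable device."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())


def run_device_query(path=DEVICE_QUERY):
    """Return (passed, output); passed is False if the binary is missing."""
    if not os.path.isfile(path):
        return False, "deviceQuery not found at " + path
    result = subprocess.run(
        [path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode that also works on Python 3.6
    )
    return device_query_passed(result.stdout), result.stdout


if __name__ == "__main__":
    passed, output = run_device_query()
    print(output)
    print("GPU detected" if passed else "GPU NOT detected")
```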

If not, are you sure that PyTorch was built with CUDA enabled? Soon after you start the PyTorch build, it will print out a summary of the configuration that it is building with. I recommend that you check that and save the log of your build like so: python3 setup.py build | tee build_log.txt
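Note that a plain `|` pipe only captures standard output; writing `python3 setup.py build 2>&1 | tee build_log.txt` also saves compiler errors. If you would rather drive the build from Python, the sketch below is a rough equivalent that streams the merged output to the terminal while writing it to a file. `run_and_log` is a hypothetical helper, not part of PyTorch's build tooling.

```python
"""Stream a command's output to the terminal and to a log file, like `tee`."""
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, echoing each output line and writing it to log_path.

    stderr is merged into stdout so compiler errors land in the log too.
    Returns the command's exit code.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            bufsize=1,  # line-buffered in text mode
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        proc.stdout.close()
        return proc.wait()


if __name__ == "__main__":
    # Roughly equivalent to: python3 setup.py build 2>&1 | tee build_log.txt
    code = run_and_log([sys.executable, "setup.py", "build"], "build_log.txt")
    sys.exit(code)
```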

You don’t have to wait for the full build to complete before checking the build configuration in the log; it should print out within the first couple of minutes. In particular, check that the summary shows USE_CUDA : ON along with the detected CUDA version. For example, my build config from PyTorch 1.8 looks like this:

-- Found CUDA: /usr/local/cuda (found suitable version "10.2", minimum required is "7.0") 
-- CUDA detected: 10.2
-- Found CUDA: /usr/local/cuda (found version "10.2") 
-- 
-- ******** Summary ********
--   CMake version         : 3.10.2
--   CMake command         : /usr/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler version  : 7.5.0
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -Wnon-virtual-dtor
--   Build type            : Release
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1
--   CMAKE_PREFIX_PATH     : /usr/lib/python3/dist-packages;/usr/local/cuda
--   CMAKE_INSTALL_PREFIX  : /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/torch
--   CMAKE_MODULE_PATH     : /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/cmake/Modules;/media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/cmake/public/../Modules_CUDA_fix
-- 
--   ONNX version          : 1.8.0
--   ONNX NAMESPACE        : onnx_torch
--   ONNX_BUILD_TESTS      : OFF
--   ONNX_BUILD_BENCHMARKS : OFF
--   ONNX_USE_LITE_PROTO   : OFF
--   ONNXIFI_DUMMY_BACKEND : OFF
--   ONNXIFI_ENABLE_EXT    : OFF
-- 
--   Protobuf compiler     : 
--   Protobuf includes     : 
--   Protobuf libraries    : 
--   BUILD_ONNX_PYTHON     : OFF
-- 
-- ******** Summary ********
--   CMake version         : 3.10.2
--   CMake command         : /usr/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler version  : 7.5.0
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -Wnon-virtual-dtor
--   Build type            : Release
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1
--   CMAKE_PREFIX_PATH     : /usr/lib/python3/dist-packages;/usr/local/cuda
--   CMAKE_INSTALL_PREFIX  : /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/torch
--   CMAKE_MODULE_PATH     : /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/cmake/Modules;/media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/cmake/public/../Modules_CUDA_fix
-- 
--   ONNX version          : 1.4.1
--   ONNX NAMESPACE        : onnx_torch
--   ONNX_BUILD_TESTS      : OFF
--   ONNX_BUILD_BENCHMARKS : OFF
--   ONNX_USE_LITE_PROTO   : OFF
--   ONNXIFI_DUMMY_BACKEND : OFF
-- 
--   Protobuf compiler     : 
--   Protobuf includes     : 
--   Protobuf libraries    : 
--   BUILD_ONNX_PYTHON     : OFF
-- Found CUDA with FP16 support, compiling with torch.cuda.HalfTensor
-- Adding -DNDEBUG to compile flags
-- MAGMA not found. Compiling without MAGMA support
-- Could not find hardware support for NEON on this machine.
-- No OMAP3 processor on this machine.
-- No OMAP4 processor on this machine.
-- asimd/Neon found with compiler flag : -D__NEON__
-- Found a library with LAPACK API (open).
-- MIOpen not found. Compiling without MIOpen support
-- Version: 7.0.3
-- Build type: Release
-- CXX_STANDARD: 14
-- Required features: cxx_variadic_templates
-- Configuring Kineto dependency:
--   KINETO_SOURCE_DIR = /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/third_party/kineto/libkineto
--   KINETO_BUILD_TESTS = OFF
--   KINETO_LIBRARY_TYPE = static
--   CUDA_SOURCE_DIR = /usr/local/cuda
-- Could not find CUPTI library, skipping Kineto build
-- GCC 7.5.0: Adding gcc and gcc_s libs to link line
-- NUMA paths:
-- /usr/include
-- /usr/lib/aarch64-linux-gnu/libnuma.so
-- Using ATen parallel backend: OMP
-- Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY OPENSSL_INCLUDE_DIR) 
-- Found OpenMP_C: -fopenmp  
-- Found OpenMP_CXX: -fopenmp  
-- Found OpenMP: TRUE   
-- Configuring build for SLEEF-v3.6.0
-- Using option `-Wall -Wno-unused -Wno-attributes -Wno-unused-result -Wno-psabi -ffp-contract=off -fno-math-errno -fno-trapping-math` to compile libsleef
-- Building shared libs : OFF
-- Building static test bins: OFF
-- MPFR : LIB_MPFR-NOTFOUND
-- GMP : LIBGMP-NOTFOUND
-- RT : /usr/lib/aarch64-linux-gnu/librt.so
-- FFTW3 : LIBFFTW3-NOTFOUND
-- OPENSSL : 
-- SDE : SDE_COMMAND-NOTFOUND
-- RUNNING_ON_TRAVIS : 
-- COMPILER_SUPPORTS_OPENMP : 1
-- NCCL operators skipped due to no CUDA support
-- Excluding FakeLowP operators
-- Excluding ideep operators as we are not using ideep
-- Excluding image processing operators due to no opencv
-- Excluding video processing operators due to no opencv
-- Include Observer library
-- /usr/bin/c++ /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/torch/abi-check.cpp -o /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
-- MPI_INCLUDE_PATH: /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi;/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent;/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include;/usr/lib/aarch64-linux-gnu/openmpi/include
-- MPI_LIBRARIES: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so
-- MPIEXEC: /usr/bin/mpiexec
-- pytorch is compiling with OpenMP. 
OpenMP CXX_FLAGS: -fopenmp. 
OpenMP libraries: /usr/lib/gcc/aarch64-linux-gnu/7/libgomp.so;/usr/lib/aarch64-linux-gnu/libpthread.so.
-- Caffe2 is compiling with OpenMP. 
OpenMP CXX_FLAGS: -fopenmp. 
OpenMP libraries: /usr/lib/gcc/aarch64-linux-gnu/7/libgomp.so;/usr/lib/aarch64-linux-gnu/libpthread.so.
-- Using lib/python3/dist-packages as python relative installation path
-- 
-- ******** Summary ********
-- General:
--   CMake version         : 3.10.2
--   CMake command         : /usr/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 7.5.0
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /usr/lib/python3/dist-packages;/usr/local/cuda
--   CMAKE_INSTALL_PREFIX  : /media/nvidia/WD_NVME/PyTorch/JetPack_4.4.1/pytorch-v1.8.0/torch
-- 
--   TORCH_VERSION         : 1.8.0
--   CAFFE2_VERSION        : 1.8.0
--   BUILD_CAFFE2          : ON
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_CAFFE2_MOBILE   : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Link local protobuf : ON
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : True
--     Python version      : 3.6.9
--     Python executable   : /usr/bin/python3
--     Pythonlibs version  : 3.6.9
--     Python library      : /usr/lib/libpython3.6m.so.1.0
--     Python includes     : /usr/include/python3.6m
--     Python site-packages: lib/python3/dist-packages
--   BUILD_SHARED_LIBS     : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : OFF
--   BUILD_TEST            : True
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   INTERN_BUILD_MOBILE   : 
--   USE_BLAS              : 1
--     BLAS                : open
--   USE_LAPACK            : 1
--     LAPACK              : open
--   USE_ASAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : ON
--     Split CUDA          : OFF
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     CUDA version        : 10.2
--     cuDNN version       : 8.0.0
--     CUDA root directory : /usr/local/cuda
--     CUDA library        : /usr/local/cuda/lib64/stubs/libcuda.so
--     cudart library      : /usr/local/cuda/lib64/libcudart.so
--     cublas library      : /usr/lib/aarch64-linux-gnu/libcublas.so
--     cufft library       : /usr/local/cuda/lib64/libcufft.so
--     curand library      : /usr/local/cuda/lib64/libcurand.so
--     cuDNN library       : /usr/lib/aarch64-linux-gnu/libcudnn.so
--     nvrtc               : /usr/local/cuda/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda/include
--     NVCC executable     : /usr/local/cuda/bin/nvcc
--     NVCC flags          : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
--     CUDA host compiler  : /usr/bin/cc
--     NVCC --device-c     : OFF
--     USE_TENSORRT        : OFF
--   USE_ROCM              : OFF
--   USE_EIGEN_FOR_BLAS    : ON
--   USE_FBGEMM            : OFF
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : OFF
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_FFTW              : OFF
--   USE_MKL               : OFF
--   USE_MKLDNN            : OFF
--   USE_NCCL              : 0
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : 0
--   USE_PYTORCH_QNNPACK   : 0
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI             : ON
--     USE_GLOO            : ON
--     USE_TENSORPIPE      : ON
--   USE_DEPLOY           : OFF
--   Public Dependencies  : Threads::Threads
--   Private Dependencies : pthreadpool;cpuinfo;nnpack;XNNPACK;/usr/lib/aarch64-linux-gnu/libnuma.so;fp16;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;gcc_s;gcc;dl
-- Configuring done
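Rather than reading the whole log by eye, you can pull out the few lines that matter. The sketch below scans a saved `build_log.txt` (the file name from the `tee` command above) for `USE_CUDA`, `CUDA version`, `cuDNN version` and `NVCC flags`, and checks that the flags target `sm_72`, the compute capability of the Xavier GPU in the AGX. The function names and field list are assumptions; the pattern matches the `--   name : value` summary format shown above for PyTorch 1.8 and may need adjusting for other versions.

```python
"""Extract the CUDA-related settings from a saved PyTorch build log."""
import re

# Summary fields worth checking, as they appear in the PyTorch 1.8 CMake output.
FIELDS = ("USE_CUDA", "CUDA version", "cuDNN version", "NVCC flags")


def parse_build_summary(text):
    """Return {field: value} for the FIELDS found in '--   name : value' lines."""
    found = {}
    for line in text.splitlines():
        # The lazy group stops at the first colon, i.e. the name/value separator.
        match = re.match(r"--\s+(.+?)\s*:\s*(.*)$", line)
        if match and match.group(1) in FIELDS:
            found[match.group(1)] = match.group(2).strip()
    return found


def check_cuda_build(text, arch="sm_72"):
    """True if CUDA is ON and NVCC generates code for `arch` (Xavier = sm_72)."""
    summary = parse_build_summary(text)
    return summary.get("USE_CUDA") == "ON" and arch in summary.get("NVCC flags", "")


if __name__ == "__main__":
    with open("build_log.txt") as f:
        log = f.read()
    for name, value in parse_build_summary(log).items():
        print(f"{name:14}: {value[:70]}")
    print("CUDA build OK" if check_cuda_build(log) else "CUDA NOT enabled for this GPU")
```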