Problem installing Torch7 with CUDA 10

I’m trying to rebuild Torch7 with the latest CUDA toolkit for Xavier. The Torch install fails when it tries to link libcublas_device.a. In previous toolkits this file was installed to /usr/local/cuda/lib64, but the static library is no longer there. Any ideas?

Hi,

Could you check if Torch supports CUDA 10.0 first?
Thanks.

Torch7 is in maintenance mode, with support for CUDA 9.1. Since CUDA 9.2 dropped support for the libcublas_device.a module, which Torch7 links against, it is incompatible (it does not build). It looks like the Torch/CUDNN-7 branch also has problems compiling against cudnn-7.3. Hence Xavier is incompatible with Torch at the moment.
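
A quick way to confirm whether a given toolkit still ships the library is to look for the file directly. This is only a minimal sketch: it assumes the /usr/local/cuda/lib64 location mentioned in the first post, and the function name has_cublas_device is hypothetical.

```shell
# Hypothetical helper: report whether the static cuBLAS device library
# is present in a CUDA library directory.
# $1 = library directory (defaults to the path named in the first post).
has_cublas_device() {
    libdir="${1:-/usr/local/cuda/lib64}"
    if [ -f "$libdir/libcublas_device.a" ]; then
        echo "found: $libdir/libcublas_device.a"
    else
        echo "missing: $libdir/libcublas_device.a"
        return 1
    fi
}
```

Calling has_cublas_device with no argument checks the default path; pass another directory to check a specific toolkit install.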

Hi,

We recently built PyTorch from source successfully.
Maybe you can get some information from it:
https://devtalk.nvidia.com/default/topic/1041716/jetson-agx-xavier/pytorch-install-problem/post/5284747/#5284747

We will also check Torch7 on Xavier and share information with you later.
Thanks.

My objective is to get qlua working and run webcam_demo.lua from https://github.com/jcjohnson/fast-neural-style
Updating CMake to try again.
Found a reference: https://github.com/torch/cutorch/issues/834
Starting again.

export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__" ##does it enable optimization or disables it?
TORCH_LUA_VERSION=LUA53 ./install.sh
Do you want to automatically prepend the Torch install location
to PATH and LD_LIBRARY_PATH in your /home/nvidia/.bashrc? (yes/no)
[yes] >>>

so far so good, no errors.

[ 25%] Building C object opencv/CMakeFiles/camopencv.dir/opencv.c.o
/tmp/luarocks_camera-1.1-0-6487/lua_camera/opencv/opencv.c:136:30: error: array type has incomplete element type ‘struct luaL_reg’
 static const struct luaL_reg opencv [] = {
luarocks install camera
Installing https://raw.githubusercontent.com/torch/rocks/master/camera-1.1-0.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/camera-1.1-0.rockspec... switching to 'build' mode
Cloning into 'lua---camera'...
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (28/28), done.
remote: Total 29 (delta 3), reused 10 (delta 0), pack-reused 0
Receiving objects: 100% (29/29), 18.57 KiB | 105.00 KiB/s, done.
Resolving deltas: 100% (3/3), done.
cmake -E make_directory build;
cd build;
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/home/nvidia/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/home/nvidia/torch/install/lib/luarocks/rocks/camera/1.1-0"; 
make
   
-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Torch7 in /home/nvidia/torch/install
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_kill
-- Looking for pthread_kill - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda-10.0 (found suitable exact version "10.0") 
-- Found OpenCV: /usr/local (found version "3.4.0") 
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- OpenMP Found with compiler flag : -fopenmp
CMake Warning at video4linux/CMakeLists.txt:13 (FIND_PACKAGE):
  By not providing "FindARM.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "ARM", but
  CMake did not find one.

  Could not find a package configuration file provided by "ARM" with any of
  the following names:

    ARMConfig.cmake
    arm-config.cmake

  Add the installation prefix of "ARM" to CMAKE_PREFIX_PATH or set "ARM_DIR"
  to a directory containing one of the above files.  If "ARM" provides a
  separate development package or SDK, be sure it has been installed.

-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/luarocks_camera-1.1-0-9121/lua---camera/build
Scanning dependencies of target camopencv
[ 25%] Building C object opencv/CMakeFiles/camopencv.dir/opencv.c.o
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:33:8: error: unknown type name ‘CvCapture’
 static CvCapture* capture[MAXIDX];
        ^~~~~~~~~
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c: In function ‘l_initCam’:
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:51:21: warning: implicit declaration of function ‘cvCaptureFromCAM’ [-Wimplicit-function-declaration]
     capture[fidx] = cvCaptureFromCAM(idx);
                     ^~~~~~~~~~~~~~~~
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:51:19: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     capture[fidx] = cvCaptureFromCAM(idx);
                   ^
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:58:19: warning: implicit declaration of function ‘cvQueryFrame’ [-Wimplicit-function-declaration]
     frame[fidx] = cvQueryFrame ( capture[fidx] );
                   ^~~~~~~~~~~~
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:58:17: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     frame[fidx] = cvQueryFrame ( capture[fidx] );
                 ^
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:64:19: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
       frame[fidx] = cvQueryFrame ( capture[fidx] );
                   ^
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:79:21: warning: implicit declaration of function ‘cvCreateFileCapture’; did you mean ‘cvCreateKalman’? [-Wimplicit-function-declaration]
     capture[fidx] = cvCreateFileCapture(file);
                     ^~~~~~~~~~~~~~~~~~~
                     cvCreateKalman
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:79:19: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     capture[fidx] = cvCreateFileCapture(file);
                   ^
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c: In function ‘l_grabFrame’:
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:98:14: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
   frame[idx] = cvQueryFrame ( capture[idx] );
              ^
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c: In function ‘l_releaseCam’:
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:130:3: warning: implicit declaration of function ‘cvReleaseCapture’; did you mean ‘cvReleaseData’? [-Wimplicit-function-declaration]
   cvReleaseCapture( &capture[idx] );
   ^~~~~~~~~~~~~~~~
   cvReleaseData
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c: At top level:
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:135:30: error: array type has incomplete element type ‘struct luaL_reg’
 static const struct luaL_reg opencv [] = {
                              ^~~~~~
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c: In function ‘luaopen_libcamopencv’:
/tmp/luarocks_camera-1.1-0-9121/lua---camera/opencv/opencv.c:143:3: warning: implicit declaration of function ‘luaL_openlib’; did you mean ‘luaL_newlib’? [-Wimplicit-function-declaration]
   luaL_openlib(L, "libcamopencv", opencv, 0);
   ^~~~~~~~~~~~
   luaL_newlib
opencv/CMakeFiles/camopencv.dir/build.make:62: recipe for target 'opencv/CMakeFiles/camopencv.dir/opencv.c.o' failed
make[2]: *** [opencv/CMakeFiles/camopencv.dir/opencv.c.o] Error 1
CMakeFiles/Makefile2:90: recipe for target 'opencv/CMakeFiles/camopencv.dir/all' failed
make[1]: *** [opencv/CMakeFiles/camopencv.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

Error: Build error: Failed building.
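
Two things in this log may be worth separating. luaL_reg and luaL_openlib are Lua 5.1-era auxiliary-library names that the Lua 5.2 and 5.3 headers no longer declare (the compiler's luaL_newlib suggestion points to the newer headers), and the install above was run with TORCH_LUA_VERSION=LUA53. The CvCapture errors are a separate matter: the OpenCV C API declarations are not visible to opencv.c. As a rough sketch, a helper like the one below (the name find_lua51_api is made up for illustration) lists the lines of a C source that use those two Lua names:

```shell
# Hypothetical helper: print "line:text" for every line of a C source
# that uses the Lua 5.1-only names luaL_reg or luaL_openlib.
# -w matches whole words only, so luaL_register is not reported.
# $1 = path to the C source file.
find_lua51_api() {
    grep -nwE 'luaL_reg|luaL_openlib' "$1"
}
```

Run against opencv/opencv.c in the cloned camera source, this should point at the same lines the compiler complains about.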

Hi,

The issue is from the OpenCV library.
It’s recommended to check which OpenCV version is used by the author first.

Thanks.

Hi AastaLLL, thank you for your response.

It seems to me that the fault is caused by the wrapper rather than by OpenCV itself or an OpenCV version mismatch.
luarocks is a package manager; it tries to install the wrapper package and fails.

In my opinion the error is related to the luarocks infrastructure rather than to the project code, but I will ask the authors which OpenCV version they used.
Either the wrapper simply cannot be installed, or it cannot be installed because it touches the existing OpenCV installation and fails at that point. I would rather say that it fails while interacting with the package manager, since that is what the log shows explicitly. It may also try to communicate with the installed library implicitly; I do not think that is the cause, but I cannot see how to exclude that possibility either.

On a workstation, inside the Torch NGC container, I was able to install the luarocks camera package by building it from a custom repository (git clone https://github.com/freedomsb/lua_camera.git), following the advice at https://github.com/torch/demos/issues/65. I just built it from source, and it was installed in a way that lets qlua use it to call OpenCV.
Thanks.
https://www.youtube.com/watch?v=U8DTYX_utx0

Hi, Andrey1984

Did you fix this issue after checking with the authors?
Thanks.

Hi AastaLLL,
I can confirm that Torch7 works on Xavier.
I can install it on TX2 as well.
I can install it on Nano as well.

The steps and workarounds include, but are not limited to, the ones listed below:

1. Apply the patch and use the newest CMake:
sudo apt-get purge cmake
git clone https://github.com/Kitware/CMake.git
cd CMake
./bootstrap; make; sudo make install
cd ~/torch
rm -fr cmake/3.6/Modules/FindCUDA*

3. Apply the following patch to cutorch:

diff --git a/lib/THC/THCAtomics.cuh b/lib/THC/THCAtomics.cuh
index 400875c..ccb7a1c 100644
--- a/lib/THC/THCAtomics.cuh
+++ b/lib/THC/THCAtomics.cuh
@@ -94,6 +94,7 @@ static inline __device__ void atomicAdd(long *address, long val) {
 }
 
 #ifdef CUDA_HALF_TENSOR
+#if !(__CUDA_ARCH__ >= 700 || !defined(__CUDA_ARCH__) )
 static inline  __device__ void atomicAdd(half *address, half val) {
   unsigned int * address_as_ui =
       (unsigned int *) ((char *)address - ((size_t)address & 2));
@@ -117,6 +118,7 @@ static inline  __device__ void atomicAdd(half *address, half val) {
    } while (assumed != old);
 }
 #endif
+#endif
cd extra/cutorch
cat > atomic.patch
<paste the patch above, then press Ctrl-D>
patch -p1 < atomic.patch
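
Before patching, it may help to confirm that the hunks apply to your cutorch checkout. A minimal sketch, assuming GNU patch (for --dry-run) and that it is run from extra/cutorch; the function name apply_patch_checked is made up for illustration:

```shell
# Hypothetical helper: apply a patch only if a dry run succeeds.
# $1 = patch file; run from the root of the tree being patched.
apply_patch_checked() {
    # --dry-run (GNU patch) checks every hunk without modifying any file.
    if patch -p1 --dry-run < "$1" > /dev/null; then
        patch -p1 < "$1"
    else
        echo "patch does not apply cleanly: $1" >&2
        return 1
    fi
}
```

With the patch saved as atomic.patch, the call would be: apply_patch_checked atomic.patch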

Build:

./clean.sh
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
./install.sh

Last step: install camera.

git clone https://github.com/freedomsb/lua_camera.git
cd lua_camera
luarocks install camera-1.1-0.rockspec

RESOLVED
