Using Theano with cuDNN on a Jetson TX1

I have installed Theano on my Jetson TX1 (armhf) using the instructions from: http://deeplearning.net/software/theano/install_ubuntu.html. I have created a .theanorc file with the following content:

[global]
floatX = float32
device = gpu

[dnn]
enabled = True

[lib]
cnmem = 0.7

Everything except cuDNN works fine. I installed cuDNN v4 for CUDA 7.0 (ARMv7) while installing Caffe using: http://qinhongwei.com/2016/05/08/how-to-install-caffe-on-NVIDIA-TX1/. The error from Theano is:

RuntimeError: You enabled cuDNN, but we aren't able to use it: Can not compile with cuDNN. We got this error:
/usr/bin/ld: cannot find -lcudnn
collect2: error: ld returned 1 exit status

How can I get cuDNN working? Thanks in advance.
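
The `cannot find -lcudnn` message comes from the linker, so a quick first check is whether any library named `cudnn` can be located at all. Below is a minimal sketch using Python's standard `ctypes.util.find_library`. Note that it searches the locations the runtime loader knows about, which may differ from the paths `ld` uses at link time, so a hit is only a hint. `describe_library` is a hypothetical helper name, not part of Theano.

```python
from ctypes.util import find_library


def describe_library(name):
    """Return a one-line report on whether a shared library can be located."""
    found = find_library(name)  # None when nothing matching lib<name> is found
    if found is None:
        return "lib%s: not found" % name
    return "lib%s: found as %s" % (name, found)


if __name__ == "__main__":
    # "cudnn" corresponds to the -lcudnn flag in the linker error above.
    print(describe_library("cudnn"))
```

If this reports "not found" even though the cuDNN files are on disk, the files may be in a directory the loader does not search, or the `libcudnn.so` name may not resolve to a real file.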

The default CUDA on 64-bit is CUDA 8. If CUDA 8 is installed on the Jetson, then anything that requires CUDA 7 would likely fail to find the libraries it expects.

I haven’t installed CUDA 8. I checked the version and it’s 7.0. Also the system is armhf not aarch64.

Edit: The system is aarch64. My bad.

If this is a TX1, not a TK1, then running a 32-bit user space implies R23.2 or earlier (though the kernel is 64-bit). The limitation is actually 64-bit versus 32-bit: armhf is not capable of running CUDA 7, so once more, if CUDA 7 is required, this automatically forces a failure. The 32-bit user space version cannot work with CUDA 7.

I am sorry. My bad. The system is actually aarch64. I have worked with three Jetson TX1 systems and got confused between them. I ran

uname -m

and it returned aarch64. So, with this new information, should I switch my cuDNN from the ARMv7 version to the ARM64 version?

Also, didn’t NVIDIA extend CUDA 7 support to R23.1 and R23.2?

CUDA 7 can only work with 64-bit so far as I know. Understand that R23.1 and R23.2 had 32-bit user space, but the kernel still ran 64-bit. At R24.1 user space was initially 32-bit, but almost immediately the 64-bit user space was also added (64-bit user space is what you will default to if you go to R24.1 and download the sample rootfs).

Mixing 32-bit user space and 64-bit kernel space was a necessary porting step, but it left a lot to be desired for many reasons. For example, Java was very confused by this. How CUDA dealt with this I’m not sure, but there were some definite performance hits for everything running in that mixed environment.

CUDA 7 was on earlier releases of L4T for a JTX1.

Based on what you said, it seems that my user space is 32-bit armhf while the kernel is 64-bit aarch64. I didn’t know that such a thing was possible.

I just tried installing TeamViewer and it returned the error:

package architecture (i386) does not match system (armhf)

Also, I have ROS Indigo installed on my system. It shouldn’t work if the user space were aarch64. I think

uname -m

returns the kernel architecture.
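
The kernel/user space mismatch can also be checked from Python. This is a minimal sketch using only the standard library: `platform.machine()` normally reports the same string as `uname -m` (the kernel's view), while the pointer size of the running interpreter reflects the user space it was built for. The helper names `arch_report` and `is_mixed` are made up for illustration, and the list of 64-bit kernel names is an assumption that covers only the common cases.

```python
import platform
import struct


def arch_report():
    """Return (kernel_machine, pointer_bits) for the current process."""
    kernel = platform.machine()              # normally the same as `uname -m`
    pointer_bits = struct.calcsize("P") * 8  # pointer width of this interpreter
    return kernel, pointer_bits


def is_mixed(kernel, pointer_bits):
    """True when a 64-bit kernel name is paired with a 32-bit process."""
    # Assumption: only these two 64-bit machine names are considered.
    return kernel in ("aarch64", "x86_64") and pointer_bits == 32


if __name__ == "__main__":
    kernel, bits = arch_report()
    print("kernel reports: %s, interpreter is %d-bit" % (kernel, bits))
    if is_mixed(kernel, bits):
        print("64-bit kernel with 32-bit user space")
```

On a system like the one described here, this would be expected to show `aarch64` for the kernel and a 32-bit interpreter.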

Also, the cuDNN download page has two downloads available for ARM: ARMv7 and ARM64 (https://developer.nvidia.com/rdp/cudnn-download). Currently, I am using ARMv7. I will try ARM64 to fix the Theano issue.

So, I fixed cuDNN. The links between libcudnn.so, libcudnn.so.4 and libcudnn.so.4.0.7 were broken. The version used was ARMv7, not ARM64.
@linuxdev Thanks for the help. Your replies helped me learn stuff.
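
For anyone hitting the same error, broken links like the ones described above can be spotted with a short script. This is a minimal sketch, assuming the cuDNN files sit together in one directory that you pass on the command line; the thread does not say which directory that was. `check_links` is a hypothetical helper name.

```python
import os
import sys


def check_links(directory, prefix="libcudnn.so"):
    """Return (name, link_target, ok) for each entry starting with prefix.

    link_target is None for regular files; ok is False for dangling links.
    """
    results = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith(prefix):
            continue
        path = os.path.join(directory, name)
        target = os.readlink(path) if os.path.islink(path) else None
        ok = os.path.exists(path)  # follows symlinks, so a dangling link is False
        results.append((name, target, ok))
    return results


if __name__ == "__main__":
    # The directory is supplied by the user; no default location is assumed.
    for name, target, ok in check_links(sys.argv[1]):
        status = "ok" if ok else "BROKEN"
        arrow = " -> %s" % target if target else ""
        print("%s%s [%s]" % (name, arrow, status))
```

A healthy layout would show libcudnn.so and libcudnn.so.4 both resolving, ultimately to libcudnn.so.4.0.7; any entry marked BROKEN needs its link recreated.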