Keras on Jetson TK1

Hi,

I tried both TensorFlow and Keras today, and both work.
It’s hard for us to guarantee that every third-party library runs correctly on our platform.
We recommend asking the library developers for details, since they are more familiar with their source code.

TensorFlow: topic_1011135_tensorflow.zip (same as #11)

nvidia@tegra-ubuntu:~$ python topic_1011135_tensorflow.py 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Download Done!
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM has no NUMA node, hardcoding to return zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 2.92GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
WARNING:tensorflow:From topic_1011135_tensorflow.py:67: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): <undefined>, <undefined>
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): GP10B, Compute Capability 6.2
step 0, training accuracy 0.02
step 100, training accuracy 0.82
step 200, training accuracy 0.98
step 300, training accuracy 0.84
step 400, training accuracy 0.98
step 500, training accuracy 0.9
step 600, training accuracy 0.98
step 700, training accuracy 0.92
step 800, training accuracy 0.88
step 900, training accuracy 0.98
step 1000, training accuracy 0.98
step 1100, training accuracy 1
step 1200, training accuracy 0.94
step 1300, training accuracy 0.98
step 1400, training accuracy 0.98
step 1500, training accuracy 0.92
step 1600, training accuracy 0.98
step 1700, training accuracy 0.94
step 1800, training accuracy 1
step 1900, training accuracy 0.96
step 2000, training accuracy 0.98
step 2100, training accuracy 0.98
step 2200, training accuracy 0.94
step 2300, training accuracy 1
step 2400, training accuracy 0.96
step 2500, training accuracy 0.94

Keras: topic_1011135_keras.zip

nvidia@tegra-ubuntu:~$ python topic_1011135_keras.py 
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM has no NUMA node, hardcoding to return zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.21GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.92G (4210061312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): <undefined>, <undefined>
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
I tensorflow/compiler/xla/service/service.cc:187]   StreamExecutor device (0): GP10B, Compute Capability 6.2
60000/60000 [==============================] - 90s - loss: 0.3444 - acc: 0.8952 - val_loss: 0.0775 - val_acc: 0.9765
Epoch 2/12
60000/60000 [==============================] - 43s - loss: 0.1169 - acc: 0.9660 - val_loss: 0.0528 - val_acc: 0.9830
Epoch 3/12
60000/60000 [==============================] - 39s - loss: 0.0885 - acc: 0.9739 - val_loss: 0.0453 - val_acc: 0.9860
Epoch 4/12
60000/60000 [==============================] - 41s - loss: 0.0742 - acc: 0.9779 - val_loss: 0.0401 - val_acc: 0.9864
Epoch 5/12
60000/60000 [==============================] - 38s - loss: 0.0646 - acc: 0.9806 - val_loss: 0.0368 - val_acc: 0.9877
Epoch 6/12
60000/60000 [==============================] - 37s - loss: 0.0576 - acc: 0.9825 - val_loss: 0.0321 - val_acc: 0.9891
Epoch 7/12
60000/60000 [==============================] - 39s - loss: 0.0526 - acc: 0.9845 - val_loss: 0.0340 - val_acc: 0.9881
Epoch 8/12
60000/60000 [==============================] - 38s - loss: 0.0498 - acc: 0.9851 - val_loss: 0.0330 - val_acc: 0.9895
Epoch 9/12
60000/60000 [==============================] - 37s - loss: 0.0469 - acc: 0.9857 - val_loss: 0.0309 - val_acc: 0.9900
Epoch 10/12
60000/60000 [==============================] - 37s - loss: 0.0427 - acc: 0.9875 - val_loss: 0.0313 - val_acc: 0.9896
Epoch 11/12
60000/60000 [==============================] - 37s - loss: 0.0404 - acc: 0.9875 - val_loss: 0.0293 - val_acc: 0.9899
Epoch 12/12
60000/60000 [==============================] - 37s - loss: 0.0402 - acc: 0.9882 - val_loss: 0.0282 - val_acc: 0.9897
Test loss: 0.0281542236623
Test accuracy: 0.9897
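
The Keras log above shows TensorFlow trying to grab 3.92G up front and reporting CUDA_ERROR_OUT_OF_MEMORY, although training still completed. If that allocation becomes a problem, one option is to cap GPU memory before building the model. The following is a minimal sketch, assuming the TensorFlow 1.x session API and the Keras TensorFlow backend; the helper names and the 1 GiB headroom are illustrative assumptions, not part of the attached scripts.

```python
def gpu_memory_fraction(free_gib, total_gib, headroom_gib=1.0):
    """Share of total GPU memory to request, leaving some headroom free.

    headroom_gib is an assumed safety margin, not a value from the thread.
    """
    usable = max(free_gib - headroom_gib, 0.0)
    return round(usable / total_gib, 2)


def limit_keras_gpu_memory(fraction):
    """Install a TensorFlow 1.x session with capped, on-demand GPU memory."""
    # Imported lazily so the helper above can be used without TensorFlow.
    import tensorflow as tf
    from keras import backend as K

    gpu_options = tf.GPUOptions(
        per_process_gpu_memory_fraction=fraction,  # upper bound, as a share of total memory
        allow_growth=True,                         # allocate on demand instead of up front
    )
    K.set_session(tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)))


# Figures from the Keras log: 4.21 GiB free out of 7.67 GiB total.
fraction = gpu_memory_fraction(4.21, 7.67)
# limit_keras_gpu_memory(fraction)  # call this before building the model
```

On Jetson boards the GPU shares memory with the CPU, so the free figure changes from run to run; the fraction is best computed from the values TensorFlow prints at startup.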

topic_1011135_keras.zip (1.06 KB)
topic_1011135_tensorflow.zip (1.14 KB)

Can you also include the wheel for TF and the exact install instructions you used for both TF and Keras? Thank you.

Hi,

We followed this page to install TensorFlow.
For Keras, we used the same procedure described in #8.

Could you please publish your TF wheel for public use on TK1?

Hi,

It’s not appropriate for us to publish a third-party library.
Maybe you can post your request here:
https://github.com/tensorflow/tensorflow/issues/851

May I know your current status?
Thanks.

First, I had to find another x86 + NVIDIA system and use a smaller number of training steps to make it work without swap file space. Now I am going to follow the instructions on the TX2 page you sent to see if I can build a correct whl.

I am unable to build bazel because there is not enough storage on the TK1.
ubuntu@tegra-ubuntu:~/mybazel$ ./compile.sh
INFO: You can skip this first step by providing a path to the bazel binary as second argument:
INFO: ./compile.sh compile /path/to/bazel

ubuntu@tegra-ubuntu:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/root 14318640 5501360 8066896 41% /

How much space is needed to build bazel?

Hi,

I usually build TensorFlow on a 128 GB SD card.
I think 16 GB of external storage is the minimum, since you also need to add some swap space for the build.
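
To compare the `df` output above with that 16 GB figure, a quick check along these lines may help. It is a rough sketch using only the Python standard library; the function names are illustrative, and treating "16G" as 16 x 10^9 bytes is an assumption.

```python
import shutil

MIN_FREE_GB = 16  # minimum suggested in the reply above (assumed to mean 10^9-byte GB)


def free_gb(path="/"):
    """Free space on the filesystem containing `path`, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def kib_blocks_to_gb(blocks):
    """Convert a `df` figure given in 1K-blocks (1024 bytes each) to GB."""
    return blocks * 1024 / 1e9


def enough_space(available_gb, required_gb=MIN_FREE_GB):
    """True if the available space meets the required minimum."""
    return available_gb >= required_gb


# "Available" column for /dev/root from the df output in this thread.
tk1_available_gb = kib_blocks_to_gb(8066896)
print("TK1 root: %.2f GB free, enough: %s"
      % (tk1_available_gb, enough_space(tk1_available_gb)))
print("This machine: %.2f GB free on /" % free_gb("/"))
```

By this estimate the TK1 root filesystem has roughly half the suggested minimum free, which is consistent with the build failing without external storage.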