Hi,
I tried both TensorFlow and Keras today, and both work.
Actually, it's hard for us to make sure that all third-party libraries run correctly on our platform.
For details, we recommend asking the library developers, since they are more familiar with their own source code.
TensorFlow: topic_1011135_tensorflow.zip (same as #11)
nvidia@tegra-ubuntu:~$ python topic_1011135_tensorflow.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Download Done!
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM has no NUMA node, hardcoding to return zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 2.92GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
WARNING:tensorflow:From topic_1011135_tensorflow.py:67: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): <undefined>, <undefined>
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): GP10B, Compute Capability 6.2
step 0, training accuracy 0.02
step 100, training accuracy 0.82
step 200, training accuracy 0.98
step 300, training accuracy 0.84
step 400, training accuracy 0.98
step 500, training accuracy 0.9
step 600, training accuracy 0.98
step 700, training accuracy 0.92
step 800, training accuracy 0.88
step 900, training accuracy 0.98
step 1000, training accuracy 0.98
step 1100, training accuracy 1
step 1200, training accuracy 0.94
step 1300, training accuracy 0.98
step 1400, training accuracy 0.98
step 1500, training accuracy 0.92
step 1600, training accuracy 0.98
step 1700, training accuracy 0.94
step 1800, training accuracy 1
step 1900, training accuracy 0.96
step 2000, training accuracy 0.98
step 2100, training accuracy 0.98
step 2200, training accuracy 0.94
step 2300, training accuracy 1
step 2400, training accuracy 0.96
step 2500, training accuracy 0.94
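To compare this run with others, you can pull the `step N, training accuracy X` lines out of a saved copy of the console output. Below is a minimal sketch that uses only the Python standard library. The function name `parse_steps` is hypothetical, and the regular expression was written against the log lines shown here. It is not taken from topic_1011135_tensorflow.py.

```python
import re

# Written against lines such as "step 1100, training accuracy 1" in the log above
# (an assumption about the format, not code from the attached script).
STEP_RE = re.compile(r"^step (\d+), training accuracy ([\d.]+)$")


def parse_steps(log_text):
    """Return (step, accuracy) pairs found in a saved log (hypothetical helper)."""
    pairs = []
    for line in log_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


if __name__ == "__main__":
    sample = "\n".join([
        "step 0, training accuracy 0.02",
        "I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y",
        "step 1100, training accuracy 1",
        "step 2500, training accuracy 0.94",
    ])
    print(parse_steps(sample))  # [(0, 0.02), (1100, 1.0), (2500, 0.94)]
```

Lines that don't match, such as TensorFlow's `I ...` status messages, are skipped.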
Keras: topic_1011135_keras.zip
nvidia@tegra-ubuntu:~$ python topic_1011135_keras.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] ARM has no NUMA node, hardcoding to return zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GP10B
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.21GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GP10B, pci bus id: 0000:00:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.92G (4210061312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices:
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): <undefined>, <undefined>
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices:
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): GP10B, Compute Capability 6.2
60000/60000 [==============================] - 90s - loss: 0.3444 - acc: 0.8952 - val_loss: 0.0775 - val_acc: 0.9765
Epoch 2/12
60000/60000 [==============================] - 43s - loss: 0.1169 - acc: 0.9660 - val_loss: 0.0528 - val_acc: 0.9830
Epoch 3/12
60000/60000 [==============================] - 39s - loss: 0.0885 - acc: 0.9739 - val_loss: 0.0453 - val_acc: 0.9860
Epoch 4/12
60000/60000 [==============================] - 41s - loss: 0.0742 - acc: 0.9779 - val_loss: 0.0401 - val_acc: 0.9864
Epoch 5/12
60000/60000 [==============================] - 38s - loss: 0.0646 - acc: 0.9806 - val_loss: 0.0368 - val_acc: 0.9877
Epoch 6/12
60000/60000 [==============================] - 37s - loss: 0.0576 - acc: 0.9825 - val_loss: 0.0321 - val_acc: 0.9891
Epoch 7/12
60000/60000 [==============================] - 39s - loss: 0.0526 - acc: 0.9845 - val_loss: 0.0340 - val_acc: 0.9881
Epoch 8/12
60000/60000 [==============================] - 38s - loss: 0.0498 - acc: 0.9851 - val_loss: 0.0330 - val_acc: 0.9895
Epoch 9/12
60000/60000 [==============================] - 37s - loss: 0.0469 - acc: 0.9857 - val_loss: 0.0309 - val_acc: 0.9900
Epoch 10/12
60000/60000 [==============================] - 37s - loss: 0.0427 - acc: 0.9875 - val_loss: 0.0313 - val_acc: 0.9896
Epoch 11/12
60000/60000 [==============================] - 37s - loss: 0.0404 - acc: 0.9875 - val_loss: 0.0293 - val_acc: 0.9899
Epoch 12/12
60000/60000 [==============================] - 37s - loss: 0.0402 - acc: 0.9882 - val_loss: 0.0282 - val_acc: 0.9897
Test loss: 0.0281542236623
Test accuracy: 0.9897
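The Keras log deserves a closer look. It contains a `CUDA_ERROR_OUT_OF_MEMORY` line for a request of 4210061312 bytes. Dividing by 2^30 gives about 3.92 GiB, which matches the "3.92G" printed in that line. Even so, all 12 epochs still completed.

To compare runs, it can help to extract these numbers from a saved console log. Below is a minimal sketch in standard-library Python. The function `summarize_log` and its regular expressions are assumptions written against the log format shown above, not part of topic_1011135_keras.py. They may need adjusting for other Keras or TensorFlow versions.

```python
import re

# Keras epoch summary line as printed above (format assumed from this log), e.g.
# "60000/60000 [===...===] - 37s - loss: 0.0402 - acc: 0.9882 - val_loss: 0.0282 - val_acc: 0.9897"
EPOCH_RE = re.compile(
    r"- (\d+)s - loss: ([\d.]+) - acc: ([\d.]+)"
    r" - val_loss: ([\d.]+) - val_acc: ([\d.]+)"
)
# TensorFlow allocation failure as printed above, e.g.
# "failed to allocate 3.92G (4210061312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY"
ALLOC_RE = re.compile(r"failed to allocate \S+ \((\d+) bytes\)")

BYTES_PER_GIB = 2 ** 30  # 1 GiB = 1073741824 bytes


def summarize_log(log_text):
    """Collect per-epoch metrics and failed GPU allocations (in GiB) from a saved log.

    Hypothetical helper for illustration only.
    """
    epochs, failed_gib = [], []
    for line in log_text.splitlines():
        epoch = EPOCH_RE.search(line)
        if epoch:
            seconds, loss, acc, val_loss, val_acc = epoch.groups()
            epochs.append({
                "seconds": int(seconds),
                "loss": float(loss),
                "acc": float(acc),
                "val_loss": float(val_loss),
                "val_acc": float(val_acc),
            })
            continue
        alloc = ALLOC_RE.search(line)
        if alloc:
            # Convert the raw byte count to binary gibibytes.
            failed_gib.append(int(alloc.group(1)) / BYTES_PER_GIB)
    return epochs, failed_gib


if __name__ == "__main__":
    sample = "\n".join([
        "E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 3.92G "
        "(4210061312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY",
        "60000/60000 [==============================] - 90s - loss: 0.3444 - acc: 0.8952"
        " - val_loss: 0.0775 - val_acc: 0.9765",
        "60000/60000 [==============================] - 37s - loss: 0.0402 - acc: 0.9882"
        " - val_loss: 0.0282 - val_acc: 0.9897",
    ])
    epochs, failed = summarize_log(sample)
    print([e["seconds"] for e in epochs])  # [90, 37]
    print([round(g, 2) for g in failed])   # [3.92]
```

Run on the full log above, it would report one failed allocation and twelve epochs. The first epoch (90 s) is noticeably slower than the later ones (37 to 43 s). This is consistent with the GPU initialization messages appearing during epoch 1.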