WOULD SOMEONE FROM NVIDIA SAY OUT LOUD XAVIER CANNOT BE USED FOR TRAINING ?

If yes, then how?

The following are my timings for a dataset of just 350 rows and 2 columns, basically a time series:

mdl.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 150)               91200
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 151
=================================================================
Total params: 91,351
Trainable params: 91,351
Non-trainable params: 0


mdl.fit_generator(gnTrain, epochs=10)

Epoch 1/10
301/301 [==============================] - 20s 65ms/step - loss: 0.0026
Epoch 2/10
301/301 [==============================] - 19s 64ms/step - loss: 0.0021
Epoch 3/10
301/301 [==============================] - 20s 67ms/step - loss: 0.0026
Epoch 4/10
301/301 [==============================] - 20s 67ms/step - loss: 0.0017
Epoch 5/10
301/301 [==============================] - 20s 67ms/step - loss: 0.0017
Epoch 6/10
301/301 [==============================] - 20s 66ms/step - loss: 0.0021
Epoch 7/10
301/301 [==============================] - 20s 66ms/step - loss: 0.0017
Epoch 8/10
301/301 [==============================] - 19s 62ms/step - loss: 0.0016
Epoch 9/10
301/301 [==============================] - 19s 63ms/step - loss: 0.0018
Epoch 10/10
301/301 [==============================] - 18s 61ms/step - loss: 0.0018

<keras.callbacks.callbacks.History at 0x7f18176f98>
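
For context, here is a minimal sketch of a model and generator consistent with the summary above. The lookback window and batch size are not shown in my post, so the values below are assumptions (a window of 49 samples with batch_size=1 would give the 350 - 49 = 301 steps per epoch seen in the log):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.preprocessing.sequence import TimeseriesGenerator

series = np.random.rand(350, 1)                # stand-in for the 350-row series
gnTrain = TimeseriesGenerator(series, series, length=49, batch_size=1)

mdl = Sequential()
mdl.add(LSTM(150, input_shape=(49, 1)))        # 4*150*(150 + 1 + 1) = 91,200 params
mdl.add(Dense(1))                              # 150*1 + 1 = 151 params
mdl.compile(loss='mse', optimizer='adam')
mdl.summary()
mdl.fit_generator(gnTrain, epochs=10)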

My System Settings:

import tensorflow
tensorflow.__version__
'1.14.0'

import keras
keras.__version__
Using TensorFlow backend.
'2.3.1'

tf.test.is_gpu_available(
cuda_only=True,
min_cuda_compute_capability=None
)
True

tf.test.is_built_with_cuda()
True

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 12523116503951288556
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 3016287810009935449
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17566754380959506424
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10438272410
locality {
bus_id: 1
links {
}
}
incarnation: 14651857846486284700
physical_device_desc: "device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2"
]
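
To check whether ops are actually being placed on the GPU (and not just that a GPU is visible), a minimal sketch using the TF 1.x session API that matches the 1.14.0 install above:

import tensorflow as tf

# Log device placement so each op reports whether it lands on /device:GPU:0.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    print(sess.run(tf.matmul(a, b)))   # placement messages appear on the console

My understanding is that if MatMul is reported on /device:GPU:0 the GPU is in use, and with a model this small and a batch size of 1 the per-step time is more likely dominated by per-batch overhead than by GPU compute.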

Do I have the proper versions? Am I missing something?

sudha@sudhajx:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Mon_Mar_11_22:13:24_CDT_2019
Cuda compilation tools, release 10.0, V10.0.326

sudha@sudhajx:~$ nvidia-smi
bash: nvidia-smi: command not found

As you might assume, I am new to ML/DL. I bought the Xavier as I thought it would be a one-stop investment rather than going for a full-fledged server for ML/DL development, training, and future deployment purposes.

To be honest, I am really not impressed by the timings, as the dataset I used is a meagre 350-odd rows. I STILL HOPE THE PROGRAM IS NOT USING THE FULL POTENTIAL OF THE GPU. If I am wrong, then my investment was a mistake.

WOULD SOMEONE FROM NVIDIA SAY OUT LOUD XAVIER CANNOT BE USED FOR TRAINING

I am not from NVIDIA, but Jetsons were always intended for edge device use, not training. The expectation is to train elsewhere, and then deploy the model on the edge device.

Hi,

We don't recommend Jetson for training.

Usually, a training process requires heavy dataset I/O, but Jetson storage and bandwidth are limited.
You should have a better experience training on our desktop GPUs.

Thanks.

Hi AastaLLL,

I don't mean large datasets like terabytes or more.

My use case might require training datasets of a couple of GB at most.

And if I/O is really a constraint, I can bring it down to under a GB, I guess.

I just want to know whether training would be possible in that case.

You suggested PyTorch earlier, which I could not install. Please help.

Hi,

If the training time is not an issue for you, you can deploy a training job on the Xavier as well.

For PyTorch, here are some prebuilt packages and compile instructions for your reference:
https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/

Thanks.

Hi, I did not see your reply for quite a while, as I did not expect a positive reply. Thank you.

Hi, I installed PyTorch successfully through your link. But I am having some trouble installing torchvision.

I get this error:

sudha@sudhajx:~/torchvision$ sudo python setup.py install
Traceback (most recent call last):
  File "setup.py", line 6, in <module>
    from setuptools import setup, find_packages
ImportError: No module named setuptools

But, trying to install setuptools, I get:

sudha@sudhajx:~$ sudo pip3 install -U setuptools
Requirement already up-to-date: setuptools in /usr/local/lib/python3.6/dist-packages (41.6.0)

I have tried this also:

git clone https://github.com/pytorch/vision

which runs successfully…
Still, when I try to install, I get the same error:

sudha@sudhajx:~/vision$ sudo python setup.py install
Traceback (most recent call last):
  File "setup.py", line 6, in <module>
    from setuptools import setup, find_packages
ImportError: No module named setuptools

How do I proceed? TIA

Try python3 instead of python.

That should get you past the immediate error.

In Ubuntu, python is paired with pip, and python3 is paired with pip3. I make that mistake all the time; it's confusing. At some point Ubuntu will ship with only Python 3, and then it'll just be "python" and "pip", but for the moment some things still need Python 2 around, so both are installed side by side, leading to these sorts of confusions.

Also, you shouldn't need sudo here, and generally it's a bad idea to "sudo pip" anything. By default, pip now does a --user install on Ubuntu. Alternatively, you can look into virtual environments like virtualenv or pipenv.
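
If you want to confirm which interpreter you are actually running and whether setuptools is visible to it, here is a quick sketch you can paste into a python3 session (the expected values are just what your output above suggests):

import sys
import setuptools

print(sys.executable)            # which interpreter this actually is
print(sys.version_info[:3])      # expect (3, 6, x) for python3 on this image
print(setuptools.__version__)    # e.g. 41.6.0, per your pip3 output above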

Thanks mdegans!

It kind of worked.

In the sense that it took a long while, with a lot of warnings, but no errors…

Eventually I was able to import it with:

import torchvision

/usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+b8ef532-py3.6-linux-aarch64.egg/torchvision/io/_video_opt.py:17: UserWarning: video reader based on ffmpeg c++ ops not available

And when I check the version, it gives me:

print(torchvision.__version__)

0.5.0a0+b8ef532

I am confused whether that is a proper version number at all, and whether it has installed properly.

Seems to be correct. I googled it and wound up with a thread here:

https://devtalk.nvidia.com/default/topic/1049071/pytorch-for-jetson-nano-version-1-3-0-now-available/?offset=152

The warning can safely be ignored according to that thread.
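
If you want a further sanity check that torch and torchvision work together on the GPU, something like this should do it (a sketch; resnet18 is just an arbitrary torchvision model used as a smoke test):

import torch
import torchvision

print(torch.__version__, torchvision.__version__)
print(torch.cuda.is_available())           # expect True on the Xavier

model = torchvision.models.resnet18(pretrained=False).cuda().eval()
x = torch.randn(1, 3, 224, 224).cuda()     # dummy image batch on the GPU
with torch.no_grad():
    print(model(x).shape)                  # expect torch.Size([1, 1000])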

Thank you for your time and effort, mdegans:-)

You're welcome. Just be aware that training will be slower on an embedded platform as opposed to a desktop or server stuffed with GPUs.