Hello!
I use TensorFlow and Keras to train models on the Jetson Nano.
Here are some useful specs:
Python version: 3.6.9
TensorFlow version: 2.2.0
JetPack version: I don’t know which version I have or how to check it. I downloaded it in late 2019. I can reinstall the latest JetPack if that is necessary to resolve these errors.
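One way to find the installed version: JetPack records its L4T (Linux for Tegra) release in /etc/nv_tegra_release, and NVIDIA's JetPack archive lists which JetPack version ships each L4T release. Running cat /etc/nv_tegra_release shows the raw line. The sketch below is a minimal example, assuming the file starts with the usual "# R32 (release), REVISION: 4.3, ..." layout; parse_l4t_release and read_l4t_release are hypothetical helper names, not part of any NVIDIA tool.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed header format of /etc/nv_tegra_release, e.g.
# "# R32 (release), REVISION: 4.3, GCID: ..., BOARD: t210ref, ..."
_RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)")


def parse_l4t_release(text: str) -> Optional[Tuple[int, str]]:
    """Return (major, revision) parsed from the file contents, or None if no match."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def read_l4t_release(path: str = "/etc/nv_tegra_release") -> Optional[str]:
    """Read the release file and format it like 'R32.4.3'; None if missing or unparsable."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    parsed = parse_l4t_release(text)
    if parsed is None:
        return None
    major, revision = parsed
    return "R{}.{}".format(major, revision)


if __name__ == "__main__":
    release = read_l4t_release()
    print(release if release else "Could not determine the L4T release")
```

The printed L4T release can then be looked up in NVIDIA's JetPack archive to get the matching JetPack (and CUDA/cuDNN) versions.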
I get errors whenever I import TensorFlow and when I train a model on the Nano.
Errors during the import statement:
2021-07-23 11:13:21.352365: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory
2021-07-23 11:13:21.352446: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Errors during training:
2021-07-23 11:15:15.413643: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-07-23 11:15:15.487046: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2021-07-23 11:15:15.487277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: NVIDIA Tegra X1 computeCapability: 5.3
coreClock: 0.9216GHz coreCount: 1 deviceMemorySize: 3.87GiB deviceMemoryBandwidth: 194.55MiB/s
2021-07-23 11:15:15.487645: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory
2021-07-23 11:15:15.488019: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2021-07-23 11:15:15.488328: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2021-07-23 11:15:15.488632: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2021-07-23 11:15:15.488807: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-07-23 11:15:15.488962: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory
2021-07-23 11:15:15.489140: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-07-23 11:15:15.489177: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-07-23 11:15:15.510076: W tensorflow/core/platform/profile_utils/cpu_utils.cc:106] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2021-07-23 11:15:15.510954: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3f5126f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-23 11:15:15.511015: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-07-23 11:15:15.515619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-23 11:15:15.515691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]
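The log above shows that the GPU driver (libcuda.so.1) loaded and the Tegra X1 was detected, but the CUDA runtime, the CUDA math libraries and cuDNN could not be opened, so TensorFlow skipped registering the GPU. The sonames also hint at what this TensorFlow build expects: libcudart.so.10.2 is the CUDA 10.2 runtime and libcudnn.so.8 is cuDNN 8. Whether the installed JetPack provides those versions is an assumption still to be checked. A minimal sketch for pulling the missing library names out of a saved log (missing_libraries is a hypothetical helper):

```python
import re
from typing import List

# Matches TensorFlow's dso_loader warning, e.g.
# "... Could not load dynamic library 'libcudart.so.10.2'; dlerror: ..."
_MISSING_RE = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text: str) -> List[str]:
    """Return the unique library names TensorFlow failed to load, in first-seen order."""
    seen = []  # type: List[str]
    for name in _MISSING_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    import sys

    # Usage: python missing_libs.py < tf_log.txt
    for lib in missing_libraries(sys.stdin.read()):
        print(lib)
```

Lines such as "Successfully opened dynamic library libcuda.so.1" do not match the pattern, so only the failed loads are reported.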
Similar issues have been reported before in the NVIDIA community, but the solutions given there are outdated (TensorFlow 1.x):
https://forums.developer.nvidia.com/t/tensorflow-gpu-not-working-in-nano/82171
I also looked at TensorFlow's GPU support page:
https://www.tensorflow.org/install/gpu
However, I don’t know how to install the listed dependencies or how to check which ones are already installed.
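To check which of the libraries from the log are already installed, one option is to ask the dynamic loader directly: ctypes.CDLL calls dlopen, which searches the same places (LD_LIBRARY_PATH, the ld.so cache, the default system directories) that TensorFlow's loader relies on. This is a sketch, not an official check; the library list is copied from the warnings in the log above, and check_libraries is a hypothetical name.

```python
import ctypes
from typing import Callable, Dict, Iterable

# Library names copied from the TensorFlow "Could not load dynamic library" warnings.
TF_GPU_LIBRARIES = [
    "libcudart.so.10.2",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.8",
]


def check_libraries(names: Iterable[str],
                    loader: Callable[[str], object] = ctypes.CDLL) -> Dict[str, bool]:
    """Try to load each shared library by name; map name -> True if it loaded."""
    results = {}  # type: Dict[str, bool]
    for name in names:
        try:
            loader(name)  # ctypes.CDLL raises OSError when dlopen fails
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_libraries(TF_GPU_LIBRARIES).items():
        print("{:<20} {}".format(name, "found" if ok else "MISSING"))
```

A library reported as MISSING is either not installed at all or installed somewhere that is not on the dynamic loader's search path.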
I would really appreciate any input on how I can train on the GPU to substantially speed up training. Feel free to ask me for code, library versions, etc. if needed.
Thank you!