Hi there,
I’ve used nvcr.io/nvidia/l4t-tensorflow:r32.6.1-tf2.5-py3 as a base and updated to tensorflow-2.5.0+nv21.6-cp36-cp36m-linux_aarch64 inside the container. I then downloaded this model http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_resnet50_v1_800x1333_coco17_gpu-8.tar.gz and used the following code to load it:
import tensorflow as tf
import time
load_start = time.time()
with tf.device("GPU:0"):
    # using CPU instead of GPU doesn't change anything
    tf.saved_model.load('/sig/models/523368bcf91411eba96d0242ac110002/saved_model')
print(f"Loading took {time.time() - load_start}s")
This is the output:
root@8b121c2c4b52:~# python3 test_tf.py
2021-08-10 10:23:41.298229: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-10 10:23:47.536714: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-08-10 10:23:47.574417: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:47.574641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.18GiB deviceMemoryBandwidth: 82.08GiB/s
2021-08-10 10:23:47.574747: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-10 10:23:47.638952: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2021-08-10 10:23:47.639613: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
2021-08-10 10:23:47.669661: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-08-10 10:23:47.705917: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-08-10 10:23:47.754288: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2021-08-10 10:23:47.781275: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
2021-08-10 10:23:47.783776: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-08-10 10:23:47.784057: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:47.784366: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:47.784501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-10 10:23:47.792187: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:47.792479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.377GHz coreCount: 8 deviceMemorySize: 31.18GiB deviceMemoryBandwidth: 82.08GiB/s
2021-08-10 10:23:47.792711: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:47.792890: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:47.793019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-10 10:23:47.793267: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-10 10:23:50.274390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-10 10:23:50.274489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-10 10:23:50.274526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-10 10:23:50.274911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:50.275160: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:50.275354: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-10 10:23:50.275510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 27510 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
Loading took 71.70220994949341s
As you can see, loading takes ~72 seconds, which seems too long. On my Ubuntu machine with a five-year-old i7 and a GTX 1060, loading the same model takes 6 seconds. Looking at the hardware alone, I wouldn't expect a >10x increase in loading time. Or is that plausible?
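To narrow down where the time goes, the measurement could be split into stages rather than timed as one block. The following is only a minimal sketch: the `stage` helper, the `timings` dict, and `profile_load` are hypothetical names introduced here, and it assumes that `tf.config.list_logical_devices("GPU")` is enough to trigger GPU initialization before the model is loaded.

```python
import time
from contextlib import contextmanager

# Collected wall-clock durations in seconds, keyed by stage name.
timings = {}


@contextmanager
def stage(name):
    """Record how long the enclosed block takes under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def profile_load(saved_model_dir):
    """Time import, GPU initialization and model loading separately.

    Assumption: listing the logical GPU devices forces TensorFlow to
    initialize the GPU, so that cost is not counted in the load stage.
    """
    with stage("import tensorflow"):
        import tensorflow as tf
    with stage("GPU initialization"):
        tf.config.list_logical_devices("GPU")
    with stage("saved_model.load"):
        model = tf.saved_model.load(saved_model_dir)
    return model


if __name__ == "__main__":
    profile_load("/sig/models/523368bcf91411eba96d0242ac110002/saved_model")
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.2f}s")
```

Running this on both machines should show whether the difference comes from GPU initialization or from `tf.saved_model.load` itself.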
The Ubuntu machine uses CUDA 11.x, while the NVIDIA TF 2.5.0 build links to CUDA 10.x. Could that be the reason?
Why is the most recent NVIDIA TF build still linked to CUDA 10.x instead of 11.x?
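For comparing the two setups, the CUDA and cuDNN versions a TensorFlow build was compiled against can be printed on each machine. This is a small sketch assuming TensorFlow 2.3 or newer, where `tf.sysconfig.get_build_info()` is available. The `linked_cuda_info` function name is made up for this example, and the keys are read with `.get()` because a CPU-only build may not report them.

```python
def linked_cuda_info():
    """Return the CUDA/cuDNN versions the installed TensorFlow was built with.

    Returns None if TensorFlow is not installed in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = tf.sysconfig.get_build_info()
    return {
        "tf_version": tf.__version__,
        # These keys may be missing on CPU-only builds, hence .get().
        "is_cuda_build": info.get("is_cuda_build"),
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


if __name__ == "__main__":
    print(linked_cuda_info())
```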
Thanks in advance