System information
- OS Platform and Distribution: Linux Ubuntu 18.04, 5.4.0-66-generic
- Device: MPG B550I GAMING EDGE MAX WIFI (MS-7C92)
- CPU: AMD Ryzen 9 5950X 16-Core Processor
- GPU: GeForce RTX 3090, 24265MiB
- NVIDIA Driver: 460.39
Current behavior
I was experiencing very slow speed when loading tensorflow models into the GPU memory (about 1MiB/s when watching with nvidia-smi) and tried investigating this separately by using tensorflow/tensorflow:2.1.0-gpu-py3 docker image to run this command:
time python -c "import tensorflow as tf; tf.test.is_gpu_available()"
and it prints the following logs:
2021-03-25 07:39:11.387959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2021-03-25 07:39:11.388827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6 WARNING:tensorflow:From <string>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version. Instructions for updating: Use ` tf.config.list_physical_devices('GPU') ` instead.
2021-03-25 07:39:11.747442: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-03-25 07:39:11.771337: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3399945000 Hz
2021-03-25 07:39:11.772645: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x47819f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-25 07:39:11.772667: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-03-25 07:39:11.774848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-03-25 07:39:11.863115: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-25 07:39:11.863658: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4783c30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-03-25 07:39:11.863670: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 3090, Compute Capability 8.6
2021-03-25 07:39:11.863767: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-25 07:39:11.864237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: pciBusID: 0000:2b:00.0 name: GeForce RTX 3090 computeCapability: 8.6 coreClock: 1.785GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-03-25 07:39:11.864264: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-03-25 07:39:11.864281: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-03-25 07:39:11.865203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-03-25 07:39:11.865342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-03-25 07:39:11.866162: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-03-25 07:39:11.866588: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-03-25 07:39:11.866606: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-03-25 07:39:11.866644: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-25 07:39:11.867128: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-25 07:39:11.867588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-03-25 07:39:11.867606: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
after which it doesn’t print anything and after about 2.5 minutes prints this:
2021-03-25 07:41:58.768624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-03-25 07:41:58.768648: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2021-03-25 07:41:58.768653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2021-03-25 07:41:58.768811: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-25 07:41:58.769347: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-25 07:41:58.769954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 22243 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:2b:00.0, compute capability: 8.6)
real 2m48.205s
user 2m47.844s
sys 0m2.368s
Running the same command again within the docker container finishes execution in 2 seconds, but using a new container results in the slow loading time again.
Expected behavior
There should not be such a long startup and model loading time. I am not entirely sure if this issue lies with tensorflow, the driver or the hardware itself. I would greatly appreciate any help I can get.