After switching TensorFlow from the 2.16.1 CPU build to the v60dp GPU build (tensorflow==2.15.0+nv24.03), I get the warnings below. Does anyone know why they appear, and is there a way to fix them?
- issue 1: Unable to register cuDNN factory
2024-04-29 10:24:41.927931: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9373] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-29 10:24:41.928049: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-29 10:24:41.936001: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1534] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
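From what I can tell, these factory-registration messages come from XLA trying to register the cuDNN/cuFFT/cuBLAS plugins a second time, and the run continues anyway. If they are only cosmetic, I assume they can at least be hidden by raising the C++ log level before importing TensorFlow (a minimal sketch, though I would rather understand the root cause):

```python
import os

# Must be set before `import tensorflow` - the C++ runtime reads it at load time.
# 0 = all messages, 1 = hide INFO, 2 = hide INFO+WARNING, 3 = hide INFO+WARNING+ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))
```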
- issue 2: could not open file to read NUMA node
2024-04-29 10:24:52.248622: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.369514: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.369787: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.371141: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.371276: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.371371: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.591879: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.592299: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.592376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2019] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-04-29 10:24:52.592525: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-04-29 10:24:52.592619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1926] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1815 MB memory: -> device: 0, name: Orin, pci bus id: 0000:00:00.0, compute capability: 8.7
2024-04-29 10:24:54.802741: I external/local_tsl/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
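For issue 2, the sysfs entry from the log apparently cannot be read for the integrated GPU at 0000:00:00.0. I have seen the suggestion (mostly for desktop GPUs, so I am not sure it applies to Orin) to write 0 into the numa_node entry as a workaround; a small check along those lines, with the path taken from the log and root needed for the write, would be:

```python
from pathlib import Path

# Path copied from the warning above; on Orin the iGPU sits at 0000:00:00.0.
numa_file = Path("/sys/bus/pci/devices/0000:00:00.0/numa_node")

if not numa_file.exists():
    print("numa_node entry is missing - kernel likely built without NUMA support")
else:
    value = numa_file.read_text().strip()
    print(f"current numa_node: {value}")
    if value == "-1":
        # Workaround often suggested for this warning on discrete GPUs:
        # force node 0 (requires root, and may not be meaningful on Jetson).
        numa_file.write_text("0\n")
```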
- issue 3: ran out of memory
2024-04-29 10:25:06.717116: I external/local_xla/xla/service/service.cc:168] XLA service 0xaaab20b03270 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-04-29 10:25:06.717227: I external/local_xla/xla/service/service.cc:176] StreamExecutor device (0): Orin, Compute Capability 8.7
2024-04-29 10:25:06.865925: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-04-29 10:25:07.832752: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:467] Loaded cuDNN version 8904
2024-04-29 10:25:16.422695: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:16.824818: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:17.273160: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:17.273278: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:18.320255: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:18.568112: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:18.966801: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:19.409092: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:19.409221: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:25:20.455916: W external/local_tsl/tsl/framework/bfc_allocator.cc:296] Allocator (GPU_0_bfc) ran out of memory trying to allocate 800.14MiB with freed_by_count=0. The caller indicates that this is not a failure, but this may mean that there could be performance gains if more memory were available.
2024-04-29 10:26:59.030551: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng1{k2=7,k3=0} for conv (f32[512,512,3,3]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,512,28,28]{3,2,1,0}, f32[32,512,28,28]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBackwardFilter", backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0} is taking a while...
2024-04-29 10:26:59.116775: E external/local_xla/xla/service/slow_operation_alarm.cc:133] The operation took 1.086371749s
Trying algorithm eng1{k2=7,k3=0} for conv (f32[512,512,3,3]{3,2,1,0}, u8[0]{0}) custom-call(f32[32,512,28,28]{3,2,1,0}, f32[32,512,28,28]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBackwardFilter", backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0} is taking a while...
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1714357630.751689 67799 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
W0000 00:00:1714357631.213201 67799 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
38/38 ━━━━━━━━━━━━━━━━━━━━ 0s 3s/step - accuracy: 0.0186 - loss: 7.0540
W0000 00:00:1714357734.120590 67798 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
38/38 ━━━━━━━━━━━━━━━━━━━━ 243s 3s/step - accuracy: 0.0186 - loss: 6.9924 - val_accuracy: 0.0136 - val_loss: 3.7653
Epoch 2/101
38/38 ━━━━━━━━━━━━━━━━━━━━ 54s 833ms/step - accuracy: 0.0110 - loss: 3.7623 - val_accuracy: 0.0136 - val_loss: 3.7600
Epoch 3/101
38/38 ━━━━━━━━━━━━━━━━━━━━ 31s 833ms/step - accuracy: 0.0203 - loss: 3.7614 - val_accuracy: 0.0329 - val_loss: 3.7616
Epoch 4/101
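Regarding the "ran out of memory" warnings in issue 3: the allocator message itself says it is not a failure, but TensorFlow only created the device with 1815 MB (see the "Created device ... with 1815 MB memory" line above), and Orin shares memory between CPU and GPU. A sketch of what I am considering, either enabling memory growth or capping the GPU memory explicitly (the limit value is just a placeholder), is:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Option A: let TensorFlow grow allocations on demand instead of
    # reserving a fixed pool up front.
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # Option B (alternative; do not combine with A in the same process):
    # hard-cap the GPU memory TensorFlow may use (value is a placeholder).
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=1536)],
    # )
```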
Finally, the training result is also wrong compared with the TensorFlow 2.16.1 CPU run.
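One thing that might be worth checking, assuming the accuracy gap is not just initialization noise, is pinning the seeds and enabling deterministic ops in both the CPU and GPU runs before comparing them. A minimal sketch:

```python
import tensorflow as tf

# Fix the Python/NumPy/TensorFlow seeds in one call (available since TF 2.7).
tf.keras.utils.set_random_seed(42)

# Prefer deterministic kernels where available (TF 2.8+); this can slow training
# down, and some ops raise an error if no deterministic implementation exists.
tf.config.experimental.enable_op_determinism()
```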