This is the output for ssd-mobilenet:
2021-08-30 11:58:57.116817: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-30 11:59:06.516826: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-08-30 11:59:06.557076: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:06.557372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.109GHz coreCount: 6 deviceMemorySize: 7,58GiB deviceMemoryBandwidth: 66,10GiB/s
2021-08-30 11:59:06.557472: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-30 11:59:06.606590: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2021-08-30 11:59:06.607687: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.10
2021-08-30 11:59:06.625367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-08-30 11:59:06.667181: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-08-30 11:59:06.685707: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2021-08-30 11:59:06.703972: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.10
2021-08-30 11:59:06.705111: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-08-30 11:59:06.705643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:06.706125: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:06.706575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-30 11:59:06.727893: I tensorflow/core/profiler/lib/profiler_session.cc:126] Profiler session initializing.
2021-08-30 11:59:06.727987: I tensorflow/core/profiler/lib/profiler_session.cc:141] Profiler session started.
2021-08-30 11:59:06.728135: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1611] Profiler found 1 GPUs
2021-08-30 11:59:06.744209: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcupti.so.10.2
2021-08-30 11:59:36.632407: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:36.633732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1734] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.109GHz coreCount: 6 deviceMemorySize: 7,58GiB deviceMemoryBandwidth: 66,10GiB/s
2021-08-30 11:59:36.634002: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:36.634331: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:36.634423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1872] Adding visible gpu devices: 0
2021-08-30 11:59:36.634580: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.10.2
2021-08-30 11:59:40.627209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-30 11:59:40.627349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-30 11:59:40.627407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-30 11:59:40.627819: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:40.628155: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:40.628443: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1001] ARM64 does not support NUMA - returning NUMA node zero
2021-08-30 11:59:40.628625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1628 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2021-08-30 12:01:57.497325: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-08-30 12:01:58.396764: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 31000000 Hz
2021-08-30 12:02:05.768983: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-08-30 12:02:07.882579: I tensorflow/stream_executor/cuda/cuda_dnn.cc:380] Loaded cuDNN version 8000
2021-08-30 12:02:14.281739: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.10
2021-08-30 12:02:25.864267: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,73GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:25.897688: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,73GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:26.069769: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,79GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:26.069978: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,79GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:27.920629: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,76GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:27.920851: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,76GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:27.967980: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,77GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:27.968215: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1,77GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:28.434495: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2,51GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:28.434783: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2,51GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2021-08-30 12:02:28.565390: W tensorflow/core/common_runtime/bfc_allocator.cc:337] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
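The bfc_allocator warnings above indicate the model is running close to the device memory limit (the log shows only 1628 MB reserved for the GPU on an 8 GB Xavier, which is shared between CPU and GPU). A common mitigation is to let TensorFlow allocate GPU memory on demand instead of up front. The following is a minimal sketch (assuming TF 2.x; `TF_FORCE_GPU_ALLOW_GROWTH` and `set_memory_growth` are real TensorFlow mechanisms, but whether this removes the warnings on your board is not guaranteed):

```python
import os

# Must be set before TensorFlow is imported to take effect.
os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")

try:
    import tensorflow as tf
    for gpu in tf.config.list_physical_devices("GPU"):
        # Grow the GPU memory pool as needed rather than reserving it all at once.
        tf.config.experimental.set_memory_growth(gpu, True)
    status = "memory growth enabled"
except ImportError:
    # Sketch only: tensorflow is not installed in this environment.
    status = "tensorflow not installed; sketch only"

print(status)
```

If the warnings persist, reducing the input resolution or batch size is the other usual lever, since the messages themselves say the allocations are retried rather than failing outright.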
{'detection_anchor_indices': <tf.Tensor: shape=(1, 100), dtype=float32, numpy=
array([[ 9419., 10193., 14057., 10079., 9359., 9275., 10139., 15377.,
14663., 10799., 13913., 10853., 9389., 10307., 13307., 12299.,
26728., 26410., 15377., 13451., 25834., 14771., 10109., 14171.,
9329., 9305., 25948., 26093., 9335., 9335., 10025., 10253.,
26069., 14509., 13973., 8699., 15461., 9305., 10025., 9305.,
11723., 9995., 27946., 11405., 26452., 11555., 25648., 25648.,
13997., 10169., 26422., 26710., 30466., 26194., 9305., 11555.,
14747., 26296., 15151., 10079., 26626., 26266., 30520., 9995.,
26482., 25864., 4738., 13391., 12527., 25678., 24019., 5968.,
14573., 26656., 11723., 11429., 12557., 10979., 26338., 10025.,
26296., 9935., 14915., 25966., 30570., 30498., 26062., 8729.,
15569., 10193., 9305., 11495., 26998., 30520., 15289., 13043.,
29035., 11951., 9329., 11981.]], dtype=float32)>, 'detection_boxes': <tf.Tensor: shape=(1, 100, 4), dtype=float32, numpy=
array([[[3.29538405e-01, 3.26002032e-01, 5.21920264e-01, 3.70194346e-01],
[3.37561607e-01, 6.68904066e-01, 5.51527321e-01, 7.14050770e-01],
[4.81567234e-01, 3.76965284e-01, 7.15000927e-01, 4.22366321e-01],
[3.34119827e-01, 4.75237340e-01, 5.49582362e-01, 5.19531071e-01],
       ... (the rest of the detection results follow.)
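For reference, the `detection_boxes` tensor above holds normalized `[ymin, xmin, ymax, xmax]` coordinates per box, which is the standard TF Object Detection API convention. A minimal sketch of converting the first box to pixel coordinates (the 720x1280 frame size here is a hypothetical example, not taken from the log):

```python
import numpy as np

# First box from the detection_boxes output above, normalized [ymin, xmin, ymax, xmax].
boxes = np.array([[0.329538405, 0.326002032, 0.521920264, 0.370194346]])
img_h, img_w = 720, 1280  # hypothetical frame size

# Scale normalized coordinates to integer pixel coordinates.
ymin, xmin, ymax, xmax = (boxes[0] * [img_h, img_w, img_h, img_w]).astype(int)
print(ymin, xmin, ymax, xmax)  # → 237 417 375 473
```

The same scaling applies to all 100 rows; in practice you would also threshold on `detection_scores` before drawing the boxes.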