@tgerdes, I must be doing something wrong here, or there is something I don't understand.
I moved a small test into a plain Docker container:
sudo docker run --gpus all -it --rm -v `pwd`:/workspace/Downloads nvcr.io/nvidia/tensorflow:21.08-tf1-py
DLProf was already installed:
root@22d520b6085b:/workspace# dlprof --version
NVIDIA (R) Deep Learning Profiler for Tensorflow 1.x
Copyright (c) 2019-2021 NVIDIA Corporation
v1.4.0 / r21.08 built on 2021-07-23 13:34:33 (Build 25191329)
I used this script (a.py):
import tensorflow as tf
sess = tf.Session()
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run)
And ran it:
reset ; dlprof --force=true --mode=tensorflow1 python a.py
[DLProf-06:13:32] Creating Nsys Scheduler
[DLProf-06:13:32] RUNNING: TF_ENABLE_NVTX_RANGES=1 TF_FORCE_GPU_ALLOW_GROWTH=true TF_ENABLE_NVTX_RANGES_DETAILED=1 nsys profile -t cuda,nvtx -s none --show-output=true --force-overwrite=true --export=sqlite -o ./nsys_profile python a.py
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
WARNING: Backtraces will not be collected because sampling is disabled.
Collecting data...
2021-09-01 06:13:33.078979: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From a.py:2: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2021-09-01 06:13:34.023488: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-09-01 06:13:34.066685: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.066991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties:
name: Quadro RTX 3000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2021-09-01 06:13:34.067011: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-01 06:13:34.070139: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-01 06:13:34.071590: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-09-01 06:13:34.071787: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-09-01 06:13:34.072189: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-09-01 06:13:34.072855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-09-01 06:13:34.072963: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-09-01 06:13:34.073049: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.073387: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.073653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2021-09-01 06:13:34.100390: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
2021-09-01 06:13:34.101176: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5f327d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-09-01 06:13:34.101203: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-09-01 06:13:34.233960: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.234336: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2b2c5a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-09-01 06:13:34.234355: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Quadro RTX 3000, Compute Capability 7.5
2021-09-01 06:13:34.234509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.234801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties:
name: Quadro RTX 3000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2021-09-01 06:13:34.234821: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-01 06:13:34.234836: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-01 06:13:34.234846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-09-01 06:13:34.234855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-09-01 06:13:34.234864: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-09-01 06:13:34.234873: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-09-01 06:13:34.234884: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-09-01 06:13:34.234942: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.235254: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.235521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2021-09-01 06:13:34.235544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-01 06:13:34.545287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-01 06:13:34.545318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2021-09-01 06:13:34.545324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2021-09-01 06:13:34.545501: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.545882: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.546165: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-09-01 06:13:34.546191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4096 MB memory) -> physical GPU (device: 0, name: Quadro RTX 3000, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From a.py:7: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2021-09-01 06:13:34.549349: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.549628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties:
name: Quadro RTX 3000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2021-09-01 06:13:34.549649: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-01 06:13:34.549665: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-01 06:13:34.549674: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-09-01 06:13:34.549684: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-09-01 06:13:34.549693: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-09-01 06:13:34.549704: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-09-01 06:13:34.549714: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-09-01 06:13:34.549773: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.550082: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.550340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2021-09-01 06:13:34.550360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-01 06:13:34.550365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2021-09-01 06:13:34.550369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2021-09-01 06:13:34.550449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.550756: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.551026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4096 MB memory) -> physical GPU (device: 0, name: Quadro RTX 3000, pci bus id: 0000:01:00.0, compute capability: 7.5)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro RTX 3000, pci bus id: 0000:01:00.0, compute capability: 7.5
2021-09-01 06:13:34.551064: I tensorflow/core/common_runtime/direct_session.cc:359] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro RTX 3000, pci bus id: 0000:01:00.0, compute capability: 7.5
<bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f1cc7b29e50>>
Processing events...
Saving temporary "/tmp/nsys-report-d4d5-4bf9-4897-eee3.qdstrm" file to disk...
Creating final output files...
Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-d4d5-4bf9-4897-eee3.qdrep"
Exporting 442 events: [===================================================100%]
Exported successfully to
/tmp/nsys-report-d4d5-4bf9-4897-eee3.sqlite
Report file moved to "/workspace/./nsys_profile.qdrep"
Report file moved to "/workspace/./nsys_profile.sqlite"
[DLProf-06:13:35] DLprof completed system call successfully
2021-09-01 06:13:35.812733: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
[DLProf-06:13:37] Initializing Nsight Systems database
[DLProf-06:13:37] Reading System Information from Nsight Systems database
[DLProf-06:13:37] Reading Domains from Nsight Systems database
[DLProf-06:13:37] Error Occurred:
[DLProf-06:13:37] Nsight Systems did not detect any NVTX traces. Please check your script and try again.
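
One thing that stands out in the output above: the script prints `<bound method BaseSession.run of ...>`, which means `print(sess.run)` only references the method and never executes `c`. If no op ever runs on the GPU, that may be why no NVTX ranges were recorded, though I have not confirmed this. The sketch below only illustrates the difference between referencing and calling the method; `StubSession` and `matmul_2x3_3x2` are hypothetical stand-ins, not TensorFlow APIs, so it runs without TensorFlow.

```python
# Hypothetical stand-in for tf.Session, used only to show the difference
# between referencing a method and calling it. No TensorFlow is needed.
class StubSession:
    def run(self, fetch):
        # A real session would execute the graph here; the stub just
        # evaluates the callable it is given.
        return fetch()


def matmul_2x3_3x2():
    # Same values as a and b in a.py, multiplied in plain Python.
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


sess = StubSession()
as_written = repr(sess.run)        # what print(sess.run) shows: the method object
result = sess.run(matmul_2x3_3x2)  # the analogue of print(sess.run(c))

print(as_written)
print(result)
```

In a.py the corresponding change would be `print(sess.run(c))` on the last line. Whether that alone is enough for DLProf to find NVTX ranges with such a tiny graph is an assumption I have not tested.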