DLProf crash

Hi,
I’m running a trainer under dlprof. When it finishes, I get this error message:

Processing events...
Saving temporary "/tmp/nsys-report-c064-35c6-78a3-a813.qdstrm" file to disk...

Creating final output files...
Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-c064-35c6-78a3-a813.qdrep"
Exporting 68461772 events: [==============================================100%]

Exported successfully to
/tmp/nsys-report-c064-35c6-78a3-a813.sqlite
Report file moved to "/opt/test/nsys_profile.qdrep"
Report file moved to "/opt/test/nsys_profile.sqlite"

[DLProf-10:01:08] DLprof completed system call successfully
[DLProf-10:01:11] Initializing Nsight Systems database
[DLProf-10:01:52] Error Occurred:
[DLProf-10:01:52] unrecognized token: "0xFFFFFF"
        Query: UPDATE CUPTI_ACTIVITY_KIND_RUNTIME SET kernel_name = (SELECT kernel_name FROM CUPTI_ACTIVITY_KIND_KERNEL WHERE CUPTI_ACTIVITY_KIND_KERNEL.correlationId = CUPTI_ACTIVITY_KIND_RUNTIME.correlationId AND (CUPTI_ACTIVITY_KIND_KERNEL.globalPid  >> 24 & 0xFFFFFF) = (CUPTI_ACTIVITY_KIND_RUNTIME.globalTid  >> 24 & 0xFFFFFF));


[test]# dlprof --database=nsys_profile.sqlite
[DLProf-10:04:50] Creating SQLite Reader Scheduler
[DLProf-10:04:50] Initializing Nsight Systems database
[DLProf-10:04:51] Error Occurred:
[DLProf-10:04:51] unrecognized token: "0xFFFFFF"
        Query: UPDATE CUPTI_ACTIVITY_KIND_RUNTIME SET kernel_name = (SELECT kernel_name FROM CUPTI_ACTIVITY_KIND_KERNEL WHERE CUPTI_ACTIVITY_KIND_KERNEL.correlationId = CUPTI_ACTIVITY_KIND_RUNTIME.correlationId AND (CUPTI_ACTIVITY_KIND_KERNEL.globalPid  >> 24 & 0xFFFFFF) = (CUPTI_ACTIVITY_KIND_RUNTIME.globalTid  >> 24 & 0xFFFFFF));

Any idea why?

thanks
Eyal

Old versions of sqlite3 did not support hex numbers in queries. Please update your version of sqlite3 to at least 3.8.6 and try again.
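
A quick way to check whether a given SQLite build accepts hex literals is to run a small query against it. The sketch below uses Python's built-in sqlite3 module. Note that it reports the SQLite library linked into Python, which is not necessarily the one DLProf uses, so treat it as a rough sanity check only. The helper name supports_hex_literals is made up for this example.

```python
import sqlite3


def supports_hex_literals(conn):
    """Return True if this SQLite build parses hex integer literals."""
    try:
        # Same kind of expression as in the failing DLProf query.
        row = conn.execute("SELECT (0x1234567 >> 24) & 0xFFFFFF").fetchone()
    except sqlite3.OperationalError:
        # Older builds reject the literal with an "unrecognized token" error.
        return False
    return row is not None


conn = sqlite3.connect(":memory:")
print("SQLite library version:", sqlite3.sqlite_version)
print("Hex literals supported:", supports_hex_literals(conn))
conn.close()
```

If this prints False, the SQLite library behind that Python is older than 3.8.6.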

Out of curiosity, did you run inside of an NGC container or did you manually pip install dlprof?

Thanks @tgerdes! I’ve been trying out dlprof manually inside one of our internal Docker images.
Can you please point me to how to safely upgrade sqlite3 to this version?

thanks
Eyal

That depends on which operating system you are running. Assuming it is Ubuntu, it should be something like this:

sudo apt-get install sqlite3

Hi @tgerdes ,
I believe I’ve updated sqlite to 3.37
Now I’m getting this:
[DLProf-19:16:34] Error Occurred:
[DLProf-19:16:34] Nsight Systems did not detect any NVTX traces. Please check your script and try again.

I’m running it like this (in a .sh script):
nohup dlprof --mode=tensorflow1 python -m train_entry $@

nsys --version
NVIDIA Nsight Systems version 2021.2.4.12-a25c8fd

dlprof --version
NVIDIA (R) Deep Learning Profiler
Copyright (c) 2019-2021 NVIDIA Corporation
v1.3.0 / r21.07 built on 2021-06-28 09:54:05 (Build 24377977)

nvidia-smi
Tue Aug 31 20:25:52 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+

Any idea why the dlprof_dldb.sqlite didn’t get created?

thanks
Eyal

DLProf for TF1 requires a custom NVIDIA build of TensorFlow 1, which is what ships inside the DLFW container. If you are installing DLProf using pip, you would want to do:
pip install nvidia-dlprof[tensorflow]

Hi @tgerdes ,
I’ve used the pip install nvidia-dlprof[tensorflow] extension.

Still getting the same thing:

[DLProf-04:58:56] RUNNING: TF_ENABLE_NVTX_RANGES=1 TF_FORCE_GPU_ALLOW_GROWTH=true TF_ENABLE_NVTX_RANGES_DETAILED=1 nsys profile -t cuda,nvtx -s none --show-output=true --force-overwrite=true --export=sqlite -o ./nsys_profile python -m mytrain …

Processing events...
Saving temporary "/tmp/nsys-report-e107-1c1a-e75c-17ba.qdstrm" file to disk...

Creating final output files...
Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-e107-1c1a-e75c-17ba.qdrep"
Exporting 47280133 events: [==============================================100%]

Exported successfully to
/tmp/nsys-report-e107-1c1a-e75c-17ba.sqlite
Report file moved to "/opt/taboola/./nsys_profile.qdrep"
Report file moved to "/opt/taboola/./nsys_profile.sqlite"

[DLProf-05:45:40] DLprof completed system call successfully
[DLProf-05:45:45] Initializing Nsight Systems database
[DLProf-05:46:21] Reading System Information from Nsight Systems database
[DLProf-05:46:21] Reading Domains from Nsight Systems database
[DLProf-05:46:21] Error Occurred:
[DLProf-05:46:21] Nsight Systems did not detect any NVTX traces. Please check your script and try again.

And no dlprof_dldb.sqlite file…

Further assistance/ideas would be greatly appreciated

thanks
Eyal

@tgerdes, I must be doing something wrong here, or there is something I don’t understand.
I moved a small test into a simple Docker container:

sudo docker run --gpus all -it --rm -v `pwd`:/workspace/Downloads nvcr.io/nvidia/tensorflow:21.08-tf1-py

DLProf was already installed there:

root@22d520b6085b:/workspace# dlprof --version
NVIDIA (R) Deep Learning Profiler for Tensorflow 1.x
Copyright (c) 2019-2021 NVIDIA Corporation
v1.4.0 / r21.08 built on 2021-07-23 13:34:33 (Build 25191329)

Used this .py file:

import tensorflow as tf
sess = tf.Session()
with tf.device('/gpu:0'):
	a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
	b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
	c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print (sess.run)

And ran it:

reset ; dlprof --force=true --mode=tensorflow1 python  a.py

[DLProf-06:13:32] Creating Nsys Scheduler
[DLProf-06:13:32] RUNNING: TF_ENABLE_NVTX_RANGES=1 TF_FORCE_GPU_ALLOW_GROWTH=true TF_ENABLE_NVTX_RANGES_DETAILED=1 nsys profile -t cuda,nvtx -s none --show-output=true --force-overwrite=true --export=sqlite -o ./nsys_profile python a.py
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
WARNING: Backtraces will not be collected because sampling is disabled.
Collecting data...
2021-09-01 06:13:33.078979: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From a.py:2: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-09-01 06:13:34.023488: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-09-01 06:13:34.066685: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.066991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties: 
name: Quadro RTX 3000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2021-09-01 06:13:34.067011: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-01 06:13:34.070139: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-01 06:13:34.071590: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-09-01 06:13:34.071787: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-09-01 06:13:34.072189: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-09-01 06:13:34.072855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-09-01 06:13:34.072963: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-09-01 06:13:34.073049: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.073387: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.073653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2021-09-01 06:13:34.100390: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
2021-09-01 06:13:34.101176: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5f327d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-09-01 06:13:34.101203: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-09-01 06:13:34.233960: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.234336: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2b2c5a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-09-01 06:13:34.234355: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro RTX 3000, Compute Capability 7.5
2021-09-01 06:13:34.234509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.234801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties: 
name: Quadro RTX 3000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2021-09-01 06:13:34.234821: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-01 06:13:34.234836: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-01 06:13:34.234846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-09-01 06:13:34.234855: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-09-01 06:13:34.234864: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-09-01 06:13:34.234873: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-09-01 06:13:34.234884: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-09-01 06:13:34.234942: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.235254: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.235521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2021-09-01 06:13:34.235544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-01 06:13:34.545287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-01 06:13:34.545318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
2021-09-01 06:13:34.545324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
2021-09-01 06:13:34.545501: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.545882: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.546165: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-09-01 06:13:34.546191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4096 MB memory) -> physical GPU (device: 0, name: Quadro RTX 3000, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From a.py:7: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2021-09-01 06:13:34.549349: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.549628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1666] Found device 0 with properties: 
name: Quadro RTX 3000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2021-09-01 06:13:34.549649: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-01 06:13:34.549665: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-01 06:13:34.549674: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-09-01 06:13:34.549684: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-09-01 06:13:34.549693: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-09-01 06:13:34.549704: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-09-01 06:13:34.549714: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-09-01 06:13:34.549773: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.550082: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.550340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1794] Adding visible gpu devices: 0
2021-09-01 06:13:34.550360: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-01 06:13:34.550365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212]      0 
2021-09-01 06:13:34.550369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0:   N 
2021-09-01 06:13:34.550449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.550756: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-01 06:13:34.551026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4096 MB memory) -> physical GPU (device: 0, name: Quadro RTX 3000, pci bus id: 0000:01:00.0, compute capability: 7.5)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro RTX 3000, pci bus id: 0000:01:00.0, compute capability: 7.5
2021-09-01 06:13:34.551064: I tensorflow/core/common_runtime/direct_session.cc:359] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Quadro RTX 3000, pci bus id: 0000:01:00.0, compute capability: 7.5

<bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f1cc7b29e50>>
Processing events...
Saving temporary "/tmp/nsys-report-d4d5-4bf9-4897-eee3.qdstrm" file to disk...

Creating final output files...
Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-d4d5-4bf9-4897-eee3.qdrep"
Exporting 442 events: [===================================================100%]

Exported successfully to
/tmp/nsys-report-d4d5-4bf9-4897-eee3.sqlite
Report file moved to "/workspace/./nsys_profile.qdrep"
Report file moved to "/workspace/./nsys_profile.sqlite"

[DLProf-06:13:35] DLprof completed system call successfully
2021-09-01 06:13:35.812733: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
[DLProf-06:13:37] Initializing Nsight Systems database
[DLProf-06:13:37] Reading System Information from Nsight Systems database
[DLProf-06:13:37] Reading Domains from Nsight Systems database
[DLProf-06:13:37] Error Occurred:
[DLProf-06:13:37] Nsight Systems did not detect any NVTX traces.  Please check your script and try again.

I’ve been using CUDA for the last 12 years, and I’ve never had so many issues with something like this :(

root@7c299394d3bd:/workspace/Downloads# cat a.py 
import tensorflow as tf
import nvtx

@nvtx.annotate("test", color="purple")
def test():
    sess = tf.Session()
    with tf.device('/gpu:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    print (sess.run)

if __name__ == '__main__':
    test()

And running it in the Docker container:

dlprof --force=true --mode=tensorflow1 python -m a
....

<bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f70d9d425b0>>
Processing events...
Saving temporary "/tmp/nsys-report-4a0f-0638-b77e-1e89.qdstrm" file to disk...

Creating final output files...
Processing [==============================================================100%]
Saved report file to "/tmp/nsys-report-4a0f-0638-b77e-1e89.qdrep"
Exporting 445 events: [===================================================100%]

Exported successfully to
/tmp/nsys-report-4a0f-0638-b77e-1e89.sqlite
Report file moved to "/workspace/Downloads/./nsys_profile.qdrep"
Report file moved to "/workspace/Downloads/./nsys_profile.sqlite"

[DLProf-06:57:12] DLprof completed system call successfully
2021-09-01 06:57:13.148096: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
[DLProf-06:57:14] Initializing Nsight Systems database
[DLProf-06:57:14] Reading System Information from Nsight Systems database
[DLProf-06:57:14] Reading Domains from Nsight Systems database
[DLProf-06:57:14] Reading Ops from Nsight Systems database
[DLProf-06:57:14] Error Occurred:
[DLProf-06:57:14] map::at

I’m sorry you are having so much trouble :(

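I’m not a TF expert, but it looks like sess.run needs to be sess.run(c) to actually execute the graph. print(sess.run) only prints the bound method, which matches the <bound method BaseSession.run ...> line in your output. Also, don’t add manual NVTX markers.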

This worked for me. I only edited the last line:

import tensorflow as tf
sess = tf.Session()
with tf.device('/gpu:0'):
	a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
	b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
	c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print (sess.run(c))
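
If the "did not detect any NVTX traces" error comes back, it may help to look inside the exported nsys_profile.sqlite before handing it to DLProf. The sketch below is only a rough check: it assumes the Nsight Systems export stores NVTX ranges in a table named NVTX_EVENTS, which is an assumption here and may differ between nsys versions. The helper name count_rows_if_present is hypothetical.

```python
import sqlite3
import sys


def count_rows_if_present(db_path, table="NVTX_EVENTS"):
    """Return the row count of `table`, or None if the table does not exist.

    NVTX_EVENTS is an assumed table name; adjust it to match your export.
    """
    conn = sqlite3.connect(db_path)
    try:
        found = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if found is None:
            return None
        # The name was just matched against sqlite_master, so quoting it is safe.
        return conn.execute('SELECT COUNT(*) FROM "{}"'.format(table)).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_nvtx.py nsys_profile.sqlite
    print(count_rows_if_present(sys.argv[1]))
```

A result of None or 0 would suggest that the profiled script never emitted NVTX ranges, for example because the graph was never actually run.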