DLProf Error

Hi,

I’m trying to profile a TensorFlow 1 model using DLProf but I get the following error:

Processing events...
Saving temporary "/tmp/nsys-report-10e3-3013-5c91-3435.qdstrm" file to disk...

Creating final output files...
Processing [===============================================================100%]
Saved report file to "/tmp/nsys-report-10e3-3013-5c91-3435.qdrep"
Exporting 10110116 events: [===============================================100%]

Exported successfully to
/tmp/nsys-report-10e3-3013-5c91-3435.sqlite
Report file moved to "/workspace/text-to-image/./nsys_profile.qdrep"
Report file moved to "/workspace/text-to-image/./nsys_profile.sqlite"

[DLProf-16:00:42] DLprof completed system call successfully
2022-04-23 16:00:42.542032: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
[DLProf-16:00:43] Initializing Nsight Systems database
[DLProf-16:01:06] Reading System Information from Nsight Systems database
[DLProf-16:01:06] Reading Domains from Nsight Systems database
[DLProf-16:01:06] Reading Ops from Nsight Systems database
[DLProf-16:01:46] Reading CUDA API calls from Nsight Systems database
[DLProf-16:02:32] Correlating network models with kernel and timeline data
[DLProf-16:02:32] Found 1 iteration using key_op "global_step"
Iterations: [791640673260]
Aggregating data over 1 iteration: iteration 0 start (0 ns) to iteration 0 end (791640673260 ns)

[DLProf-16:02:32] Aggregating profile data
[DLProf-16:02:57] Creating dlprof database at ./dlprof_dldb_1.sqlite
[DLProf-16:02:57] Writing profile data to dlprof database
[DLProf-16:03:47] Error Occurred:
[DLProf-16:03:47] near "nvtx": syntax error
	Query: INSERT INTO read_only_system_config (   is_valid,    num_gpus,    cpu_model,    driver_version,    framework,    cuda_version,    cudnn_version,    nsys_version,    dlprof_version,    dlprof_build,    profile_name,   mode_enum,   mode_string,   original_command )  VALUES ( 1, '1', '11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz', '510.47.03', 'TensorFlow 1.15.5', '11.6', '8.3.0', '2021.3.2.4-027534f', 'v1.7.0 / r21.11', '28499850', '',  1, 'tensorflow1', 'dlprof --mode=tensorflow1 --nsys_opts=--sample=cpu --trace 'nvtx,cuda,osrt,cudnn' --force=true python train.py');  

I’m using the TensorFlow 1 NGC framework container: nvcr.io/nvidia/tensorflow:21.11-tf1-py3

Any idea how to solve this?
Thanks a lot!

The DLProf database still gets created, but I also get an error when I try to run dlprofviewer ./dlprof_dldb.sqlite:

The DLProf database is version 'UNKNOWN' but DLProf is version '1.7.0'. Please install the version of DLProf that matches the database version.
Traceback (most recent call last):
  File "/usr/local/bin/dlprofviewer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/dlprofviewer/__main__.py", line 57, in main
    handle_upversion(args.database)
  File "/usr/local/lib/python3.8/dist-packages/dlprofviewer/__main__.py", line 100, in handle_upversion
    raise RuntimeError('DLProf database upversion failed. Read error above for details.')
RuntimeError: DLProf database upversion failed. Read error above for details.
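
One plausible reading of the 'UNKNOWN' version is that the failed INSERT left the read_only_system_config table empty, so the viewer has no version to read. The following is a minimal sketch for checking that, assuming only the table name shown in the failed query; the helper name count_config_rows is hypothetical and not part of DLProf.

```python
import sqlite3


def count_config_rows(db_path, table="read_only_system_config"):
    """Return the number of rows in `table`, or None if the table is missing.

    The default table name is taken from the failed INSERT in the log;
    nothing else about the DLProf schema is assumed.
    Note: sqlite3.connect() creates an empty file if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if exists is None:
            return None
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    finally:
        conn.close()
```

Calling count_config_rows("./dlprof_dldb.sqlite") and getting 0 or None would be consistent with the system configuration never having been written.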

I managed to reproduce this error with PyTorch as well. I assume the cause is the nsys_opts argument (probably the nested " and ' quotes), because this command produces the above error:

dlprof --mode=pytorch --nsys_opts="--sample=cpu --trace 'nvtx,cuda,osrt,cudnn'" --force=true python main_dlprof.py

but this command works:

dlprof --mode=pytorch --force=true python main_dlprof.py
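
The quoting theory fits the failed query in the log, where the original command is pasted into a single-quoted SQL string and the inner 'nvtx,...' quotes end that string early. Here is a minimal sketch of the effect using Python's sqlite3 module; the one-column table is a hypothetical stand-in for the real one, and building the statement by string concatenation is an assumption about how DLProf behaves, not something confirmed.

```python
import sqlite3

# Hypothetical one-column stand-in for DLProf's read_only_system_config table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE read_only_system_config (original_command TEXT)")

# The command as it appears after the shell has removed the outer double quotes.
command = (
    "dlprof --mode=pytorch --nsys_opts=--sample=cpu "
    "--trace 'nvtx,cuda,osrt,cudnn' --force=true python main_dlprof.py"
)


def insert_by_concatenation(conn, command):
    """Paste the value into the SQL text; return the error message, if any."""
    sql = (
        "INSERT INTO read_only_system_config (original_command) "
        "VALUES ('" + command + "')"
    )
    try:
        conn.execute(sql)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)


def insert_with_placeholder(conn, command):
    """Bind the value as a parameter so embedded quotes are harmless."""
    conn.execute(
        "INSERT INTO read_only_system_config (original_command) VALUES (?)",
        (command,),
    )


# The inner single quote closes the SQL string, leaving nvtx as a stray token.
error = insert_by_concatenation(conn, command)
print("concatenated insert:", error)

insert_with_placeholder(conn, command)
stored = conn.execute(
    "SELECT original_command FROM read_only_system_config"
).fetchone()[0]
print("parameterized insert stored the command intact:", stored == command)
```

If this is what happens inside DLProf, dropping the inner single quotes, for example --nsys_opts="--sample=cpu --trace=nvtx,cuda,osrt,cudnn", might sidestep the error, since the comma-separated list contains no spaces. I have not tested that.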