DLProf issue

While trying to use DLProf to profile a simple PyTorch task, I got the following error:
#dlprof --mode pytorch -f true python dltest.py
[DLProf-06:34:55] Creating Nsys Scheduler
[DLProf-06:34:55] RUNNING: nsys profile -t cuda,nvtx -s none --show-output=true --force-overwrite=true --export=sqlite -o ./nsys_profile python dltest.py
Collecting data…
=====YES
Processing events…
Saving temporary “/tmp/nsys-report-38e3-4bee-d9be-c8a9.qdstrm” file to disk…

Creating final output files…
Processing [===============================================================100%]
Saved report file to “/tmp/nsys-report-38e3-4bee-d9be-c8a9.qdrep”
Exporting 846911 events: [=================================================100%]

Exported successfully to
/tmp/nsys-report-38e3-4bee-d9be-c8a9.sqlite
Report file moved to “/home/admin/hippo/worker/slave/xdl-456f8f0b1a83_xdl-456f8f0b1a83-worker_S313538_16_23/binary/xdl_python_package/./nsys_profile.qdrep”
Report file moved to “/home/admin/hippo/worker/slave/xdl-456f8f0b1a83_xdl-456f8f0b1a83-worker_S313538_16_23/binary/xdl_python_package/./nsys_profile.sqlite”

[DLProf-06:35:18] DLprof completed system call successfully
[DLProf-06:35:20] Initializing Nsight Systems database
[DLProf-06:35:20] Error Occurred:
[DLProf-06:35:20] near "(": syntax error
Query: SELECT Count(*) FROM pragma_table_info('CUPTI_ACTIVITY_KIND_KERNEL') WHERE name = 'kernel_name';
(python3.6.13)

I would appreciate any suggestions.

Here is my training script:

import torch
import torchvision.models
import mdl

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = torchvision.models.resnet18()
net.to(device)
net.train()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


for i in range(10):
    x_l = torch.randn(10, 3, 224, 224)
    net(x_l.to(device))
    optimizer.step()

print('=====YES')

Hello.

We’ve been looking into this, sorry we have not been keeping you in the loop.

Do you know what version of DLProf you are using? I am trying to isolate the version of Nsight Systems it is running under the covers, as it looks like that is where the error is.

Thanks.

Hi,

I installed DLProf with:
pip install nvidia-dlprof[pytorch]
Its version is 1.8.0, running on RHEL 7.
I think nsys would be the one installed with DLProf:

[admin@9974a7e650a2 /]
$which nsys
/opt/conda/envs/python3.6.13/bin/nsys
(python3.6.13)
[admin@9974a7e650a2 /]
$nsys --version
NVIDIA Nsight Systems version 2021.3.2.12-9700a21
(python3.6.13)

I tried executing the query that was reported as the error, like this:

import sqlite3
conn = sqlite3.connect('nsys_profile.sqlite')
c = conn.cursor()
c.execute("SELECT * FROM pragma_table_info('CUPTI_ACTIVITY_KIND_KERNEL') WHERE name = 'kernel_name'")
print(c.fetchall())

It runs without error and returns:
[(0,)]

Then I printed out all the columns of this table:

import sqlite3
conn = sqlite3.connect('nsys_profile.sqlite')
c = conn.cursor()
c.execute("PRAGMA table_info('CUPTI_ACTIVITY_KIND_KERNEL')")
print(c.fetchall())

I got:
[(0, 'start', 'INTEGER', 1, None, 0), (1, 'end', 'INTEGER', 1, None, 0), (2, 'deviceId', 'INTEGER', 1, None, 0), (3, 'contextId', 'INTEGER', 1, None, 0), (4, 'streamId', 'INTEGER', 1, None, 0), (5, 'correlationId', 'INTEGER', 0, None, 0), (6, 'globalPid', 'INTEGER', 0, None, 0), (7, 'demangledName', 'INTEGER', 1, None, 0), (8, 'shortName', 'INTEGER', 1, None, 0), (9, 'mangledName', 'INTEGER', 0, None, 0), (10, 'launchType', 'INTEGER', 0, None, 0), (11, 'cacheConfig', 'INTEGER', 0, None, 0), (12, 'registersPerThread', 'INTEGER', 1, None, 0), (13, 'gridX', 'INTEGER', 1, None, 0), (14, 'gridY', 'INTEGER', 1, None, 0), (15, 'gridZ', 'INTEGER', 1, None, 0), (16, 'blockX', 'INTEGER', 1, None, 0), (17, 'blockY', 'INTEGER', 1, None, 0), (18, 'blockZ', 'INTEGER', 1, None, 0), (19, 'staticSharedMemory', 'INTEGER', 1, None, 0), (20, 'dynamicSharedMemory', 'INTEGER', 1, None, 0), (21, 'localMemoryPerThread', 'INTEGER', 1, None, 0), (22, 'localMemoryTotal', 'INTEGER', 1, None, 0), (23, 'gridId', 'INTEGER', 1, None, 0), (24, 'sharedMemoryExecuted', 'INTEGER', 0, None, 0), (25, 'graphNodeId', 'INTEGER', 0, None, 0), (26, 'sharedMemoryLimitConfig', 'INTEGER', 0, None, 0)]
There is no column named kernel_name.

@jkreibich can you take a look at this when you return from holiday?

As indicated in the schema reference, the CUPTI_ACTIVITY_KIND_KERNEL table does not have a kernel_name column. The first result shown here ("[(0,)]") is not a valid result row for this query. I’m not sure if this is a Python quirk or an SQLite version one, but if I hand-type this query into my version of SQLite, then that query returns no rows (as it should). It might be worth checking .rowcount of the query result before executing the .fetchall().
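As a rough illustration of the point above, here is a minimal sketch that runs both forms of the query against a made-up in-memory table (a reduced stand-in for the real CUPTI_ACTIVITY_KIND_KERNEL, not the actual nsys export). It assumes the SQLite library behind Python is 3.16 or newer. One caveat: in Python's sqlite3 module, cursor.rowcount is -1 for SELECT statements, so the length of the fetched list is the more practical check there.

```python
import sqlite3

# Made-up, reduced stand-in for the table in nsys_profile.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (start INTEGER, shortName INTEGER)")

# SELECT * form, as pasted in the thread: no matching column means no rows.
star = conn.execute(
    "SELECT * FROM pragma_table_info('CUPTI_ACTIVITY_KIND_KERNEL') "
    "WHERE name = 'kernel_name'"
)
star_rows = star.fetchall()

# SELECT Count(*) form, as issued by DLProf: always one row holding the count.
count_rows = conn.execute(
    "SELECT Count(*) FROM pragma_table_info('CUPTI_ACTIVITY_KIND_KERNEL') "
    "WHERE name = 'kernel_name'"
).fetchall()

print(star.rowcount)  # -1: Python's sqlite3 does not report a row count for SELECT
print(star_rows)      # []
print(count_rows)     # [(0,)]
```

If that holds on your system too, the "[(0,)]" shown earlier looks like the output of the Count(*) form rather than the SELECT * form.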

If you want the name of the kernel, you must JOIN the CUPTI_ACTIVITY_KIND_KERNEL table to the StringIds table, using either the shortName, demangledName, or mangledName column, depending on which name format you want.
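A minimal sketch of that JOIN, in Python to match the snippets above. It assumes StringIds has the columns id and value, and it uses a small in-memory database with made-up rows instead of the real nsys_profile.sqlite. The helper name kernel_names is hypothetical.

```python
import sqlite3

def kernel_names(conn, name_column="shortName", limit=5):
    """Return (kernel name, start, end) rows by joining kernels to StringIds."""
    # The column name is interpolated into the SQL text, so only the three
    # name columns mentioned in the thread are accepted.
    if name_column not in ("shortName", "demangledName", "mangledName"):
        raise ValueError(f"unexpected name column: {name_column}")
    query = f"""
        SELECT s.value, k.start, k."end"
        FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
        JOIN StringIds AS s ON s.id = k.{name_column}
        ORDER BY k.start
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()

# Made-up data and a reduced schema, standing in for the real export.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT NOT NULL);
    CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (
        start INTEGER NOT NULL,
        "end" INTEGER NOT NULL,
        demangledName INTEGER NOT NULL,
        shortName INTEGER NOT NULL,
        mangledName INTEGER
    );
    INSERT INTO StringIds VALUES (1, 'example_kernel');
    INSERT INTO StringIds VALUES (2, 'void example_kernel<float>(float*)');
    INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES (100, 250, 2, 1, NULL);
""")

rows = kernel_names(conn)
print(rows)
```

Against a real export you would replace the in-memory connection with sqlite3.connect('nsys_profile.sqlite') and pass "demangledName" or "mangledName" if you want one of the other name formats.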

Hi,

Thanks a lot for your reply!

As you mentioned:

I’m not sure if this is a Python quirk or an SQLite version one

I also think it might be an SQLite version problem. I tried reinstalling SQLite on the system, but no luck. I do not know how DLProf calls SQLite or Python.

I would like to know how I can get DLProf to work on my system. Any further suggestions?

In regards to this error:

[DLProf-06:35:20] Error Occurred:
[DLProf-06:35:20] near "(": syntax error
Query: SELECT Count(*) FROM pragma_table_info('CUPTI_ACTIVITY_KIND_KERNEL') WHERE name = 'kernel_name';
(python3.6.13)

This syntax error is likely a versioning issue. The FROM pragma_table_info() syntax is for an “eponymous virtual table,” or a function that returns a table value. While eponymous virtual tables were introduced in SQLite 3.9 (Oct 2015), the built-in pragma functions were not added until SQLite 3.16 (Jan 2017). Although Python 3.6.13 was released in 2021, it was a security release. The Python 3.6 series was first released in Dec 2016, before SQLite introduced pragma functions, and I believe Python’s update and compatibility policy means the SQLite library used to build the built-in Python sqlite3 module would not have been updated throughout the 3.6 series.

In short, the version of Python, and therefore SQLite, that is being used to run this SQL is too old, and this syntax is causing the error.
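One way to check this is sketched below. It prints the SQLite library version that Python's sqlite3 module is linked against and compares it with 3.16.0. It also shows a column check based on the PRAGMA statement form, which older SQLite versions accept. The helper names are hypothetical, and whether DLProf uses the same SQLite library as Python is an assumption I cannot confirm from the log.

```python
import sqlite3

def parse_version(text):
    """Turn a dotted version string such as '3.16.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def has_pragma_functions(version_text):
    """PRAGMA table-valued functions need SQLite 3.16.0 or newer."""
    return parse_version(version_text) >= (3, 16, 0)

def column_exists(conn, table, column):
    """Check for a column using the PRAGMA statement form instead."""
    # The table name is interpolated into the SQL, so pass trusted names only.
    rows = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(row[1] == column for row in rows)

print("SQLite library used by Python:", sqlite3.sqlite_version)
print("Supports pragma_table_info():", has_pragma_functions(sqlite3.sqlite_version))

# Made-up, reduced stand-in for the table in nsys_profile.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (start INTEGER, shortName INTEGER)")
print(column_exists(conn, "CUPTI_ACTIVITY_KIND_KERNEL", "kernel_name"))
print(column_exists(conn, "CUPTI_ACTIVITY_KIND_KERNEL", "shortName"))
```

Note that sqlite3.sqlite_version is the version of the SQLite library itself, while sqlite3.version (where present) is the version of the Python module, so the former is the one to compare.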

Thanks, I will try a higher version of Python.

I upgraded to Python 3.8 but still get the same error…

It is really an issue of sqlite3, not pip. What version of sqlite3 do you have? You will want to update sqlite3 if you want this to work.

However, I should point out that DLProf is no longer in development. If you are looking for a longer-term solution, you should use the native PyTorch profiler (Kineto).

Thanks