Cannot connect to process in Nsight Compute

Hi. I am trying to profile an application in Nsight Compute. The configuration is


After I click the launch button, the log shows that

and it’s just stuck there without response.

I also tried to use ncu directly in the terminal, and here is the situation:

What happened to my system? BTW, if I run the script directly, it works fine.

It seems like there is a profiling bug related to TensorRT.

I commented out all the code except the single line import torch_tensorrt and profiled it with ncu python test.py. It still does not work (==ERROR== Failed to connect to process 811792).

Versions of my packages:

python: 3.11
torch_tensorrt: 2.2.0
pytorch: 2.2.1+cu121
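
For anyone collecting the same details for a report, the following is a minimal sketch (not part of the original post) that gathers the versions in one go. The helper name collect_versions and the default package list are illustrative assumptions; it relies only on the standard library and on ncu --version, which prints the Nsight Compute CLI banner.

```python
import shutil
import subprocess
import sys
from importlib import metadata


def collect_versions(packages=("torch", "torch-tensorrt", "tensorrt")):
    """Return a dict of component name -> version string (hypothetical helper)."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            # Looks up the installed distribution's version without importing it.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    # Ask the Nsight Compute CLI for its version banner, if it is on PATH.
    ncu = shutil.which("ncu")
    if ncu is None:
        report["ncu"] = "not found on PATH"
    else:
        result = subprocess.run([ncu, "--version"], capture_output=True, text=True)
        report["ncu"] = result.stdout.strip()
    return report


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Because the package versions are read from installed metadata rather than by importing the packages, the script itself should not trigger whatever import-time behavior is under investigation.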

Hi, @zyk0126

Sorry for the issue you ran into. Can you tell us which NCU version you are using?
Also, is it possible to share the script you used, so we can try to reproduce it internally?

Hi, with ncu --version, I got

NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.3.1.0 (build 33474944) (public-release)

The python environment is

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
asttokens                 2.0.5              pyhd3eb1b0_0    defaults
backcall                  0.2.0              pyhd3eb1b0_0    defaults
blas                      1.0                         mkl    defaults
brotli-python             1.0.9           py311h6a678d5_7    defaults
bzip2                     1.0.8                h7b6447c_0    defaults
ca-certificates           2023.08.22           h06a4308_0    defaults
certifi                   2023.11.17      py311h06a4308_0    defaults
cffi                      1.16.0          py311h5eee18b_0    defaults
charset-normalizer        2.0.4              pyhd3eb1b0_0    defaults
coloredlogs               15.0.1                   pypi_0    pypi
comm                      0.1.2           py311h06a4308_0    defaults
cryptography              41.0.3          py311hdda0065_0    defaults
cuda-cudart               12.1.105                      0    nvidia
cuda-cupti                12.1.105                      0    nvidia
cuda-libraries            12.1.0                        0    nvidia
cuda-nvrtc                12.1.105                      0    nvidia
cuda-nvtx                 12.1.105                      0    nvidia
cuda-opencl               12.3.101                      0    nvidia
cuda-runtime              12.1.0                        0    nvidia
debugpy                   1.6.7           py311h6a678d5_0    defaults
decorator                 5.1.1              pyhd3eb1b0_0    defaults
executing                 0.8.3              pyhd3eb1b0_0    defaults
ffmpeg                    4.3                  hf484d3e_0    http://mirror.nju.edu.cn/anaconda/cloud/pytorch
filelock                  3.13.1          py311h06a4308_0    defaults
flatbuffers               23.5.26                  pypi_0    pypi
freetype                  2.12.1               h4a9f257_0    defaults
fsspec                    2024.2.0                 pypi_0    pypi
giflib                    5.2.1                h5eee18b_3    defaults
gmp                       6.2.1                h295c915_3    defaults
gmpy2                     2.1.2           py311hc9b5ff0_0    defaults
gnutls                    3.6.15               he1e5248_0    defaults
humanfriendly             10.0                     pypi_0    pypi
idna                      3.4             py311h06a4308_0    defaults
intel-openmp              2023.1.0         hdb19cb5_46306    defaults
ipykernel                 6.25.0          py311h92b7b1e_0    defaults
ipython                   8.15.0          py311h06a4308_0    defaults
jedi                      0.18.1          py311h06a4308_1    defaults
jinja2                    3.1.2           py311h06a4308_0    defaults
jpeg                      9e                   h5eee18b_1    defaults
jupyter_client            8.6.0           py311h06a4308_0    defaults
jupyter_core              5.5.0           py311h06a4308_0    defaults
lame                      3.100                h7b6447c_0    defaults
lcms2                     2.12                 h3be6417_0    defaults
ld_impl_linux-64          2.38                 h1181459_1    defaults
lerc                      3.0                  h295c915_0    defaults
libcublas                 12.1.0.26                     0    nvidia
libcufft                  11.0.2.4                      0    nvidia
libcufile                 1.8.1.2                       0    nvidia
libcurand                 10.3.4.101                    0    nvidia
libcusolver               11.4.4.55                     0    nvidia
libcusparse               12.0.2.55                     0    nvidia
libdeflate                1.17                 h5eee18b_1    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libiconv                  1.16                 h7f8727e_2    defaults
libidn2                   2.3.4                h5eee18b_0    defaults
libjpeg-turbo             2.0.0                h9bf148f_0    http://mirror.nju.edu.cn/anaconda/cloud/pytorch
libnpp                    12.0.2.50                     0    nvidia
libnvjitlink              12.1.105                      0    nvidia
libnvjpeg                 12.1.1.14                     0    nvidia
libpng                    1.6.39               h5eee18b_0    defaults
libsodium                 1.0.18               h7b6447c_0    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
libtasn1                  4.19.0               h5eee18b_0    defaults
libtiff                   4.5.1                h6a678d5_0    defaults
libunistring              0.9.10               h27cfd23_0    defaults
libuuid                   1.41.5               h5eee18b_0    defaults
libwebp                   1.3.2                h11a3e52_0    defaults
libwebp-base              1.3.2                h5eee18b_0    defaults
llvm-openmp               14.0.6               h9e868ea_0    defaults
lz4-c                     1.9.4                h6a678d5_0    defaults
markupsafe                2.1.1           py311h5eee18b_0    defaults
matplotlib-inline         0.1.6           py311h06a4308_0    defaults
mkl                       2023.1.0         h213fc3f_46344    defaults
mkl-service               2.4.0           py311h5eee18b_1    defaults
mkl_fft                   1.3.8           py311h5eee18b_0    defaults
mkl_random                1.2.4           py311hdb19cb5_0    defaults
mpc                       1.1.0                h10f8cd9_1    defaults
mpfr                      4.0.2                hb69a4c5_1    defaults
mpmath                    1.3.0           py311h06a4308_0    defaults
ncurses                   6.4                  h6a678d5_0    defaults
nest-asyncio              1.5.6           py311h06a4308_0    defaults
nettle                    3.7.3                hbbd107a_1    defaults
networkx                  3.1             py311h06a4308_0    defaults
numpy                     1.26.2          py311h08b1b3b_0    defaults
numpy-base                1.26.2          py311hf175353_0    defaults
nvidia-cublas-cu12        12.3.4.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.7.29                 pypi_0    pypi
nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
nvidia-nccl-cu12          2.19.3                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.3.101                 pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
onnx                      1.15.0                   pypi_0    pypi
onnxruntime-gpu           1.17.1                   pypi_0    pypi
openh264                  2.1.1                h4ff587b_0    defaults
openjpeg                  2.4.0                h3ad879b_0    defaults
openssl                   3.0.12               h7f8727e_0    defaults
packaging                 23.1            py311h06a4308_0    defaults
parso                     0.8.3              pyhd3eb1b0_0    defaults
pexpect                   4.8.0              pyhd3eb1b0_3    defaults
pickleshare               0.7.5           pyhd3eb1b0_1003    defaults
pillow                    10.0.1          py311ha6cbd5a_0    defaults
pip                       23.3.1          py311h06a4308_0    defaults
platformdirs              3.10.0          py311h06a4308_0    defaults
prompt-toolkit            3.0.36          py311h06a4308_0    defaults
protobuf                  4.25.3                   pypi_0    pypi
psutil                    5.9.0           py311h5eee18b_0    defaults
ptyprocess                0.7.0              pyhd3eb1b0_2    defaults
pure_eval                 0.2.2              pyhd3eb1b0_0    defaults
pycparser                 2.21               pyhd3eb1b0_0    defaults
pygments                  2.15.1          py311h06a4308_1    defaults
pyopenssl                 23.2.0          py311h06a4308_0    defaults
pysocks                   1.7.1           py311h06a4308_0    defaults
python                    3.11.5               h955ad1f_0    defaults
python-dateutil           2.8.2              pyhd3eb1b0_0    defaults
pytorch-cuda              12.1                 ha16c6d3_5    http://mirror.nju.edu.cn/anaconda/cloud/pytorch
pytorch-mutex             1.0                        cuda    http://mirror.nju.edu.cn/anaconda/cloud/pytorch
pyyaml                    6.0.1           py311h5eee18b_0    defaults
pyzmq                     25.1.0          py311h6a678d5_0    defaults
readline                  8.2                  h5eee18b_0    defaults
requests                  2.31.0          py311h06a4308_0    defaults
setuptools                68.0.0          py311h06a4308_0    defaults
six                       1.16.0             pyhd3eb1b0_1    defaults
sqlite                    3.41.2               h5eee18b_0    defaults
stack_data                0.2.0              pyhd3eb1b0_0    defaults
sympy                     1.11.1          py311h06a4308_0    defaults
tbb                       2021.8.0             hdb19cb5_0    defaults
tensorrt                  8.6.1.post1              pypi_0    pypi
tensorrt-bindings         8.6.1                    pypi_0    pypi
tensorrt-libs             8.6.1                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    defaults
torch                     2.2.1                    pypi_0    pypi
torch-tensorrt            2.2.0                    pypi_0    pypi
torchaudio                2.1.1               py311_cu121    http://mirror.nju.edu.cn/anaconda/cloud/pytorch
torchvision               0.16.1              py311_cu121    http://mirror.nju.edu.cn/anaconda/cloud/pytorch
tornado                   6.3.3           py311h5eee18b_0    defaults
traitlets                 5.7.1           py311h06a4308_0    defaults
triton                    2.2.0                    pypi_0    pypi
typing-extensions         4.10.0                   pypi_0    pypi
tzdata                    2023c                h04d1e81_0    defaults
urllib3                   1.26.18         py311h06a4308_0    defaults
wcwidth                   0.2.5              pyhd3eb1b0_0    defaults
wheel                     0.41.2          py311h06a4308_0    defaults
xz                        5.4.2                h5eee18b_0    defaults
yaml                      0.2.5                h7b6447c_0    defaults
zeromq                    4.3.4                h2531618_0    defaults
zlib                      1.2.13               h5eee18b_0    defaults
zstd                      1.5.5                hc292b87_0    defaults

The python script is

import numpy as np
import os


# PyTorch imports
import torch
import torchvision
import torch_tensorrt


input_shape = (1, 3, 1024, 1024)
model_name = "densenet121"
dev = torch.device('cuda')
libpath = f"./data/{model_name}-in-{input_shape[0]}_{input_shape[1]}_{input_shape[2]}_{input_shape[3]}-trt-model.ts"


# https://pytorch.org/vision/0.8/models.html
model = getattr(torchvision.models, model_name)(pretrained=True)
model = model.to(dev)
model = model.eval()


# optimize
if not os.path.exists(libpath):
    print("First use, need to compile.")
    # Enabled precision for TensorRT optimization
    enabled_precisions = {torch.float}
    inputs = [torch.randn(input_shape, device = dev)]
    trt_gm = torch_tensorrt.compile(model, "dynamo", inputs) # Output is a torch.fx.GraphModule
    # Transform and create an exported program
    # https://pytorch.org/TensorRT/user_guide/saving_models.html
    trt_ts = torch_tensorrt.compile(model, "ts", inputs) # Output is a ScriptModule object
    torch.jit.save(trt_ts, libpath)


# Later, you can load it and run inference
model = torch.jit.load(libpath)
input_data = torch.randn(input_shape, device = dev)
output_data = model(input_data)
print(output_data)

Run the file directly the first time so that it compiles and saves the optimized model. Then run it under ncu, so that it loads the saved model and runs inference; that is where you can reproduce the error.
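
The two-step reproduction can be sketched as a small driver script. This is only an illustration under stated assumptions: the helper names (build_commands, find_ncu_errors, reproduce) are hypothetical, the script name is whatever you saved the code above as, and the only ncu invocation used is the plain ncu python <script> form already mentioned in this thread.

```python
import subprocess
import sys


def build_commands(script):
    """Return the plain run and the profiled run as argument lists."""
    plain = [sys.executable, script]
    # Same form as "ncu python test.py", using the current interpreter.
    profiled = ["ncu", sys.executable, script]
    return [plain, profiled]


def find_ncu_errors(output):
    """Collect lines that ncu marks with its ==ERROR== prefix."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip().startswith("==ERROR==")
    ]


def reproduce(script):
    """Run once directly (compiles and saves the model), then once under ncu."""
    plain, profiled = build_commands(script)
    subprocess.run(plain, check=True)
    result = subprocess.run(profiled, capture_output=True, text=True)
    return find_ncu_errors(result.stdout + result.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    for error in reproduce(sys.argv[1]):
        print(error)
```

Note that if the profiled process hangs instead of exiting, as described for the GUI launch, the second subprocess.run call will block as well; passing a timeout to it is one way to bound the wait.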