Nsight Systems doesn't seem to work correctly

Environment

Hello,
I installed Nsight Systems inside the NVIDIA Docker container nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3. The package I installed is nsight-systems-2022.5.2_2022.5.2.120-32316741v0_arm64.deb, and I got the installer from here. Then I tried to test Nsight Systems in the container.

#include <stdio.h>
#include <stdlib.h>  // for exit()

/*
 * Host function to initialize vector elements. This function
 * simply initializes each element to equal its index in the
 * vector.
 */

void initWith(float num, float *a, int N)
{
  for(int i = 0; i < N; ++i)
  {
    a[i] = num;
  }
}

/*
 * Device kernel stores into `result` the sum of each
 * same-indexed value of `a` and `b`.
 */

__global__
void addVectorsInto(float *result, float *a, float *b, int N)
{
  int index = threadIdx.x + blockIdx.x * blockDim.x;
  int stride = blockDim.x * gridDim.x;

  for(int i = index; i < N; i += stride)
  {
    result[i] = a[i] + b[i];
  }
}

/*
 * Host function to confirm values in `vector`. This function
 * assumes all values are the same `target` value.
 */

void checkElementsAre(float target, float *vector, int N)
{
  for(int i = 0; i < N; i++)
  {
    if(vector[i] != target)
    {
      printf("FAIL: vector[%d] - %0.0f does not equal %0.0f\n", i, vector[i], target);
      exit(1);
    }
  }
  printf("Success! All values calculated correctly.\n");
}

int main()
{
  const int N = 2<<24;
  size_t size = N * sizeof(float);

  float *a;
  float *b;
  float *c;

  cudaMallocManaged(&a, size);
  cudaMallocManaged(&b, size);
  cudaMallocManaged(&c, size);

  for (int i = 0 ; i < 100 ; ++i) {
    initWith(3, a, N);
    initWith(4, b, N);
    initWith(0, c, N);

    size_t threadsPerBlock;
    size_t numberOfBlocks;

    /*
     * nsys should register performance changes when execution configuration
     * is updated.
     */

    //threadsPerBlock = 1;
    //numberOfBlocks = 1;
    threadsPerBlock = 1024;
    numberOfBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    printf("numberOfBlocks : %d\n", numberOfBlocks);

    cudaError_t addVectorsErr;
    cudaError_t asyncErr;

    addVectorsInto<<<numberOfBlocks, threadsPerBlock>>>(c, a, b, N);

    addVectorsErr = cudaGetLastError();
    if(addVectorsErr != cudaSuccess) printf("Error: %s\n", cudaGetErrorString(addVectorsErr));

    asyncErr = cudaDeviceSynchronize();
    if(asyncErr != cudaSuccess) printf("Error: %s\n", cudaGetErrorString(asyncErr));

    checkElementsAre(7, c, N);
  }

  cudaFree(a);
  cudaFree(b);
  cudaFree(c);
}

The code above is my test program.

nvcc -o 01 01.cu

Compiling the source code above produced an ELF binary named 01.
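
As a side note, I believe the same program can also be compiled with device line information enabled so that the Nsight timeline can correlate kernels back to source lines (this is just a variant I considered, not the exact command I used above):

nvcc -O2 -lineinfo -o 01 01.cu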

nsys profile --stats=true --force-overwrite=true -o 01-report ./01

This command produces the output shown below:

nsys profile --stats=true --force-overwrite=true -o 01-report ./01
WARNING: ARMv8 PMU is not available, enabling `sampling-trigger=perf` switch, software events will be used for CPU sampling.
numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.

........................................

numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.
numberOfBlocks : 32768
Success! All values calculated correctly.
FATAL ERROR: /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Tegra/QuadD/Common/GpuTraits/Src/GpuTicksConverter.cpp(376): Throw in function QuadDCommon::TimestampType GpuTraits::GpuTicksConverter::ConvertToCpuTime(const QuadDCommon::Uuid&, uint64_t&) const
Dynamic exception type: boost::wrapexcept<QuadDCommon::NotFoundException>
std::exception::what: NotFoundException
[QuadDCommon::tag_message*] = No GPU associated to the given UUID

Generating '/tmp/nsys-report-3988.qdstrm'

I think this is a problem.
Something went wrong, but I don't know what it is.

Could someone help me?

Did I install the wrong Nsight installer, or install it in the wrong place (inside the container)?

Hi @jhjo, I see you are getting the error "No GPU associated to the given UUID". I don't know the cause yet, but I am investigating to find the reason for this error and potential solutions. I will follow up to let you know.

@jhjo, are you running JetPack 5.1 or JetPack 5.1.1? The version of Nsight Systems you listed seems to be from 5.1. Could you upgrade to 5.1.1 and use this version of Nsight Systems: nsight-systems-2022.5.2_2022.5.2.171-32559007v0_arm64.deb?
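
Assuming a standard Debian package install inside the container, the upgrade would look something like this (drop sudo if you are already root in the container):

sudo dpkg -i nsight-systems-2022.5.2_2022.5.2.171-32559007v0_arm64.deb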

I did confirm that other users experienced that error on Orin with earlier builds of 2022.5.2, and those errors were fixed by the 2022.5.2.171 build listed above. I am hopeful it will fix your issue, but it is possible we would need to find a different solution (if the root cause of your error message turns out to be something else).

Please let me know.

Also, could you try running the nsys profile command as root (with your current version of nsys)? That should resolve the issue as well.
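
For example, with the same command you used earlier:

sudo nsys profile --stats=true --force-overwrite=true -o 01-report ./01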

Hi @tcourtney,

We are currently using JetPack 5.1. Our device is the Orin 32GB (the DRAM size, not the flash size). As far as I know, 5.1.1 (35.3.1) is not available for the 32GB Orin, so we cannot use 5.1.1.

I also tried the nsys command with sudo and as root, but the same problem occurs; nothing is different.

I hope a fix for 5.1 is released soon.
How long would it take to be fixed?

Hi @jhjo,

I am curious if you tried to install Jetpack 5.1.1 on your board and encountered an error message, or if you had another reason to believe your board was not supported by 5.1.1.

While I do not have direct experience with Jetson Orin boards, I understand that you have a Jetson AGX Orin 32G board, and I see that board listed as a supported platform for Jetpack 5.1.1 on this page: How to Install JetPack :: NVIDIA JetPack Documentation. Does that look like the same board to you or am I mistaken?

Maybe you could try installing Jetpack 5.1.1 if you haven’t already?

@jhjo,

Can you post the command line that is used to launch the container where you are running nsight systems? If you are not already, can you add --runtime nvidia to the command line when launching the container?
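
For example, a minimal launch command would look something like this (using the image tag from your earlier message; other options omitted):

docker run --runtime nvidia -it nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3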

@tcourtney ,

I tried to install a BSP based on 5.1.1 (35.3.1), but it failed. It was not the plain BSP; third-party drivers were included. I didn't try to debug the failure because the driver vendor provided a new driver for 5.1 (35.2.1), and I didn't check why they downgraded the BSP. But from this page, I assumed that 5.1.1 (35.3.1) is not supported on the Orin 32GB. Would you check the link? "✔*" applies to the Orin 32GB, and 5.1.1 (35.3.1) has only "✔".
This is the reason why we use 5.1 (35.2.1) on the Orin 32GB.

I'd like to correct my understanding of which Orin devices each JetPack/Jetson Linux release supports, if I have misunderstood something. My interpretation is based only on the "✔*" and "✔" marks.

Even if I am wrong and 5.1.1 (35.3.1) does support the Orin 32GB, I cannot move to it, because the NVIDIA L4T PyTorch container does not support 5.1.1 (35.3.1) yet. Its latest tag is "nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3", and we need this container environment. So 5.1 (35.2.1) seems to be the only option we can think of.

docker run --runtime=nvidia \
  --gpus all \
  -itd \
  --privileged \
  -e USE_GPU_HOST=1 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DRIVER_CAPABILITIES=compute,video,graphics,utility \
  --shm-size 2G

This is the docker command we use. Note that this is not everything; options not related to our issue are omitted.

Thank you!

@tcourtney ,

I tried removing "--gpus all" from the docker run command, but the behavior was the same. There was no difference.

I also tried "--runtime nvidia" in the docker run command (removing the '=' between runtime and nvidia), but the behavior was the same. There was no difference.

I wonder if this problem occurs only for me.
Can you reproduce it on your side?

Dear @jhjo,

When you experience the “No GPU associated to the given UUID” error, could you please also confirm which nsys binary is used? This command should show the full path:

readlink -f "$(which nsys)"

Hello @Andrey_Trachenko,

$ readlink -f "$(which nsys)"
/opt/nvidia/nsight-systems/2022.5.2/target-linux-sbsa-armv8/nsys
$ which nsys
/usr/local/bin/nsys

This is the result.
Thanks

Thank you very much.

This looks like a bug on our side. On a Jetson without a dGPU, the following path should be used instead:

/opt/nvidia/nsight-systems/2022.5.2/target-linux-tegra-armv8/nsys
                                                 ^^^^^

We'll review this internally to make sure this issue is fixed in a future version.

There are multiple ways to get it fixed. The most correct way is to use the update-alternatives mechanism.

sudo update-alternatives --install /usr/local/bin/nsys nsys /opt/nvidia/nsight-systems/2022.5.2/target-linux-tegra-armv8/nsys 0
sudo update-alternatives --set nsys /opt/nvidia/nsight-systems/2022.5.2/target-linux-tegra-armv8/nsys

After that, please check that readlink -f "$(which nsys)" now points to the tegra variant, not sbsa.
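
For example, on a correctly configured setup the check should report the tegra path from above:

$ readlink -f "$(which nsys)"
/opt/nvidia/nsight-systems/2022.5.2/target-linux-tegra-armv8/nsys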

An alternative way to work around this issue is to install the symlink manually:

sudo ln -sf /opt/nvidia/nsight-systems/2022.5.2/target-linux-tegra-armv8/nsys /usr/local/bin/nsys

Either way should work. Unfortunately, I currently don’t have quick access to a Jetson devkit to verify that it works.


Hello @Andrey_Trachenko ,
Your suggestion worked! I tried the first method using update-alternatives and it works.

Now nsys in the container works well, and I am able to get a "*-report.nsys-rep" file.
I then tried nsys-ui with this file, like "$ nsys-ui 01-report.nsys-rep", both inside and outside the docker container.

The run from outside the container worked well, and the Nsight GUI launched with the profiling data.
The run from inside the container didn't work.
ldd /opt/nvidia/nsight-systems/2022.5.2/host-linux-armv8/Plugins/QuadDPlugin/libQuadDPlugin.so
prints many missing libraries, as shown below:

    libboost_atomic.so.1.78.0 => not found
    libboost_chrono.so.1.78.0 => not found
    libboost_date_time.so.1.78.0 => not found
    libboost_filesystem.so.1.78.0 => not found
    libboost_iostreams.so.1.78.0 => not found
    libboost_regex.so.1.78.0 => not found
    libboost_system.so.1.78.0 => not found
    libboost_thread.so.1.78.0 => not found
    libboost_timer.so.1.78.0 => not found
    libboost_program_options.so.1.78.0 => not found
    libboost_serialization.so.1.78.0 => not found
    libboost_container.so.1.78.0 => not found
    libexporter.so => not found
    libTimelineWidget.so => not found
    libInterfaceShared.so => not found
    libInterfaceSharedCore.so => not found
    libAppLib.so => not found
    libQt6WebEngineWidgets.so.6 => not found
    libQt6Concurrent.so.6 => not found
    libQt6StateMachine.so.6 => not found
    libQt6Xml.so.6 => not found
    libAppLibInterfaces.so => not found
    libTimelineUIUtils.so => not found
    libNvQtGui.so => not found
    libQt6Svg.so.6 => not found
    libInterfaceSharedBase.so => not found
    libAnalysis.so => not found
    libHostCommon.so => not found
    libSymbolAnalyzerLight.so => not found
    libSymbolDemangler.so => not found
    libStreamSections.so => not found
    libDeviceProperty.so => not found
    libDevicePropertyProto.so => not found
    libNvtxExtData.so => not found
    libAnalysisData.so => not found
    libGenericHierarchy.so => not found
    libTimelineCommon.so => not found
    libTimelineAssert.so => not found
    libAgentAPI.so => not found
    libProtobufCommClient.so => not found
    libProtobufComm.so => not found
    libSshClient.so => not found
    libssh.so => not found
    libProtobufCommProto.so => not found
    libProcessLauncher.so => not found
    libLinuxPerf.so => not found
    libAnalysisContainersData.so => not found
    libQuiverContainers.so => not found
    libQuiverEvents.so => not found
    libarrow.so.500 => not found
    libCommonProtoStreamSections.so => not found
    libAnalysisProto.so => not found
    libInterfaceData.so => not found
    libInterfaceSharedLoggers.so => not found
    libAssert.so => not found
    libCore.so => not found
    libnvlog.so => not found
    libCommonProtoServices.so => not found
    libprotobuf319-shared.so => not found
    libQt6WebEngineCore.so.6 => not found
    libQt6WebChannel.so.6 => not found
    libQt6Positioning.so.6 => not found
    libQt6PrintSupport.so.6 => not found
    libQt6Widgets.so.6 => not found
    libQt6Quick.so.6 => not found
    libQt6QmlModels.so.6 => not found
    libQt6Qml.so.6 => not found
    libQt6Network.so.6 => not found
    libQt6OpenGL.so.6 => not found
    libQt6Gui.so.6 => not found
    libQt6DBus.so.6 => not found
    libQt6Core.so.6 => not found

Maybe this library is not meant to be used inside the container? I got this impression from the path "host-linux-armv8".
Maybe the container only supports target-linux-tegra-armv8?

This problem is not that important, because the native nsys-ui outside docker works well, so I won't worry about it. But if you resolve this issue, please let me know.
Thank you very much!

I’m glad to hear that it worked.

nsys-ui is actually a shell script that sets a few environment variables and then launches the crash-reporter process, which in turn launches the nsys-ui.bin binary. nsys-ui can work in X and Wayland desktop environments, but it needs to be run from an environment where graphical applications can be launched. That means running the UI within the container is likely not going to work if the container doesn't have an X or Wayland graphical session active.

Still, if you do have, for example, an X server and a VNC session in the container, please refer to the install-dependencies.sh script to install the additional dependencies:

https://docs.nvidia.com/nsight-systems/UserGuide/index.html#gui-troubleshooting-root-ubuntu-centos
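
For reference, a rough sketch of launching the container with the host X display forwarded (assuming an X server is running on the host and local connections are allowed, e.g. via xhost) would be:

docker run --runtime nvidia -it \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3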

Hello @Andrey_Trachenko ,

I am sorry, our conversation is not finished yet. The zombie was strong! ^^;

This time, I tried using a session with nsys, like this:

  1. $ nsys launch --session-new first --trace osrt,cuda 01
  2. $ nsys start --session first --force-overwrite true --output 01-nsys-report
    WARNING: --sample=system-wide requires root privileges, disabling.
    WARNING: 'timer' backtrace collection trigger will not be used because sampling is disabled.
    WARNING: 'sched' backtrace collection trigger will not be used because sampling is disabled.
    /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Tegra/QuadD/Target/quadd_d/quadd_d/jni/KernelModuleServiceImpl.cpp(243): Throw in function void QuadDDaemon::KernelModuleServiceImpl::InitializeInternal()
    Dynamic exception type: boost::wrapexcept<QuadDDaemon::KmsModuleInitError>
    std::exception::what: KmsModuleInitError
    [boost::errinfo_nested_exception_*] =
    Throw in function (unknown)
    Dynamic exception type: boost::wrapexcept<QuadDDaemon::QMOpenDeviceFileError>
    std::exception::what: QMOpenDeviceFileError
    [QuadDCommon::tag_message*] = Permission denied
    [QuadDDaemon::tag_error_code*] = 18
    [QuadDDaemon::tag_throw_file*] = /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Tegra/QuadD/Target/quadd_d/quadd_d/jni/qm.c
    [QuadDDaemon::tag_throw_func*] = get_module_capabilities
    [boost::errinfo_errno_*] = 13, "Permission denied"

make: *** [Makefile:34: nsys_start] Error 1

It seems that nsys requires root, so I tried with sudo after rebooting the system.

  1. $ sudo nsys launch --session-new first --trace osrt,cuda 01
  2. $ sudo nsys start --session first --force-overwrite true --output 01-nsys-report
    [sudo] password for {my user id}:
    WARNING: 'timer' backtrace collection trigger will not be used because sampling is disabled.
    WARNING: 'sched' backtrace collection trigger will not be used because sampling is disabled.
    Agent launcher failed.

I also searched for install-dependencies.sh under /opt/nvidia/nsight-systems/2022.5.2 and found nothing. Searching for anything matching "dependencies" also returned nothing:

$ find . -name "*dependencies*"

Please note that my environment is as described earlier in this thread.

Thank you!

Thank you for reporting this issue. We are looking into it.

My apologies for pointing you to install-dependencies.sh, which ships with newer versions of Nsight Systems; you would have to wait for the next JetPack release to pick up a newer version.

Hi @Andrey_Trachenko,
OK, thank you for the verification.
I will wait for the new JetPack.
Thanks!

Hi @jhjo,

I have reviewed the situation with the issue related to "nsys launch"/"nsys start". The problem is now fixed, and the next version of Nsight Systems, 2023.2.4 (JetPack 5.1.2), should work correctly. Thank you for reporting the issue.
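
For reference, once you are on the fixed version, the interactive session workflow should look roughly like this (the first two commands are the ones you used; the stop and stats steps are the usual follow-up, and exact options may vary slightly between versions):

nsys launch --session-new first --trace osrt,cuda ./01
nsys start --session first --force-overwrite true --output 01-nsys-report
nsys stop --session first
nsys stats 01-nsys-report.nsys-rep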


Thank you very much!
