Nsys Importation error

Hi,

I am new to nsys, so I started with a basic profile run to test my two-GPU distributed training model. I first wrote a simple Python file containing part of my calculation, and the report was created successfully. When I moved on to training the full model, however, the import failed:

*Generating '/tmp/nsys-report-51a3.qdstrm'*
*[1/6] [11%                         ] report10.nsys-rep*
*Importer error status: Importation failed.*
*Import Failed with unexpected exception: /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/QdstrmImporter/main.cpp(34): Throw in function {anonymous}::Importer::Importer(const boost::filesystem::path&, const boost::filesystem::path&)*
*Dynamic exception type: boost::wrapexcept<QuadDCommon::RuntimeException>*
*std::exception::what: RuntimeException*
*[QuadDCommon::tag_message*] = Status: AnalysisFailed*
*Error {*
*  Type: RuntimeError*
*  SubError {*
*    Type: InvalidArgument*
*    Props {*
*      Items {*
*        Type: OriginalExceptionClass*
*        Value: "N5boost10wrapexceptIN11QuadDCommon24InvalidArgumentExceptionEEE"*
*      }*
*      Items {*
*        Type: OriginalFile*
*        Value: "/dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/Analysis/Modules/EventCollection.cpp"*
*      }*
*      Items {*
*        Type: OriginalLine*
*        Value: "1055"*
*      }*
*      Items {*
*        Type: OriginalFunction*
*        Value: "void QuadDAnalysis::EventCollection::CheckOrder(QuadDAnalysis::EventCollectionHelper::EventContainer&, const QuadDAnalysis::ConstEvent&) const"*
*      }*
*      Items {*
*        Type: ErrorText*
*        Value: "Wrong event order has been detected when adding events to the collection:\nnew event ={ StartNs=46074551461 StopNs=46074555849 GlobalId=283992901438016 Event={ TraceProcessEvent=[{ Correlation=13351336 EventClass=0 TextId=69 ReturnValue=0 },] } Type=48 }\nlast event ={ StartNs=46418590268 StopNs=46418605526 GlobalId=283992901438016 Event={ TraceProcessEvent=[{ Correlation=13493697 EventClass=0 TextId=69 ReturnValue=0 },] } Type=48 }"*
*      }*
*    }*
*  }*
*}*


A qdstrm file is created, but it cannot be imported manually in the GUI. When I moved it to a Windows machine with the same version and tried to open it there, I got a different error: The report was possibly created with a newer version of NVIDIA Nsight Systems. Please upgrade to the latest version and try again.

Profiling works fine when I run the training entirely on the CPU. The import fails only when CUDA is involved, no matter how many CUDA devices are in use. I also tried adding --trace=cuda, but it made no difference.

I am using Ubuntu 22.10, although the Ubuntu 22.04 build of 2024.1.1 worked for the successful cases above. Other information is here:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+

Hope to hear from you! Thank you so much for any help and suggestions.

Can you tell me what version of Nsys you are using? If it is just the one from the CUDA Toolkit, can you update to the newest version available at developer.nvidia.com/nsight-systems ?

Are you running this from the command line or the GUI? I am assuming the command line. Can you send me the command you used to generate the .qdstrm file?

Hi hwilper, I am using the newest version, 2024.1. I have actually used both the CLI and the GUI, and both report the same error. The command runs a training script: nsys profile --stats=true ./Train_Ranks_GPU_01.sh
I think I found a trick to get around this error: when I manually stopped my training run, the report was generated successfully. I have no idea why this works, but I hope it helps someone in the future.

I have the same problem.

@yunduan.lou would it be possible for you to zip up the qdstrm file and get it to us? I’m glad you have found a workaround.

@yinwei_hust can you see if the manual stop solved your issue as well?

It seems to be a problem with the Nsight Systems version. After I rolled back to 2023.2.1.12, the problem disappeared. The version I am using now is 2024.1.1

I have the same problem.

@JayLee15 please let me know if updating the version does not fix the issue for you.

Hello @hwilper, I tried running with the latest nsys (NVIDIA Nsight Systems version 2024.1.1.59-241133802077v0) and I have the same issue as well.

Generating ‘/tmp/nsys-report-c9da.qdstrm’
[1/1] [0% ] report1.nsys-rep
Importer error status: Importation failed.
Import Failed with unexpected exception: /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/QdstrmImporter/main.cpp(34): Throw in function {anonymous}::Importer::Importer(const boost::filesystem::path&, const boost::filesystem::path&)
Dynamic exception type: boost::wrapexcept<QuadDCommon::RuntimeException>
std::exception::what: RuntimeException
[QuadDCommon::tag_message*] = Status: AnalysisFailed
Error {
Type: RuntimeError
SubError {
Type: InvalidArgument
Props {
Items {
Type: OriginalExceptionClass
Value: “N5boost10wrapexceptIN11QuadDCommon24InvalidArgumentExceptionEEE”
}
Items {
Type: OriginalFile
Value: “/dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/Analysis/Modules/EventCollection.cpp”
}
Items {
Type: OriginalLine
Value: “1055”
}
Items {
Type: OriginalFunction
Value: “void QuadDAnalysis::EventCollection::CheckOrder(QuadDAnalysis::EventCollectionHelper::EventContainer&, const QuadDAnalysis::ConstEvent&) const”
}
Items {
Type: ErrorText
Value: “Wrong event order has been detected when adding events to the collection:\nnew event ={ StartNs=390336878126 StopNs=390336957268 GlobalId=282210137582458 Event={ TraceProcessEvent=[{ Correlation=158781 EventClass=1 TextId=4859 ReturnValue=0 },] } Type=48 }\nlast event ={ StartNs=395670767211 StopNs=395670776062 GlobalId=282210137582458 Event={ TraceProcessEvent=[{ Correlation=226768 EventClass=1 TextId=4884 ReturnValue=0 },] } Type=48 }”
}
}
}
}
Generated:
/work/report1.qdstrm

Any help to mitigate this error would be appreciated. Not sure how to fix: “Wrong event order has been detected when adding events to the collection:”
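
For anyone trying to read the ErrorText above, here is a minimal sketch (not part of nsys) that pulls the timestamps out of the two event lines and compares them. `ERROR_TEXT` is an abbreviated copy of the message with the `Event={...}` payload omitted, and the helper name `parse_events` is made up for illustration.

```python
import re

# Abbreviated copy of the two event lines from the ErrorText above
# (the Event={...} payload is left out for readability).
ERROR_TEXT = (
    "new event ={ StartNs=390336878126 StopNs=390336957268 GlobalId=282210137582458 }\n"
    "last event ={ StartNs=395670767211 StopNs=395670776062 GlobalId=282210137582458 }"
)

# Matches the leading fields of each event line; also works on the full message.
EVENT_RE = re.compile(
    r"(new|last) event\s*=\{\s*StartNs=(\d+)\s+StopNs=(\d+)\s+GlobalId=(\d+)"
)


def parse_events(text):
    """Return {'new': {...}, 'last': {...}} with integer timestamp fields."""
    events = {}
    for label, start, stop, global_id in EVENT_RE.findall(text):
        events[label] = {
            "start_ns": int(start),
            "stop_ns": int(stop),
            "global_id": int(global_id),
        }
    return events


events = parse_events(ERROR_TEXT)
same_global_id = events["new"]["global_id"] == events["last"]["global_id"]
# Positive gap: the event being added starts earlier than the one already stored.
gap_ns = events["last"]["start_ns"] - events["new"]["start_ns"]
print(f"same GlobalId: {same_global_id}, new event starts {gap_ns} ns before the last event")
```

Both events carry the same GlobalId, and the event being added starts several seconds before the last one already in the collection, which is what the importer reports as a wrong event order.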

That was during the profiling run, correct? Can you try post-processing the qdstrm file?

Here is the section from the User Guide:

Create .nsys-rep Using QdstrmImporter

The CLI and QdstrmImporter versions must match to convert a .qdstrm file into a .nsys-rep file. This .nsys-rep file can then be opened in the same version or more recent versions of the GUI.

To run QdstrmImporter on the host system, find the QdstrmImporter binary in the Host-x86_64 directory in your installation. QdstrmImporter is available for all host platforms. See options below.

To run QdstrmImporter on the target system, copy the Linux Host-x86_64 directory to the target Linux system or install Nsight Systems for Linux host directly on the target. The Windows or macOS host QdstrmImporter will not work on a Linux Target. See options below.

| Short | Long | Parameter | Description |
|---|---|---|---|
| -h | --help | | Help message providing information about available options and their parameters. |
| -v | --version | | Output QdstrmImporter version information. |
| -i | --input-file | filename or path | Import .qdstrm file from this location. |
| -o | --output-file | filename or path | Provide a different file name or path for the resulting .nsys-rep file. Default is the same name and path as the .qdstrm file. |
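
As a rough illustration of the options listed above, the following sketch wraps the importer in a small Python helper. Only the `--input-file` and `--output-file` flags come from the User Guide excerpt. The function names and the example binary path are hypothetical and need to be adapted to your installation.

```python
import subprocess


def build_import_command(importer, qdstrm, output=None):
    """Build the QdstrmImporter argument list from the documented long options."""
    cmd = [str(importer), "--input-file", str(qdstrm)]
    if output is not None:
        # Without --output-file the .nsys-rep lands next to the .qdstrm file.
        cmd += ["--output-file", str(output)]
    return cmd


def import_qdstrm(importer, qdstrm, output=None):
    """Run the importer and return the CompletedProcess without raising on failure."""
    return subprocess.run(
        build_import_command(importer, qdstrm, output),
        capture_output=True,
        text=True,
        check=False,
    )


# Hypothetical install location; replace with the path to your own binary.
EXAMPLE_IMPORTER = "/opt/nsight-systems/host-linux-x64/QdstrmImporter"
print(" ".join(build_import_command(EXAMPLE_IMPORTER, "report1.qdstrm", "report1.nsys-rep")))
```

Calling `import_qdstrm(...)` and inspecting `returncode`, `stdout` and `stderr` keeps the importer's full error text available for a bug report if the conversion fails.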

Thank you @hwilper, sadly I see the same output:

x86-64-396:/work$ /home/rakshith/nsight-systems-2024.1.1/host-linux-x64/QdstrmImporter --input-file report1.qdstrm
Processing [0% ]
Import Failed with unexpected exception: /dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/QdstrmImporter/main.cpp(34): Throw in function {anonymous}::Importer::Importer(const boost::filesystem::path&, const boost::filesystem::path&)
Dynamic exception type: boost::wrapexcept<QuadDCommon::RuntimeException>
std::exception::what: RuntimeException
[QuadDCommon::tag_message*] = Status: AnalysisFailed
Error {
Type: RuntimeError
SubError {
Type: InvalidArgument
Props {
Items {
Type: OriginalExceptionClass
Value: “N5boost10wrapexceptIN11QuadDCommon24InvalidArgumentExceptionEEE”
}
Items {
Type: OriginalFile
Value: “/dvs/p4/build/sw/devtools/Agora/Rel/QuadD_Main/QuadD/Host/Analysis/Modules/EventCollection.cpp”
}
Items {
Type: OriginalLine
Value: “1055”
}
Items {
Type: OriginalFunction
Value: “void QuadDAnalysis::EventCollection::CheckOrder(QuadDAnalysis::EventCollectionHelper::EventContainer&, const QuadDAnalysis::ConstEvent&) const”
}
Items {
Type: ErrorText
Value: “Wrong event order has been detected when adding events to the collection:\nnew event ={ StartNs=390336878126 StopNs=390336957268 GlobalId=282210137582458 Event={ TraceProcessEvent=[{ Correlation=158781 EventClass=1 TextId=4859 ReturnValue=0 },] } Type=48 }\nlast event ={ StartNs=395670767211 StopNs=395670776062 GlobalId=282210137582458 Event={ TraceProcessEvent=[{ Correlation=226768 EventClass=1 TextId=4884 ReturnValue=0 },] } Type=48 }”
}
}
}
}

Any recommendations?

@skottapalli can you take a deeper look at this one?

Hello There,

Thanks for looking into it. Looking forward to hearing from you.

Hello, what is the output of the nvidia-smi command on the target system? We had a recent bug in CUPTI which caused this kind of out-of-order error. It is fixed in the 2024.2 version of nsys. Could you try the 2024.2 version of nsys, please?
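
To check quickly whether an installed nsys is at or past the 2024.2 release mentioned here, a small sketch along these lines may help. It assumes the version string looks like the ones quoted earlier in this thread (for example `2024.1.1.59-241133802077v0`). The helper names are made up for illustration.

```python
import re

# Release in which the fix is reported, per the reply above.
FIXED_IN = (2024, 2, 0)


def parse_nsys_version(text):
    """Extract (year, minor, patch) from a version string; patch defaults to 0."""
    match = re.search(r"(\d{4})\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def is_at_least_2024_2(text):
    """True if the version in `text` is 2024.2 or newer."""
    return parse_nsys_version(text) >= FIXED_IN


for reported in (
    "NVIDIA Nsight Systems version 2024.1.1.59-241133802077v0",
    "2024.2",
):
    print(reported, "->", is_at_least_2024_2(reported))
```

This only compares against 2024.2. It says nothing about older releases such as 2023.2.1.12, which one poster above reported as not showing the problem.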

Sorry, seeing this now. Will try and let you know.

I confirm that the latest version fixed the previous error. I’m able to view a timeline.

Thanks!