Using QdstrmImporter

When you run Nsight Systems using the command line, the CLI generates a .qdstrm file.

This .qdstrm file is an intermediate result file, not intended for multiple imports. It needs to be processed, either by importing it into the GUI or by using the standalone QdstrmImporter to generate an optimized .qdrep file. Use this .qdrep file when re-opening the result on the same machine, opening the result on a different machine, or sharing results with teammates.

Importing very large, multi-gigabyte .qdstrm files may consume all of the memory on the host computer and lock up the system. We are working to improve this, but many users have setups where the target computer is much more powerful than the host they use for visualization.

Using QdstrmImporter gives you the ability to script the .qdrep generation (on the host) or generate the .qdrep file where you have the most resources. Note that the CLI and QdstrmImporter versions must match to convert a .qdstrm file into a .qdrep file. This .qdrep file can then be opened in the same version or more recent versions of the GUI.

To run QdstrmImporter on the host system, find the QdstrmImporter binary in the Host-x86_64 directory in your installation. QdstrmImporter is available for all host platforms. See options below.

To run QdstrmImporter on the target system, copy the Linux Host-x86_64 directory to the target Linux system or install Nsight Systems for Linux host directly on the target. The Windows or macOS host QdstrmImporter will not work on a Linux target. See options below.

QdstrmImporter Options:
-h or --help ---- Help message providing information about available options and their parameters.
-v or --version ---- Output QdstrmImporter version information
-i or --input-file [filename or path] ---- Import .qdstrm file and generate a .qdrep file with the same name and in the same location.
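As a rough illustration of scripting the conversion, the sketch below wraps the importer in a small shell script. The installation path in NSYS_HOST_DIR and the file name report.qdstrm are hypothetical placeholders, and the helper names qdrep_path_for and import_qdstrm are made up for this example; only the --version and --input-file options come from the list above.

```shell
#!/bin/sh
# Sketch: convert a .qdstrm file with the standalone importer.
# NSYS_HOST_DIR and the default file name are hypothetical; adjust them
# to match your installation and your collection output.
NSYS_HOST_DIR="${NSYS_HOST_DIR:-/opt/nvidia/nsight-systems/Host-x86_64}"
QDSTRM_FILE="${QDSTRM_FILE:-report.qdstrm}"

# The importer writes the .qdrep next to the input, with the same base name.
qdrep_path_for() {
    printf '%s\n' "${1%.qdstrm}.qdrep"
}

# Run the importer on one file; return non-zero if it cannot be run.
import_qdstrm() {
    importer="$NSYS_HOST_DIR/QdstrmImporter"
    if [ ! -x "$importer" ]; then
        echo "QdstrmImporter not found at $importer" >&2
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "Input file not found: $1" >&2
        return 1
    fi
    "$importer" --version          # record which version performs the import
    "$importer" --input-file "$1"
}

if import_qdstrm "$QDSTRM_FILE"; then
    echo "Expected report: $(qdrep_path_for "$QDSTRM_FILE")"
else
    echo "Import skipped or failed"
fi
```

Printing the importer version first makes it easier to confirm later that it matches the CLI version that produced the .qdstrm file.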

Hi, I encountered this error importing an 800 MB .qdstrm file.

Importing...
Import Failed with unexpected exception: /build/agent/work/20a3cfcd1c25021d/QuadD/Host/QdstrmImporter/main.cpp(36): Throw in function void {anonymous}::RunImport(boost::filesystem::path)
Dynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::RuntimeException>
std::exception::what: RuntimeException
[QuadDCommon::tag_error_text*] = Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/SymbolAnalyzer/SymbolAnalyzer.cpp(216): Throw in function virtual QuadDSymbolAnalyzer::SymbolInfoLight QuadDSymbolAnalyzer::SymbolAnalyzer::RemoteResolveSymbol(QuadDCommon::TransferrableProcessId, const QuadDTimestamp&, uint64_t, bool)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::TimeoutException>\nstd::exception::what: TimeoutException\n[QuadDCommon::tag_error_text*] = Wait time to resolve symbol expired\n"
      }
    }
  }
}

There are two things that may have happened here.

  1. Is this QdstrmImporter from the same version of the tool that ran the collection? The .qdstrm format is a temporary format and is not backward or forward compatible. It should be converted (by the script or by importing into the GUI) using the same version of the product.

  2. That is a big .qdstrm file. Is it possible that your system ran out of memory while processing it? This is especially likely if you do a long collection on a big system (DGX or cluster node) and try to process it on a smaller system. How long was the run, and how big was the system?
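A rough way to check both points from a shell is sketched below. It assumes that nsys and QdstrmImporter are on your PATH and that nsys accepts --version the same way QdstrmImporter does; the file name report.qdstrm and the helper print_version are placeholders for this example. The version strings are only printed for comparison by eye, since no particular output format is assumed.

```shell
#!/bin/sh
# Print the --version output of a tool, or a notice if it is not on PATH.
print_version() {
    if command -v "$1" >/dev/null 2>&1; then
        "$1" --version
    else
        echo "$1: not found"
    fi
}

# Point 1: both tools should report the same release.
print_version nsys
print_version QdstrmImporter

# Point 2: compare the input size with the memory available on this machine.
QDSTRM_FILE="${QDSTRM_FILE:-report.qdstrm}"   # hypothetical file name
if [ -f "$QDSTRM_FILE" ]; then
    ls -lh "$QDSTRM_FILE"
fi
if command -v free >/dev/null 2>&1; then
    free -h                                   # Linux only (procps)
fi
```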

Hi hwilper,

Thanks for the quick response! To answer your questions:

  1. Yes, they are definitely from the same version of the tool. The installation package filename is NVIDIA_Nsights_Systems_Linux_2018.3.1.29.run

  2. At first, I thought that might be the case when I ran the application on my local machine. But the same error occurs when I run it on the compute server with 256 GiB of RAM. According to htop, memory usage was far from 100% (about 32 GiB in total, mostly from the importer) before the importer crashed.

Okay, you are hitting a timeout on the symbol resolution.

Is there any chance that you could put the .qdstrm someplace I could download it from? We’d like to try it on our end. We’d also like the exact CLI command line you used to generate it, if possible.

Alternatively,

  1. Can you run for a sorter duration?
  2. Can you run with fewer trace options (or skip sampling)?
  3. You are hitting the timeout because of the OS runtime trace. Skipping just that trace will probably get you past this.

Hi hwilper,

Thanks for your help!
Unfortunately, due to our company security policies, I cannot share the .qdstrm file with you.

The (obfuscated) command is
LD_LIBRARY= ./nsys profile -o <filename.qdstrm>
This is my first time using nsys, so I used the default settings.

Also, regarding your previous suggestions:

  1. Can you run for a sorter duration?
     What does “sorter duration” mean and how do I run for one?
  2. Can you run with fewer trace options (or skip sampling)?
     Do you mean sample less frequently? I think this is worth a try if there is such a flag. My application is written in plain CUDA. I don’t think --trace will help much, but I’ll rerun the CLI with -t=cuda.
  3. You are hitting the timeout because of the OS runtime trace. Skipping just that trace will probably get you past this.
     How do I do that? Which trace are you referring to?

Thanks again for the reply!

Okay. First of all “sorter duration” should have been “shorter duration” which would have made more sense …

But, since you are mostly interested in CUDA, I would change your command line so that instead of the default trace (which includes the OS Runtime trace, which is causing your issue) you use:

nsys profile --trace=cuda,nvtx -o my_test_output <application> [application-arguments]

It will trace all the CUDA APIs (on the CPU and GPU) and do normal sampling.

Hi mengda.yang,

The CLI command you have used launches the application and profiles until the app exits.
./nsys profile -o <filename.qdstrm>
Here you are relying on the default CLI options to profile your application. By default, the CLI traces CUDA, OpenGL, NVTX, and OS runtime (osrt) APIs, and CPU sampling is turned on. To see the list of CLI options and their defaults, run ./nsys profile --help

In order to profile for a shorter duration, use the --duration=X switch. This was hwilper’s first suggestion.

In order to trace only CUDA APIs, use the --trace=cuda option. This turns off tracing of APIs from all the other libraries. To turn off CPU sampling, use --sample=none. This was hwilper’s second suggestion.
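
Putting these suggestions together, a minimal sketch of a reduced collection could look like the script below. The application path ./my_cuda_app, the 30-second duration, and the DRY_RUN variable are placeholders invented for this example, and the duration value is assumed to be in seconds; the switches themselves are the ones mentioned above. By default the script only prints the command, so you can review it before running it.

```shell
#!/bin/sh
# Build (and optionally run) a CUDA-only collection with CPU sampling off.
# $1: application, $2: output name, $3: duration (assumed to be seconds)
profile_cuda_only() {
    app=$1
    out=$2
    duration=$3
    set -- nsys profile --trace=cuda --sample=none --duration="$duration" -o "$out" "$app"
    echo "$*"                      # show the command line that would be used
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"                       # run it only when DRY_RUN=0 is set
    fi
}

# Hypothetical defaults; override them through the environment.
profile_cuda_only "${APP:-./my_cuda_app}" "${OUT:-my_test_output}" "${DURATION:-30}"
```

Set DRY_RUN=0 to run the collection, for example: DRY_RUN=0 APP=./my_cuda_app sh profile.sh (profile.sh being whatever name you save the script under).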