DLProf error during report generation

Hi,

I am using a system with 4 A100s running Ubuntu 20.04 LTS, CUDA 11.8, and driver version 525.89.02.

I intend to use DLProf to profile a MONAI training run. I am launching the training (so far on a single GPU) as:

dlprof --mode pytorch --reports summary --formats json --output_path ./outputs_base python3 Dense_UNet_Training_v0.0.py

The training completes OK, but DLProf reports errors and I cannot open the log files with NSYS Compute.

Specifically the error is:

Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(45): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_message*] = Unknown driver API function index: 673\n"
      }
    }
  }
}

The above seems to indicate a mismatch between the driver API and what DLProf expects. Is that right?

Do you have any suggestions?

Thanks for any help,

Andrea

Can you tell me the version numbers of Nsight Systems and DLProf that you are using?

Hi,

DLProf is version 1.8.0 and nsys is version 2021.3.2.12-9700a21.

Thanks for any suggestions,

Andrea

Unfortunately, DLProf has reached end of life, and I think you are hitting a version mismatch because of that.

Under the covers, DLProf got its data from Nsight Systems and then performed some transforms on it. My recommendation is that you get the latest Nsight Systems, available from https://developer.nvidia.com/nsight-systems/get-started, and try profiling with that.
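As a rough starting point, a command along these lines could stand in for the dlprof invocation above. This is only a sketch: the script name is taken from the original post, while the report name (dense_unet_nsys) and the choice of trace sources are assumptions to adjust for your setup.

```shell
# Sketch only: profile the training script with Nsight Systems directly.
SCRIPT="Dense_UNet_Training_v0.0.py"      # script name from the original post
OUT="./outputs_base/dense_unet_nsys"      # hypothetical report name (assumption)
# cuda: CUDA API calls and GPU kernels, nvtx: NVTX ranges,
# osrt: OS runtime calls, cudnn/cublas: library calls
TRACE="cuda,nvtx,osrt,cudnn,cublas"

if command -v nsys >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
    mkdir -p "$(dirname "$OUT")"
    # --force-overwrite=true replaces an existing report with the same name
    nsys profile --trace="$TRACE" --output="$OUT" --force-overwrite=true \
        python3 "$SCRIPT"
else
    echo "nsys or $SCRIPT not found; nothing was profiled" >&2
fi
```

With a recent Nsight Systems release, the result should be a report file with the .nsys-rep extension that opens in the Nsight Systems GUI.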

Thanks for the note. It is bad news that DLProf is being discontinued.

Just to make sure I understand your message correctly: are you suggesting that I get the latest nsys and see whether the current DLProf happens to work with it, or do you mean that I should use nsys as a substitute?

Thanks for any clarification!

Andrea

I’m suggesting that you try Nsight Systems directly.

We can help you figure out what options and statistical analysis get you the information you need, even if the data layout will not be the same as what you got from dlprof.
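For the statistical side, something like the following might be a reasonable first pass once a report exists. It is a sketch under assumptions: the report path is hypothetical, and the report names shown are the ones used by recent Nsight Systems releases; they differ in older releases, so `nsys stats --help-reports` is the way to check what your version provides.

```shell
# Sketch only: print summary tables from an existing Nsight Systems report.
REPORT="./outputs_base/dense_unet_nsys.nsys-rep"   # hypothetical path (assumption)
# Report names as used by recent releases; older releases name them differently
REPORTS="cuda_gpu_kern_sum cuda_api_sum nvtx_sum"

if command -v nsys >/dev/null 2>&1 && [ -f "$REPORT" ]; then
    # List the report scripts available in the installed version
    nsys stats --help-reports
    for r in $REPORTS; do
        # Each call prints one summary table to the console
        nsys stats --report "$r" "$REPORT"
    done
else
    echo "nsys or $REPORT not found; no statistics were generated" >&2
fi
```

The kernel summary is probably the closest match to what the DLProf summary report offered, although the layout will differ.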