Hello, this is my first time using Nsight Systems. I encountered an error after my program finished executing, while the .nsys-rep file was being generated. It’s worth noting that when I run only part of my code, the .nsys-rep file is generated successfully. I haven’t found similar cases online. Can you help?
My CUDA version is 12.3, and the Nsight Systems version is 2023.3.3.42-233333266658v0.
I’m going to start by suggesting that you update your Nsys version, because what you are using is more than a year old.
So if I am following, you can run part of your code and get a .nsys-rep file, but if you run all of it you get a crash when the .qdstrm file is being converted to .nsys-rep?
Thanks for the quick reply! Your understanding is correct. However, I can run the whole program and successfully generate the .nsys-rep file on other graphics cards, with other CUDA versions and their corresponding nsys versions, so it shouldn’t be a problem with the code.
The conversion of the .qdstrm file to .nsys-rep requires a lot of RAM. I suspect that the file that you are generating with the full code on this system is using more memory in processing than your system has available.
What is the command line you are using? Is it possible for you to get the information you need with a shorter period of analysis or fewer options?
I’m also going to loop in @skottapalli because I will be on vacation starting tomorrow and I don’t want your request to get lost.
Thank you for your reply! I suspect the same thing. My code needs to process data from 120 samples. When it processes only 20 samples, it generates the .nsys-rep file without any problem. When it processes more than 60, I get the error.
In our experience you usually only need to use a few iterations to see the performance patterns. You can use the delay or duration options to limit the time of collection or the stop/start options (including by NVTX range or cuProfilerStart/Stop) to control the profiling if needed.
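As a rough illustration of the options mentioned above, here is a minimal Python sketch that assembles an `nsys profile` command line with a limited collection window. It assumes the `--delay`, `--duration`, and `--capture-range=cudaProfilerApi` options of `nsys profile`; the helper name `build_nsys_command`, the application `./my_app`, its `--samples` argument, and the report name are all hypothetical placeholders, not taken from your setup. Check `nsys profile --help` on your installed version before relying on it.

```python
import shlex


def build_nsys_command(app_cmd, output="report", delay_s=None,
                       duration_s=None, use_profiler_api=False):
    """Assemble an `nsys profile` command that limits what gets collected.

    app_cmd          -- the application and its arguments, as a list (hypothetical)
    delay_s          -- seconds to wait before collection starts (--delay)
    duration_s       -- seconds to collect for (--duration)
    use_profiler_api -- collect only between cudaProfilerStart/cudaProfilerStop
                        (or cuProfilerStart/cuProfilerStop) calls in the application
    """
    timed = delay_s is not None or duration_s is not None
    if use_profiler_api and timed:
        # Assumption of this sketch: keep the two ways of limiting the
        # collection separate, so each command uses only one of them.
        raise ValueError("choose either a time window or the profiler API range")

    cmd = ["nsys", "profile", "-o", output]
    if delay_s is not None:
        cmd.append(f"--delay={delay_s}")
    if duration_s is not None:
        cmd.append(f"--duration={duration_s}")
    if use_profiler_api:
        cmd.append("--capture-range=cudaProfilerApi")
    cmd.extend(app_cmd)
    return cmd


if __name__ == "__main__":
    # Hypothetical application and arguments, for illustration only.
    app = ["./my_app", "--samples", "120"]
    print(shlex.join(build_nsys_command(app, delay_s=10, duration_s=30)))
    print(shlex.join(build_nsys_command(app, use_profiler_api=True)))
```

With the profiler API variant, the application itself has to call cudaProfilerStart() before the iterations of interest and cudaProfilerStop() after them, so only those iterations end up in the .qdstrm file. A smaller capture should also reduce the memory needed for the conversion to .nsys-rep.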