Callstack not resolved correctly

I am using Nsight Systems 2023.2.1 to profile an x64 application. Most entries in nearly all callstacks are left unresolved, even though all PDBs are present and correct.

Sample unresolved callstack (as shown in the “Description” of the “Events View”; some entries omitted for brevity):

MyApp.exe!0x7ff65b164773
MyApp.exe!0x7ff65b0b8f1f
MyApp.exe!0x7ff65b093933
MyApp.exe!0x7ff65b5c8998
MyApp.exe!0x7ff65b5c8a9d
MyApp.exe!0x7ff65b5c8873
MyApp.exe!0x7ff65b189df3
MyApp.exe!0x7ff65b5b8e23
MyApp.exe!0x7ff65b5b82e0
MyApp.exe!0x7ff65b5b6d3e
ucrtbase.dll!0x7ff9b6531bb2

The same callstack after symbol resolution (again as shown in the “Description” of the “Events View”):

MyApp.exe!0x7ff65b164773
MyApp.exe!0x7ff65b0b8f1f
MyApp.exe!0x7ff65b093933
MyApp.exe!0x7ff65b5c8998
MyApp.exe!0x7ff65b5c8a9d
MyApp.exe!0x7ff65b5c8873
MyApp.exe!ImagePreparationAndBufferProcessor::process
MyApp.exe!RawFrameBufferProcessor::auxProcessWorker
MyApp.exe!`DataProcessorEnhanced<...>::startThread'::`2'::<...>::operator()
MyApp.exe!std::thread::_Invoke<...>
ucrtbase.dll!0x7ff9b6531bb2

Equivalent callstack as seen in Visual Studio:

MyApp.exe!`ImagePreparationAndBufferProcessor::process'::`6'::<lambda_1>::operator()
MyApp.exe!tbb::detail::d1::dynamic_grainsize_mode<tbb::detail::d1::adaptive_mode<tbb::detail::d1::auto_partition_type> >::work_balance<tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>,tbb::detail::d1::parallel_for_body_wrapper<`ImagePreparationAndBufferProcessor::process'::`6'::<lambda_1>,int>,tbb::detail::d1::auto_partitioner const >,tbb::detail::d1::blocked_range<int> >
MyApp.exe!tbb::detail::d1::start_for<tbb::detail::d1::blocked_range<int>,tbb::detail::d1::parallel_for_body_wrapper<`ImagePreparationAndBufferProcessor::process'::`6'::<lambda_1>,int>,tbb::detail::d1::auto_partitioner const >::execute+0xb3   |  (00007ff6`5b093970)   MyApp!DataProcessor<std::shared_ptr<std::vector<std::shared_ptr<PointCloudImageData>,std::allocator<std::shared_ptr<PointCloudImageData> > > >,std::shared_ptr<std::vector<std::shared_ptr<PointCloudImageData>,std::allocator<std::shared_ptr<PointCloudImageData> > > > >::DataProcessor<std::shared_ptr<std::vector<std::shared_ptr<PointCloudImageData>,std::allocator<std::shared_ptr<PointCloudImageData> > > >,std::shared_ptr<std::vector<std::shared_ptr<PointCloudImageData>,std::allocator<std::shared_ptr<PointCloudImageData> > > > >
MyApp.exe!tbb::detail::d1::parallel_for<tbb::detail::d1::blocked_range<int>,tbb::detail::d1::parallel_for_body_wrapper<`ImagePreparationAndBufferProcessor::process'::`6'::<lambda_1>,int> >
MyApp.exe!tbb::detail::d1::parallel_for_impl<int,`ImagePreparationAndBufferProcessor::process'::`6'::<lambda_1>,tbb::detail::d1::auto_partitioner const >
MyApp.exe!tbb::detail::d1::parallel_for<int,`ImagePreparationAndBufferProcessor::process'::`6'::<lambda_1> >
MyApp.exe!ImagePreparationAndBufferProcessor::process
MyApp.exe!RawFrameBufferProcessor::auxProcessWorker
MyApp.exe!`DataProcessorEnhanced<std::shared_ptr<std::vector<std::shared_ptr<RawImageData>,std::allocator<std::shared_ptr<RawImageData> > > >,std::shared_ptr<std::vector<std::shared_ptr<RawImageData>,std::allocator<std::shared_ptr<RawImageData> > > > >::startThread'::`2'::<lambda_1>::operator()
MyApp.exe!std::thread::_Invoke<std::tuple<`DataProcessorEnhanced<std::shared_ptr<std::vector<std::shared_ptr<RawImageData>,std::allocator<std::shared_ptr<RawImageData> > > >,std::shared_ptr<std::vector<std::shared_ptr<RawImageData>,std::allocator<std::shared_ptr<RawImageData> > > > >::startThread'::`2'::<lambda_1> >,0>
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>

The first two resolved entries are non-templated functions and are displayed correctly.

The remaining functions are lambdas or template instantiations, and they are resolved only in some cases. When resolution does “work”, the resolver appears to apply some kind of simplification (compare resolved entries 3 and 4 with the “actual” callstack copied from the Visual Studio IDE). I suspect this simplification fails for more “elaborate” names.
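Comparing resolved entries 3 and 4 with the Visual Studio names, the simplification looks as if every top-level template argument list (including MSVC's `<lambda_1>` tag) is collapsed to `<...>`. Purely for illustration, here is a minimal C++ sketch of that guess. It is an assumption, not Nsight's actual implementation, and `collapseTemplateArgs` is a hypothetical name. Note the fallback: if the brackets don't balance (e.g. `operator<`), it returns the raw name instead of nothing.

```cpp
#include <string>

// Hypothetical helper (my guess at the simplification, NOT Nsight's code):
// collapses every top-level template argument list, including MSVC lambda
// tags such as <lambda_1>, to "<...>".
std::string collapseTemplateArgs(const std::string& name) {
    std::string out;
    int depth = 0;  // current nesting level of '<'
    for (char c : name) {
        if (c == '<') {
            if (depth == 0) out += "<...>";  // one placeholder per top-level group
            ++depth;
        } else if (c == '>' && depth > 0) {
            --depth;  // closes a (possibly nested) group
        } else if (depth == 0) {
            out += c;  // outside template arguments: keep the character
        }
    }
    // Unbalanced brackets (e.g. "operator<" or "operator<<") would swallow the
    // rest of the name, so fall back to the raw name rather than show nothing.
    return depth == 0 ? out : name;
}
```

With this rule, the Visual Studio name of the `startThread` lambda collapses to exactly the `` `DataProcessorEnhanced<...>::startThread'::`2'::<...>::operator() `` string shown in the resolved callstack above.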

This is reproducible in nsys-ui.exe as well as with the ResolveSymbols.exe tool.

Although I can resolve the addresses manually, doing so is time-consuming, and it is annoying not to see a meaningful callstack in the Nsight UI. If there is a maximum displayed length, or if the suspected simplification fails, the UI should still display something (e.g. by cutting out part of the middle of the name and replacing it with “…”).
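A minimal sketch of the middle-elision I have in mind. `elideMiddle` is a hypothetical helper, and `maxLen` stands in for whatever display limit the UI might use (I don't know whether such a limit exists). It uses ASCII "..." so that, for ASCII symbol names, byte length equals displayed length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: shortens a symbol name to at most maxLen characters by
// replacing its middle with "...", so both the start (namespace/class) and
// the end (function name) stay visible.
std::string elideMiddle(const std::string& name, std::size_t maxLen) {
    const std::string ellipsis = "...";
    assert(maxLen >= ellipsis.size() && "limit too small for the ellipsis");
    if (name.size() <= maxLen) return name;  // short enough: show as-is

    const std::size_t keep = maxLen - ellipsis.size();  // characters kept in total
    const std::size_t head = (keep + 1) / 2;            // slightly favor the start
    const std::size_t tail = keep / 2;
    return name.substr(0, head) + ellipsis + name.substr(name.size() - tail);
}
```

Cutting the middle rather than the end keeps the qualifying class and the trailing `::operator()` or function name, which are usually the most informative parts of a long templated name.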

Thanks!
Best, Christoph.