(Continuing from last post)
I looked at the compute side and the WAR (0d56577) you submitted, and I think it's exactly the same issue. Prior to that change, the pass was ended before the final ExecuteCommandLists (ECL) call, so that ECL was not included in the pass, which leads to no data being collected (please refer to my last post):
PROFILE_COMPUTE_END_PASS(); // <---------------------------------------------
PIXEndEvent(m_CommandList.Get()); // SDF Bake
{
    // Execute command list
    THROW_IF_FAIL(m_CommandList->Close());
    ID3D12CommandList* ppCommandLists[] = { m_CommandList.Get() };
    m_PreviousWorkFence = computeQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
    computeQueue->WaitForFenceCPUBlocking(m_PreviousWorkFence);
}
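To make the ordering issue concrete, here is a rough toy model in plain C++. Everything in it (ToyCommandList, ToyQueue, ToyProfiler and the two helper functions) is hypothetical and made up for illustration; it is neither the Nsight Perf SDK nor D3D12, and the "balanced ranges" rule is only my simplification of why data goes missing. It just shows that ending the pass while the pop range is still sitting in an unexecuted command list leaves the pass incomplete:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a command list: commands are only recorded here.
struct ToyCommandList {
    std::vector<std::string> commands;
    void Record(const std::string& cmd) { commands.push_back(cmd); }
};

// Hypothetical stand-in for a queue: commands become visible once executed.
struct ToyQueue {
    std::vector<std::string> executed;
    void Execute(ToyCommandList& list) {
        executed.insert(executed.end(), list.commands.begin(), list.commands.end());
        list.commands.clear();
    }
};

// Assumption of this model: a pass yields data only if every PushRange that
// reached the queue was followed by its PopRange before EndPass is called.
struct ToyProfiler {
    bool EndPass(const ToyQueue& queue) const {
        int depth = 0;
        bool sawRange = false;
        for (const std::string& cmd : queue.executed) {
            if (cmd == "PushRange") { ++depth; sawRange = true; }
            else if (cmd == "PopRange") { --depth; assert(depth >= 0); }
        }
        return sawRange && depth == 0;
    }
};

// Mirrors the old order: EndPass first, final ECL afterwards.
bool EndPassBeforeExecute() {
    ToyCommandList list; ToyQueue queue; ToyProfiler profiler;
    list.Record("PushRange");
    list.Record("Dispatch");
    queue.Execute(list);                      // first ECL
    list.Record("Dispatch");
    list.Record("PopRange");                  // recorded, not yet executed
    bool collected = profiler.EndPass(queue); // pop has not reached the queue
    queue.Execute(list);                      // final ECL comes too late
    return collected;
}

// Mirrors the fixed order: final ECL first, then a single EndPass.
bool EndPassAfterExecute() {
    ToyCommandList list; ToyQueue queue; ToyProfiler profiler;
    list.Record("PushRange");
    list.Record("Dispatch");
    queue.Execute(list);                      // first ECL
    list.Record("Dispatch");
    list.Record("PopRange");
    queue.Execute(list);                      // final ECL
    return profiler.EndPass(queue);           // range is closed on the queue
}
```

In this model EndPassBeforeExecute() reports that nothing was collected, while EndPassAfterExecute() reports a complete pass.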
I tried removing your WAR and replacing it with a single call to EndPass(), and it worked just fine:
PROFILE_COMPUTE_BEGIN_PASS("SDF Bake", nullptr);
BuildCommandList_Setup(pipelineSet, object, m_Resources);
BuildCommandList_HierarchicalBrickBuilding(pipelineSet, object, m_Resources, maxIterations);
{
    // Execute work and wait for it to complete
    THROW_IF_FAIL(m_CommandList->Close());
    ID3D12CommandList* ppCommandLists[] = { m_CommandList.Get() };
    const auto fenceValue = computeQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
    // CPU wait until this work has been complete before continuing
    computeQueue->WaitForFenceCPUBlocking(fenceValue);
    ..
}
{
    // Read counter value
    ...
}
BuildCommandList_BrickEvaluation(pipelineSet, object, m_Resources);
PIXEndEvent(m_CommandList.Get()); // SDF Bake
{
    // Execute command list
    THROW_IF_FAIL(m_CommandList->Close());
    ID3D12CommandList* ppCommandLists[] = { m_CommandList.Get() };
    m_PreviousWorkFence = computeQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
    computeQueue->WaitForFenceCPUBlocking(m_PreviousWorkFence);
}
PROFILE_COMPUTE_END_PASS(nullptr); // <--------------------------- Single line to EndPass
Collected results:
drops,0.5000,2837,258,SDF Bake, 818694, 32, 23, 45, 28, 28, 94, 2, 76, 26, 12, 28, 0, 13, 10, 13, 20
drops,0.5000,2837,258,SDF Bake/Edit Dependencies, 4667, 22, 32, 46, 21, 26, 96, 12, 100, 13, 5, 21, 0, 0, 18, 0, 0
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building, 289176, 7, 55, 38, 18, 18, 97, 2, 99, 11, 6, 18, 0, 0, 75, 0, 3
drops,0.5000,2837,258,SDF Bake/AABB Building, 3778, 1, 12, 87, 1, 3, 77, 4, 100, 0, 0, 1, 0, 0, 48, 0, 21
drops,0.5000,2837,258,SDF Bake/Brick Evaluation, 218287, 94, 4, 1, 82, 80, 91, 2, 99, 82, 38, 80, 0, 48, 47, 48, 48
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building/Brick Counting1, 114676, 7, 88, 6, 16, 16, 96, 1, 100, 12, 6, 16, 0, 0, 47, 0, 0
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building/Prefix Sum1, 4704, 0, 1, 99, 0, 0, 14, 1, 100, 0, 0, 0, 0, 0, 65, 0, 0
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building/Brick Building1, 6722, 5, 63, 32, 10, 6, 77, 1, 100, 9, 10, 6, 0, 0, 84, 0, 0
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building/Edit Culling1, 14907, 7, 66, 27, 20, 21, 97, 6, 99, 11, 5, 20, 0, 0, 76, 0, 1
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building/Brick Counting2, 52120, 4, 37, 59, 10, 10, 94, 1, 100, 7, 3, 10, 0, 0, 49, 0, 0
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building/Prefix Sum2, 8444, 0, 3, 97, 0, 0, 29, 0, 100, 0, 0, 0, 0, 0, 72, 0, 0
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building/Brick Building2, 7352, 16, 57, 27, 33, 22, 82, 2, 100, 28, 33, 22, 0, 0, 83, 0, 0
drops,0.5000,2837,258,SDF Bake/Hierarchical Brick Building/Edit Culling2, 31528, 28, 62, 10, 71, 71, 97, 7, 100, 32, 17, 71, 0, 0, 83, 0, 1
In your use case, I feel it's more natural to always pass nullptr to BeginPass/EndPass, which uses the queue-level push/pop range underneath. Otherwise, an ECL is missing:
void NvGPUProfiler::EndPassImpl(ID3D12GraphicsCommandList* commandList)
{
    if (!m_Profiler.AllPassesSubmitted() && m_Profiler.IsInPass())
    {
        if (commandList)
            PopRangeImpl(commandList); // <-------- Missing ECL(commandList) between this line and EndPass(), and it doesn't make much sense to execute it here either.
        else
            PopRangeImpl();
        THROW_IF_FALSE(m_Profiler.EndPass(), "Failed to end a pass.");
        m_DataReady = true;
    }
}
(What the queue-level PushRange/PopRange does is this: the NvPerfSDK DLL internally creates a command list, adds the push/pop range instrumentation to it, and ECLs it for you.)
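As a rough illustration of that queue-level behavior, here is a small mock in plain C++. The names (MockCommandList, MockQueue, QueueLevelPushRange, QueueLevelPopRange, RunQueueLevelPass) are all hypothetical and are not the SDK's API; the mock only mirrors the description above, where each queue-level call builds its own tiny command list and submits it right away, so the caller never has to ECL the marker:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a command list.
struct MockCommandList {
    std::vector<std::string> commands;
};

// Hypothetical stand-in for a queue; records commands in submission order.
struct MockQueue {
    std::vector<std::string> executed;
    void Execute(const MockCommandList& list) {
        // Submitting an empty list would be a mistake in this model.
        assert(!list.commands.empty());
        executed.insert(executed.end(), list.commands.begin(), list.commands.end());
    }
};

// Queue-level push: an internal command list holds only the marker and is
// submitted immediately (assumed behavior, following the description above).
void QueueLevelPushRange(MockQueue& queue, const std::string& name) {
    MockCommandList internal;
    internal.commands.push_back("PushRange:" + name);
    queue.Execute(internal);
}

// Queue-level pop: same idea, with a pop marker.
void QueueLevelPopRange(MockQueue& queue) {
    MockCommandList internal;
    internal.commands.push_back("PopRange");
    queue.Execute(internal);
}

// The user's own work is submitted between the two queue-level calls.
std::vector<std::string> RunQueueLevelPass() {
    MockQueue queue;
    QueueLevelPushRange(queue, "SDF Bake");
    MockCommandList userWork;
    userWork.commands.push_back("Dispatch");
    queue.Execute(userWork);   // the user's normal ECL
    QueueLevelPopRange(queue); // the range is already closed on the queue here
    return queue.executed;
}
```

The point of the mock is that, once the queue-level pop returns, both markers have already been submitted around the user's work, so EndPass can follow directly with no extra ECL.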
Should you have any other questions, please let me know!
Thanks,
Yiran