Dynamic indexing of descriptors in cs_5_1 causes debug validation layer error

In a compute shader in DirectX 12 (cs_5_1 profile), I am using dynamic indexing of an array of RWStructuredBuffer resources. When I access the resources in the shader through the array index, as shown in the code below:

/**
 * Inputs that are passed to a compute shader.
 */
struct ComputeShaderInput
{
    uint3 GroupID           : SV_GroupID;           // 3D index of the thread group in the dispatch.
    uint3 GroupThreadID     : SV_GroupThreadID;     // 3D index of local thread ID in a thread group.
    uint3 DispatchThreadID  : SV_DispatchThreadID;  // 3D index of global thread ID in the dispatch.
    uint  GroupIndex        : SV_GroupIndex;        // Flattened local index of the thread within a thread group.
};

RWTexture2D<float4> DebugTexture : register( u0 );

// Global counter for current index into the light index list.
RWStructuredBuffer<uint> PointLightIndexCounter[2] : register( u1 );

#define TestShader_RootSignature \
    "RootFlags(0)," \
    "DescriptorTable(UAV(u0, numDescriptors=3))"

[RootSignature( TestShader_RootSignature )]
[numthreads( 16, 16, 1 )]
void main( ComputeShaderInput IN )
{
    int2 texCoord = IN.DispatchThreadID.xy;

    uint pointLightIndexCounter0 = PointLightIndexCounter[0][0];
    uint pointLightIndexCounter1 = PointLightIndexCounter[1][0];
    
    DebugTexture[texCoord] = float4( pointLightIndexCounter0, pointLightIndexCounter1, 0, 1 );
}

Then the debug validation layer reports an error:

D3D12: Removing Device.
D3D12 ERROR: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #232: DEVICE_REMOVAL_PROCESS_AT_FAULT]

If I change the shader so that it does not index into an array of resources:

/**
 * Inputs that are passed to a compute shader.
 */
struct ComputeShaderInput
{
    uint3 GroupID           : SV_GroupID;           // 3D index of the thread group in the dispatch.
    uint3 GroupThreadID     : SV_GroupThreadID;     // 3D index of local thread ID in a thread group.
    uint3 DispatchThreadID  : SV_DispatchThreadID;  // 3D index of global thread ID in the dispatch.
    uint  GroupIndex        : SV_GroupIndex;        // Flattened local index of the thread within a thread group.
};

RWTexture2D<float4> DebugTexture : register( u0 );

// Global counter for current index into the light index list.
RWStructuredBuffer<uint> PointLightIndexCounter0 : register( u1 );
RWStructuredBuffer<uint> PointLightIndexCounter1 : register( u2 );

#define TestShader_RootSignature \
    "RootFlags(0)," \
    "DescriptorTable(UAV(u0, numDescriptors=3))"

[RootSignature( TestShader_RootSignature )]
[numthreads( 16, 16, 1 )]
void main( ComputeShaderInput IN )
{
    int2 texCoord = IN.DispatchThreadID.xy;

    uint pointLightIndexCounter0 = PointLightIndexCounter0[0];
    uint pointLightIndexCounter1 = PointLightIndexCounter1[0];

    DebugTexture[texCoord] = float4( pointLightIndexCounter0, pointLightIndexCounter1, 0, 1 );
}

(Note the changes to the PointLightIndexCounter declarations and to the two reads in main.) Then everything works fine.

I’ve also run the D3D12DynamicIndexing sample from the Microsoft DirectX Graphics Samples on GitHub (https://github.com/Microsoft/DirectX-Graphics-Samples/tree/master/Samples/Desktop/D3D12DynamicIndexing/src), and it works fine, which leads me to believe that this issue is specific to the cs_5_1 profile.

I am running Windows 10 (build 14332) with an NVIDIA GeForce GTX Titan X and driver version 365.19.

I have updated to the latest drivers from NVIDIA (368.22) but the issue persists.

Has anyone experienced any issues with dynamic descriptor indexing in cs_5_1?

I haven’t seen any samples or code that use dynamic indexing of UAVs, and my own attempts have failed. Interestingly, I have been able to index UAV descriptor arrays statically, using a literal value (as you’ve done), but using a constant buffer value as the index makes the writes fail. Oddly, though, I can branch on that constant buffer value and the writes go through, as long as the index itself is a literal. For example:

RWTexture2D<float> myUavs[32] : register( u0 );

cbuffer constants : register( b0 )
{
	uint uavIndex; // set to 16 from code
};

[numthreads( 8, 8, 1 )]
void cs_main( uint2 id : SV_DispatchThreadID )
{

	myUavs[uavIndex][id] = 1; // doesn't ever work.

	if ( uavIndex == 16 )
	{
		myUavs[16][id] = 1; // works....
	}
}
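The branch workaround above could be generalized with a switch statement in which every case uses a literal array index, so the compiler never has to emit a register-relative UAV index. This is a minimal, untested sketch: the helper name WriteUavLiteral is hypothetical, only the first four cases are written out, and whether it avoids the failure on the affected NVIDIA drivers is an assumption. Inspecting the fxc disassembly should confirm whether each store uses a literal slot such as U0[16].

```hlsl
RWTexture2D<float> myUavs[32] : register( u0 );

cbuffer constants : register( b0 )
{
    uint uavIndex; // set from application code
};

// Hypothetical helper (not a known fix): selects the UAV with a switch so that
// every store uses a literal index rather than a register-relative one.
void WriteUavLiteral( uint index, uint2 id, float value )
{
    switch ( index )
    {
        case 0: myUavs[0][id] = value; break;
        case 1: myUavs[1][id] = value; break;
        case 2: myUavs[2][id] = value; break;
        case 3: myUavs[3][id] = value; break;
        // ...add one case per array element the application may select (up to 31).
        default: break; // index without a case: nothing is written
    }
}

[numthreads( 8, 8, 1 )]
void cs_main( uint2 id : SV_DispatchThreadID )
{
    WriteUavLiteral( uavIndex, id, 1.0f );
}
```

Because uavIndex comes from a constant buffer, every thread takes the same case, so the switch adds no divergence; the trade-off is shader size growing with the number of selectable UAVs.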

Investigating further:

myUavs[0][id] = 1;

doesn’t work if I set the fxc flag /Od to disable optimization, because it generates this code:

mov r0.x, l(0)
itof r0.y, l(1)
store_uav_typed U0[r0.x + 0].xyzw, vThreadID.xyyy, r0.yyyy

but leaving optimization at the default level optimizes away the register:

store_uav_typed U0[0].xyzw, vThreadID.xyyy, l(1.000000,1.000000,1.000000,1.000000)

which does work.

I’m thinking this is a driver bug. Where does one go to report such a thing?
I would like to test this on an AMD card, but I don’t have one.

UPDATE:
I’ve got the code running on my integrated GPU (Haswell), and the UAV writes succeed. This definitely looks like an NVIDIA driver bug.

I get the same result as Pyromuffin. I am using a GeForce GTX 980M with driver version 368.81 on 64-bit Windows 10 (version 10.0.10586), with Direct3D 12, compiling a shader in profile cs_5_1. I don’t get the device removal that jpvanoosten sees, but my shader fails to write to the UAV unless the shader compiler emits code that uses an absolute (literal) index. Dynamic indexing works for CBVs. I have reviewed Microsoft’s documentation of hardware tiers but have not found anything indicating that it shouldn’t work for UAVs.

Here is another shader that reproduces the problem:

RWTexture2D<float4> output[1] : register(u0);

[numthreads(16, 16, 1)]
void ComputeShaderMain(uint3 dispatchThreadId : SV_DispatchThreadID)
{
	output[NonUniformResourceIndex(dispatchThreadId.z)][dispatchThreadId.xy] = 1.0;
}

It does not write to the bound UAV.

Reading from a UAV in an array also fails when using a dynamic index.

Hello,

I think your answer might be here: https://msdn.microsoft.com/en-us/library/windows/desktop/dn899207(v=vs.85).aspx

See the section on divergence and derivatives, second point:

“Resource indexes that may be divergent must be marked with the NonUniformResourceIndex function in HLSL code. Otherwise results are undefined.”
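To illustrate the rule quoted above, the sketch below contrasts an index that is uniform across the dispatch (read from a constant buffer, which needs no annotation) with one derived from the thread ID, which may differ between threads and therefore must be wrapped in NonUniformResourceIndex. The resource names, registers and the % 4 selection are hypothetical; the sketch only shows where the annotation belongs and is not, by itself, a fix for the driver behavior reported in this thread.

```hlsl
RWStructuredBuffer<uint> Counters[2] : register( u1 ); // occupies u1..u2
RWTexture2D<float4>      Outputs[4]  : register( u3 ); // occupies u3..u6

cbuffer Params : register( b0 )
{
    uint counterIndex; // uniform: the same value for every thread
};

[numthreads( 16, 16, 1 )]
void main( uint3 dispatchThreadId : SV_DispatchThreadID )
{
    // Uniform index from a constant buffer: no annotation required.
    uint count = Counters[counterIndex][0];

    // An index derived from the thread ID may differ between threads,
    // so it must be marked with NonUniformResourceIndex.
    uint outputIndex = dispatchThreadId.x % 4;
    Outputs[NonUniformResourceIndex( outputIndex )][dispatchThreadId.xy] =
        float4( count, 0, 0, 1 );
}
```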

You might also want to read this: http://asawicki.info/news_1608_direct3d_12_-_watch_out_for_non-uniform_resource_index.html

Good luck!

I am using NonUniformResourceIndex. Pyromuffin’s and jpvanoosten’s shaders don’t appear to require it.