D3D12 driver crashes when a compute shader is executed with a power-of-2 numthreads

OK, so this is perhaps one of the craziest problems I have ever encountered. I have reduced it to a simple compute shader that does nothing but write to a UAV:

RWStructuredBuffer<uint> g_pathVisibility : register(u0, space1);

cbuffer cbPushConstants : register(b0)
{
	uint g_count;
};

[numthreads(32, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID)
{
	if(DTid.x < g_count)
	{
		g_pathVisibility[DTid.x] = DTid.x + 1;
	}
}

g_pathVisibility is a buffer of 128 uints, and g_count is set to 128 through a root constant. The shader is executed on a compute command list and compute queue. With any power-of-2 numthreads value, nvlddmkm (the NVIDIA kernel-mode display driver) crashes and I get a TDR device reset with the following error:

D3D12 ERROR: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #232: DEVICE_REMOVAL_PROCESS_AT_FAULT]
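As a sanity check that the shader itself is well-behaved for every numthreads value, here is a small C++ CPU model of it. This is a sketch under assumptions: `EmulateDispatch` and `ShaderOutputMatches` are hypothetical helpers (not D3D12 APIs), and since the post does not show the Dispatch call, the model assumes the host dispatches ceil(128 / N) groups. It reproduces SV_DispatchThreadID.x = SV_GroupID.x * numthreads.x + SV_GroupThreadID.x and the `DTid.x < g_count` guard.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU model of the posted shader. Emulates
// Dispatch(groupCountX, 1, 1) on a kernel declared [numthreads(numThreadsX, 1, 1)].
// Returns the buffer contents, or an empty vector if any write would land
// outside the buffer.
std::vector<uint32_t> EmulateDispatch(uint32_t numThreadsX, uint32_t groupCountX,
                                      uint32_t bufferSize, uint32_t gCount)
{
    assert(numThreadsX > 0 && "numthreads.x must be non-zero");
    std::vector<uint32_t> buffer(bufferSize, 0);
    for (uint32_t group = 0; group < groupCountX; ++group) {
        for (uint32_t thread = 0; thread < numThreadsX; ++thread) {
            // SV_DispatchThreadID.x = SV_GroupID.x * numthreads.x + SV_GroupThreadID.x
            uint32_t dtid = group * numThreadsX + thread;
            if (dtid < gCount) {                    // the shader's bounds check
                if (dtid >= bufferSize) return {};  // would be an out-of-bounds UAV write
                buffer[dtid] = dtid + 1;            // g_pathVisibility[DTid.x] = DTid.x + 1
            }
        }
    }
    return buffer;
}

// True if an assumed ceil-divided dispatch fills a 128-element buffer
// (g_count = 128, as in the post) with 1..128 for this numthreads value.
bool ShaderOutputMatches(uint32_t numThreadsX)
{
    const uint32_t kCount = 128;
    const uint32_t groups = (kCount + numThreadsX - 1) / numThreadsX;
    const std::vector<uint32_t> result = EmulateDispatch(numThreadsX, groups, kCount, kCount);
    if (result.size() != kCount) return false;
    for (uint32_t i = 0; i < kCount; ++i)
        if (result[i] != i + 1) return false;
    return true;
}
```

In this model 31, 32, 33, 64 and 128 all produce identical, in-bounds output, which is consistent with the shader being valid and the fault lying elsewhere.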

If I set it to [numthreads(31, 1, 1)], [numthreads(33, 1, 1)] or any other non-power-of-2 value, the shader runs fine. If I set it to [numthreads(32, 1, 1)], [numthreads(64, 1, 1)], [numthreads(128, 1, 1)], etc., the driver crashes. What’s even crazier is that I can run the Microsoft n-Body Gravity sample, which also uses a power-of-2 numthreads, without any problems whatsoever! I’ve compared my code with theirs and can’t find anything that could cause this. The D3D12 validation layer, with maximum verbosity enabled for all message categories, reports no problems. I get the following errors in the Windows Event Viewer:

Display driver nvlddmkm stopped responding and has successfully recovered.
\Device\Video3 
   Graphics Exception: ESR 0x504224=0x80000000 0x504228=0x0 0x50422c=0x0 0x504234=0x0 
   0000000002003000000000000D00AAC0000000000000000000000000000000000000000000000000
\Device\Video3 
   Graphics Exception: ESR 0x504224=0x80000041 0x504228=0x180002 0x50422c=0xf4900 0x504234=0x1fc0 
   0000000002003000000000000D00AAC0000000000000000000000000000000000000000000000000
\Device\Video3 
   NVRM: Graphics TEX Exception on (GPC 0, TPC 0): TEX NACK / Page Fault 
   0000000002003000000000000D00AAC0000000000000000000000000000000000000000000000000

I’m running the latest Windows 10 Creators Update with a GTX 1080 and Visual Studio 2017. I’ve reproduced this problem in both UWP and regular desktop applications. The shader is compiled as cs_5_1. Sample code and system info are attached.
PathTransform.zip (61.2 KB)

NVIDIA System Information 06-13-2017 01-51-23.txt (3.67 KB)