Out of memory when creating a simple compute pipeline state object (DX12)

In one sentence: if a compute shader contains a circular shift written as (a << b | a >> (64 - b)), where a is a 64-bit integer, then creating the compute pipeline state fails with E_OUTOFMEMORY.
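
For reference, here is a CPU-side C++ sketch of the rotation I mean. It is not the HLSL from the attached archive, and the name `rotl64` is only illustrative. The masking of the shift amounts is my addition for the C++ version, because shifting a 64-bit value by 64 (the b == 0 case) is undefined behavior in C++.

```cpp
#include <cstdint>

// Rotate a 64-bit value left by b bits (illustrative helper, not taken from the attached shader).
// Both shift amounts are masked to 0..63 so that b == 0 and b == 64 do not shift by 64.
constexpr std::uint64_t rotl64(std::uint64_t a, unsigned b) {
    return (a << (b & 63u)) | (a >> ((64u - b) & 63u));
}

// Compile-time sanity checks of the expected rotation results.
static_assert(rotl64(0x0123456789ABCDEFull, 4) == 0x123456789ABCDEF0ull, "rotate by 4");
static_assert(rotl64(0x8000000000000000ull, 1) == 0x0000000000000001ull, "top bit wraps around");
static_assert(rotl64(0x0123456789ABCDEFull, 0) == 0x0123456789ABCDEFull, "rotate by 0 is identity");
```

The shader is expected to compute the same values; the problem is that the pipeline state cannot even be created.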

I have attached a minimal example here. (The term “minimal” does not apply to the DX12 boilerplate.)
snippet.zip (21.3 KB)

To compile it, decompress the archive, open the MSVC x64 Native Tools Command Prompt, cd into the directory containing the example, and run make.cmd (optionally with the number 1 or 0 to indicate whether or not you want to create an NVIDIA device). Then run main.exe; an OOM error should show up on an NVIDIA GPU.

The example works fine with the integrated GPU on my laptop, but OOM is returned when using my discrete GeForce GTX 1050 Ti. I also confirmed from the stack trace that the error is generated in nvwgf2umx.dll, an NVIDIA driver module.

My guess is that something goes wrong in the part of the driver that compiles the shader from DXBC to a GPU-native format, specifically where it recognizes and optimizes the circular shift pattern.
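
If that guess is right, one thing that might be worth trying is a rotation built from two 32-bit halves, so that no 64-bit shift pair appears in the shader. Below is a CPU-side C++ sketch of that idea. The name `rotl64_halves` is hypothetical, and I have not tested whether an HLSL equivalent avoids the OOM on the affected driver; it only shows that the arithmetic gives the same result as a 64-bit rotation.

```cpp
#include <cstdint>

// Rotate a 64-bit value left by b bits using only 32-bit shifts.
// Hypothetical helper; whether this form avoids the driver issue is an untested assumption.
constexpr std::uint64_t rotl64_halves(std::uint64_t a, unsigned b) {
    std::uint32_t lo = static_cast<std::uint32_t>(a);        // low 32 bits
    std::uint32_t hi = static_cast<std::uint32_t>(a >> 32);  // high 32 bits
    b &= 63u;
    if (b >= 32u) {
        // Rotating by 32 just swaps the two halves.
        const std::uint32_t tmp = lo;
        lo = hi;
        hi = tmp;
        b -= 32u;
    }
    if (b != 0u) {
        // Each half takes its own bits shifted up plus the top bits of the other half.
        const std::uint32_t new_lo = (lo << b) | (hi >> (32u - b));
        const std::uint32_t new_hi = (hi << b) | (lo >> (32u - b));
        lo = new_lo;
        hi = new_hi;
    }
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Compile-time sanity checks against known rotation results.
static_assert(rotl64_halves(0x0123456789ABCDEFull, 4) == 0x123456789ABCDEF0ull, "rotate by 4");
static_assert(rotl64_halves(0x0123456789ABCDEFull, 32) == 0x89ABCDEF01234567ull, "rotate by 32");
static_assert(rotl64_halves(0x0123456789ABCDEFull, 36) == 0x9ABCDEF012345678ull, "rotate by 36");
```

Even if this works as a workaround, it costs extra instructions, so it would not replace a proper fix in the driver.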

Hopefully we can get circular shift for uint64_t to work.
Any help would be welcome.