If the PyTorch build was compiled against CUDA 11.6 or CUDA 11.7, I doubt it is actually using CUDA 11.8; running a build compiled for one CUDA version against another version's libraries typically doesn't work. The Python package installs of PyTorch (e.g. pip wheels) that I am familiar with on Linux bring their own CUDA libraries for this reason. I can't tell how it was installed here.
Those CUDA 11.6/11.7 CUFFT libraries may not work correctly with an RTX 4090; that was the reason for my comment. NVIDIA recommends CUDA 11.8 as the minimum for RTX 40 series GPUs, and it's often the case that DL framework “providers” take a while to catch up and ship a new version that is linked against CUDA 11.8 (in this case) and bundles the CUDA 11.8 libraries. Alternatively, you can build PyTorch yourself.
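Since I can't tell how PyTorch was installed here, one quick check is to ask the installed build which CUDA version it was compiled against. This is a minimal sketch assuming a standard PyTorch install; it uses torch.version.cuda, torch.cuda.get_device_capability and torch.cuda.get_arch_list, while the helper names parse_cuda_version and build_meets_minimum are hypothetical, and the (11, 8) threshold is just the minimum mentioned above. An RTX 4090 should report compute capability (8, 9); if "sm_89" is missing from the arch list, the build ships no native code for that GPU.

```python
from typing import Optional, Tuple

# Minimum CUDA version NVIDIA recommends for RTX 40 series (sm_89) GPUs.
MIN_CUDA_FOR_RTX40: Tuple[int, int] = (11, 8)


def parse_cuda_version(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn a string such as '11.7' into (11, 7); return None for CPU-only builds."""
    if not version:
        return None
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def build_meets_minimum(version: Optional[str],
                        minimum: Tuple[int, int] = MIN_CUDA_FOR_RTX40) -> bool:
    """True if the CUDA version a build targets is at least `minimum`."""
    parsed = parse_cuda_version(version)
    return parsed is not None and parsed >= minimum


if __name__ == "__main__":
    import torch  # imported here so the helpers above work without PyTorch installed

    built_for = torch.version.cuda  # CUDA version this build was compiled against
    print(f"PyTorch {torch.__version__}, built for CUDA {built_for}")
    print("Meets CUDA 11.8 minimum:", build_meets_minimum(built_for))
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0),
              "compute capability", torch.cuda.get_device_capability(0))
        # An RTX 4090 is sm_89; a build without it has no native code for this GPU.
        print("sm_89 in build arch list:", "sm_89" in torch.cuda.get_arch_list())
```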
Using PyTorch as set up in an NGC container may be another option. The WSL documentation explains how to launch NGC containers.
To test the theory of a basic CUDA 11.8 CUFFT bug in WSL, I would run a test like the one already suggested: a pure CUFFT program linked against CUDA 11.8.
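A minimal sketch of such a test, written in Python but calling CUFFT directly through ctypes so that PyTorch is not involved at all. Assumptions: the CUDA 11.8 toolkit's libcudart.so and libcufft.so live under /usr/local/cuda-11.8/lib64 (the hypothetical CUDA_LIB_DIR; adjust it to your install), and NumPy's FFT serves as the CPU reference. The helper names and the 1e-4 pass threshold are illustrative choices, not anything prescribed by CUFFT. A small C program compiled with nvcc and linked with -lcufft would be the stricter form of the same test.

```python
import ctypes
import os

import numpy as np

# Assumed location of the CUDA 11.8 toolkit libraries (hypothetical; adjust as needed).
CUDA_LIB_DIR = "/usr/local/cuda-11.8/lib64"

CUFFT_C2C = 0x29                 # cufftType: single-precision complex-to-complex
CUFFT_FORWARD = -1
CUDA_MEMCPY_HOST_TO_DEVICE = 1
CUDA_MEMCPY_DEVICE_TO_HOST = 2


def relative_error(result: np.ndarray, reference: np.ndarray) -> float:
    """Max absolute difference, normalized by the largest reference magnitude."""
    return float(np.max(np.abs(result - reference)) / np.max(np.abs(reference)))


def check(status: int, what: str) -> None:
    """Raise if a CUDA runtime or CUFFT call returned a nonzero status code."""
    if status != 0:
        raise RuntimeError(f"{what} failed with status {status}")


def cufft_forward_c2c(x: np.ndarray, lib_dir: str = CUDA_LIB_DIR):
    """1D forward C2C FFT via CUFFT; returns (result, cufftGetVersion value)."""
    cudart = ctypes.CDLL(os.path.join(lib_dir, "libcudart.so"))
    cufft = ctypes.CDLL(os.path.join(lib_dir, "libcufft.so"))
    cudart.cudaMalloc.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.c_size_t]
    cudart.cudaMemcpy.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    cudart.cudaFree.argtypes = [ctypes.c_void_p]
    cufft.cufftGetVersion.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cufft.cufftPlan1d.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int, ctypes.c_int, ctypes.c_int]
    cufft.cufftExecC2C.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p, ctypes.c_int]
    cufft.cufftDestroy.argtypes = [ctypes.c_int]

    version = ctypes.c_int()
    check(cufft.cufftGetVersion(ctypes.byref(version)), "cufftGetVersion")

    x = np.ascontiguousarray(x, dtype=np.complex64)  # same layout as cufftComplex
    out = np.empty_like(x)
    dev = ctypes.c_void_p()
    plan = ctypes.c_int()
    check(cudart.cudaMalloc(ctypes.byref(dev), x.nbytes), "cudaMalloc")
    try:
        check(cudart.cudaMemcpy(dev, x.ctypes.data, x.nbytes, CUDA_MEMCPY_HOST_TO_DEVICE), "H2D copy")
        check(cufft.cufftPlan1d(ctypes.byref(plan), x.size, CUFFT_C2C, 1), "cufftPlan1d")
        try:
            # In-place forward transform on the default stream.
            check(cufft.cufftExecC2C(plan, dev, dev, CUFFT_FORWARD), "cufftExecC2C")
        finally:
            cufft.cufftDestroy(plan)
        # A blocking cudaMemcpy on the default stream waits for the FFT to finish.
        check(cudart.cudaMemcpy(out.ctypes.data, dev, x.nbytes, CUDA_MEMCPY_DEVICE_TO_HOST), "D2H copy")
    finally:
        cudart.cudaFree(dev)
    return out, version.value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4096
    x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex64)
    gpu, cufft_version = cufft_forward_c2c(x)
    cpu = np.fft.fft(x.astype(np.complex128))  # double-precision CPU reference
    err = relative_error(gpu, cpu)
    print(f"cufftGetVersion: {cufft_version}, max relative error vs NumPy: {err:.2e}")
    print("PASS" if err < 1e-4 else "FAIL")
```

If this standalone test already shows large errors on the 4090 under WSL, that would point at CUDA 11.8 CUFFT itself rather than at the PyTorch build; if it passes, the PyTorch build and its bundled libraries become the more likely suspect.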