Ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code -1

ptxas fails to compile when using TensorFlow XLA acceleration.
Setup (I tried many different combinations of the versions listed below):
Platform: Windows 10
TensorFlow-gpu 2.5/2.4
CUDA 11.2.0/11.2.1/11.2.2/11.0
Python 3.8.0/3.8.10
cuDNN 8.1.1/8.1.0/7.6.0

When doing anything with TensorFlow on the GPU, the message below is shown, and the script then runs with no further problems:

2021-07-01 22:39:34.767676: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code -1, output:
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.

When using XLA to accelerate any function, the message below is shown, and the script crashes and exits:

ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

I have opened an issue on the TensorFlow GitHub, but the developers there seem less familiar with CUDA on Windows, so I am asking here. I have already tried adding ptxas to $PATH, and disk space seems sufficient (150+ GB free on the drive), but it still does not work.
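For anyone reproducing the PATH check, here is a minimal sketch of how it can be done from Python. It only confirms which ptxas executable (if any) the current process would find and whether it starts at all; it does not reproduce what TensorFlow does internally. The helper names `find_tool` and `run_version` are made up for this illustration.

```python
import shutil
import subprocess


def find_tool(name="ptxas"):
    """Return the full path of `name` as resolved through PATH, or None if not found."""
    return shutil.which(name)


def run_version(exe_path):
    """Run `<exe_path> --version` and return (exit_code, combined output)."""
    result = subprocess.run([exe_path, "--version"], capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


ptxas_path = find_tool("ptxas")
if ptxas_path is None:
    print("ptxas was not found on PATH")
else:
    code, output = run_version(ptxas_path)
    print(f"ptxas found at: {ptxas_path}")
    print(f"exit code: {code}")
    print(output)
```

If the path printed here is not the ptxas that belongs to the installed CUDA toolkit, or the exit code is non-zero even for `--version`, that would point to an installation or PATH problem rather than something specific to XLA.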

The code I am running:

import tensorflow as tf

print(tf.test.is_gpu_available(True))
print(tf.test.is_built_with_cuda())
print(tf.test.is_built_with_xla())

@tf.function(jit_compile=True)
def recompiled_on_launch(a, b):
    return a + b

recompiled_on_launch(tf.ones([1, 10]), tf.ones([1, 10]))
recompiled_on_launch(tf.ones([1, 100]), tf.ones([1, 100]))

The full TensorFlow log output and error when using XLA are as follows:

2021-07-28 18:33:27.488437: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
WARNING:tensorflow:From D:/python/pyProject/HollowKnight_RL/playground.py:6: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.config.list_physical_devices('GPU') instead.
2021-07-28 18:33:29.922521: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-28 18:33:29.925441: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-07-28 18:33:29.957032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-07-28 18:33:29.957197: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-28 18:33:29.981022: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-07-28 18:33:29.981121: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-07-28 18:33:29.996191: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-07-28 18:33:30.000271: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-07-28 18:33:30.029714: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-07-28 18:33:30.047150: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-07-28 18:33:30.047951: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-07-28 18:33:30.048090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
True
True
True
2021-07-28 18:33:30.523138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-28 18:33:30.523230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-28 18:33:30.523281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-28 18:33:30.523476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 3983 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-07-28 18:33:30.524717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-07-28 18:33:30.524900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-28 18:33:30.525156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-07-28 18:33:30.525353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-28 18:33:30.525441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-28 18:33:30.525524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-28 18:33:30.525579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-28 18:33:30.525680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3983 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-07-28 18:33:30.683822: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1735c44efc0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-28 18:33:30.683938: I tensorflow/compiler/xla/service/service.cc:177] StreamExecutor device (0): NVIDIA GeForce GTX 1660 Ti, Compute Capability 7.5
2021-07-28 18:33:30.746313: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:472] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

Process finished with exit code -1073740791 (0xC0000409)
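As a side note on the exit code above: the negative decimal number and the hex value are the same 32-bit quantity, read once as signed and once as unsigned. The short sketch below shows the conversion; the function name `to_unsigned32` is just an illustrative helper.

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as its unsigned 32-bit value."""
    return code & 0xFFFFFFFF


exit_code = -1073740791
print(hex(to_unsigned32(exit_code)))  # 0xc0000409
```

0xC0000409 is the Windows NTSTATUS value STATUS_STACK_BUFFER_OVERRUN, which is also reported when a process terminates itself via fail-fast. My assumption is that here it simply reflects the process aborting after the fatal (F) log line from nvptx_compiler.cc, not a separate memory error, but I have not confirmed this.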