Ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code -1

ptxas fails to compile PTX when TensorFlow XLA acceleration is used.
Setup (I have tried many combinations of the versions listed below):
Platform: Windows 10
TensorFlow-gpu 2.5/2.4
CUDA 11.2.0/11.2.1/11.2.2/11.0
Python 3.8.0/3.8.10
cuDNN 8.1.1/8.1.0/7.6.0

When doing anything with TensorFlow on the GPU, the message below is shown, and the script then runs without further problems:

2021-07-01 22:39:34.767676: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: ptxas exited with non-zero error code -1, output:
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.

When using XLA to accelerate any function, the message below is shown, and the script crashes and exits:

ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

I have opened an issue on the TensorFlow GitHub, but it seems the developers there are less familiar with CUDA on Windows, so I have to ask here. I have already tried adding ptxas to $PATH, and there is enough filesystem space (150+ GB free on the drive), but it still does not work.
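
For completeness, this is roughly how I set and check $PATH from within the script before importing TensorFlow, so that the ptxas the process sees is the one I expect (a minimal sketch; the CUDA install path below is just an example from my machine and needs to be adjusted to yours):

import os
import shutil

# Example CUDA location on my machine - adjust to your own installation.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin"

# Prepend the CUDA bin directory so child processes can find ptxas.exe.
os.environ["PATH"] = cuda_bin + os.pathsep + os.environ.get("PATH", "")

# shutil.which returns the full path to ptxas if it is reachable, else None.
print("ptxas found at:", shutil.which("ptxas"))

# Import TensorFlow only after PATH is set, so XLA sees the same environment.
import tensorflow as tf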

The code I am running:

import tensorflow as tf

print(tf.test.is_gpu_available(True))
print(tf.test.is_built_with_cuda())
print(tf.test.is_built_with_xla())

@tf.function(jit_compile=True)
def recompiled_on_launch(a, b):
    return a + b

recompiled_on_launch(tf.ones([1, 10]), tf.ones([1, 10]))
recompiled_on_launch(tf.ones([1, 100]), tf.ones([1, 100]))

The full TensorFlow log and error message when using XLA is below:

2021-07-28 18:33:27.488437: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
WARNING:tensorflow:From D:/python/pyProject/HollowKnight_RL/playground.py:6: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.config.list_physical_devices('GPU') instead.
2021-07-28 18:33:29.922521: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-28 18:33:29.925441: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-07-28 18:33:29.957032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-07-28 18:33:29.957197: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-28 18:33:29.981022: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-07-28 18:33:29.981121: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-07-28 18:33:29.996191: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-07-28 18:33:30.000271: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-07-28 18:33:30.029714: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-07-28 18:33:30.047150: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-07-28 18:33:30.047951: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-07-28 18:33:30.048090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
True
True
True
2021-07-28 18:33:30.523138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-28 18:33:30.523230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-28 18:33:30.523281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-28 18:33:30.523476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 3983 MB memory) → physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-07-28 18:33:30.524717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-07-28 18:33:30.524900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-28 18:33:30.525156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2021-07-28 18:33:30.525353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-28 18:33:30.525441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-28 18:33:30.525524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-28 18:33:30.525579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-28 18:33:30.525680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3983 MB memory) → physical GPU (device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-07-28 18:33:30.683822: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1735c44efc0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-28 18:33:30.683938: I tensorflow/compiler/xla/service/service.cc:177] StreamExecutor device (0): NVIDIA GeForce GTX 1660 Ti, Compute Capability 7.5
2021-07-28 18:33:30.746313: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:472] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

Process finished with exit code -1073740791 (0xC0000409)

I had the same issue and luckily I just fixed it! (Although it may be a little late for you…)
The environment in which my program runs is Win11 + TensorFlow-gpu 2.7.0 + CUDA 11.5 + cuDNN 8.3 + Python 3.9.6.
This warning message is caused by TensorFlow trying to use the ptxas compiler for just-in-time (JIT) compilation at runtime, but ptxas cannot be found.
Installing cuda-nvcc adds the ptxas compiler to your environment's path, so when TensorFlow needs to do JIT compilation it can find and call ptxas. That is why installing cuda-nvcc fixes the problem you are experiencing.
I installed cuda-nvcc with the conda install -c nvidia cuda-nvcc command in my conda virtual environment, and the warning is gone!
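As a quick sanity check (just a sketch, assuming the activated environment puts ptxas on the PATH), you can verify from Python that ptxas is now reachable and callable:

import shutil
import subprocess

# Full path to the ptxas executable, or None if it is not on PATH.
ptxas_path = shutil.which("ptxas")
print("ptxas found at:", ptxas_path)

# If it was found, print its version string; this is the same binary that
# TensorFlow/XLA invokes to compile PTX to SASS.
if ptxas_path:
    print(subprocess.run([ptxas_path, "--version"],
                         capture_output=True, text=True).stdout)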
I hope my experience can help you.


I had a similar ptx compilation issue, with different behavior on CentOS 7.9 and Rocky Linux 8.8.

On CentOS 7.9:
2024-01-05 17:16:59.197461: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8700
2024-01-05 17:16:59.487038: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Running ptxas --version returned 32512
2024-01-05 17:16:59.594334: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: ptxas exited with non-zero error code 32512, output:
Relying on driver to perform ptx compilation.

On Rocky Linux 8.8, I did not get the ptx compilation error, but the program failed when loading a shared library:

2024-01-04 12:30:05.271731: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8700
Could not load library libcudnn_cnn_infer.so.8. Error: libnvrtc.so: cannot open shared object file: No such file or directory

How can we get the same ptx compilation fallback on Rocky Linux 8.8 so that we can rely on the GPU driver to perform ptx compilation, since our application works fine on CentOS 7.9? Basically, I am looking for a way to disable ptxas compilation on Rocky Linux 8.8 so that the application behaves the same as on CentOS 7.9 with the given library installation and configuration.