PyTorch CUDA Incompatibility on NVIDIA Thor (L4T 38.4, CUDA 13)

I am working on an NVIDIA Thor platform running L4T version 38.4.0 with CUDA 13.0, and I am encountering issues with PyTorch GPU operations. While CUDA is detected correctly (torch.cuda.is_available() returns True) and basic tensor operations execute successfully on the GPU, all GEMM-based operations such as torch.matmul and nn.Linear consistently fail with errors like CUBLAS_STATUS_INVALID_VALUE and CUBLAS_STATUS_NOT_INITIALIZED.

I tested this using PyTorch (pip-installed version 2.10.0) across Python 3.10 and 3.12 environments, and the issue persists in both cases. A minimal reproducible example is creating two CUDA tensors and performing torch.matmul, which immediately triggers the cuBLAS error.

Interestingly, alternative GPU frameworks such as CuPy perform matrix multiplication successfully and achieve the expected high performance. This indicates that the CUDA runtime and hardware are functioning correctly, and that the issue appears specific to PyTorch’s cuBLAS/cuBLASLt integration on this platform. Additionally, the official NVIDIA PyTorch container (nvcr.io/nvidia/pytorch:24.02-py3) reports that NVIDIA Thor is not yet supported and fails to detect a compatible GPU.

Based on these observations, I would like to understand:

- whether PyTorch is officially supported on Thor (L4T 38.x with CUDA 13),
- whether there are known issues with cuBLAS/cuBLASLt on this platform,
- whether there is a recommended PyTorch build or workaround, and
- when official support (including containers or compatible wheels) can be expected.
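
For anyone who wants to reproduce this, the matmul test described above can be sketched roughly as follows. This is a minimal sketch rather than the exact script from the report: the helper name `check_matmul` and the matrix size are illustrative assumptions, and the function returns a status dictionary instead of crashing, so it also runs on machines without PyTorch or a GPU.

```python
def check_matmul(size=256):
    """Try one GPU matmul and report the outcome (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return {"status": "no_torch", "detail": "PyTorch is not installed"}

    if not torch.cuda.is_available():
        return {"status": "no_cuda", "detail": "torch.cuda.is_available() is False"}

    # Two random square matrices allocated directly on the GPU.
    a = torch.randn(size, size, device="cuda")
    b = torch.randn(size, size, device="cuda")
    try:
        c = torch.matmul(a, b)
        # Kernels launch asynchronously; synchronizing surfaces any
        # deferred error here instead of at some later call.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        # cuBLAS failures are raised by PyTorch as RuntimeError.
        return {"status": "error", "detail": str(exc)}
    return {"status": "ok", "detail": f"result shape {tuple(c.shape)}"}


if __name__ == "__main__":
    print(check_matmul())
```

On an affected setup, the "error" branch would be expected to carry the cuBLAS status text in `detail`.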

These might help:

Run this in your venv to get consistent versions of about 25 CUDA-related Python packages:
pip install nvmath-python[cu13-dx]

Your Docker image (PyTorch Release 24.02) is built on CUDA 12. It contains:
Ubuntu 22.04 including Python 3.10
NVIDIA CUDA 12.3.2
NVIDIA cuBLAS 12.3.4.1
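
To see which CUDA version a given PyTorch build was compiled against, something like the sketch below may help. It reads `torch.version.cuda`, which is `None` on CPU-only builds. The helper names `cuda_major` and `built_for_cuda13` are made up for illustration, and the CUDA 13 threshold is an assumption based on the CUDA 13.0 setup described in the question.

```python
def cuda_major(version):
    """Return the major CUDA version from a string like '12.3', or None."""
    if not version:
        return None
    return int(str(version).split(".")[0])


def built_for_cuda13(version):
    """True if the version string names CUDA 13 or later (assumed threshold)."""
    major = cuda_major(version)
    return major is not None and major >= 13


if __name__ == "__main__":
    try:
        import torch
        built = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        built = None
    print("PyTorch built against CUDA:", built)
    print("CUDA 13 or later:", built_for_cuda13(built))
```

For the 24.02 image listed above, `built_for_cuda13("12.3.2")` evaluates to `False`.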

Look here and pick the image tag that matches your Thor:

https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-12.html

Hi,

Thor requires CUDA libraries at version 13.0 or later.
Could you try the container below instead:

nvcr.io/nvidia/pytorch:26.02-py3

Thanks.