cudaErrorIllegalAddress Encountered: "CUDA error: an illegal memory access was encountered"

Hello,

I’ve run into “CUDA error: an illegal memory access was encountered” multiple times, whether I use PyTorch or Ollama. I’ve cut my code down to the simplest form that still produces the error. Here’s my code:

import torch

param = torch.randn((150000, 1000), dtype=torch.bfloat16, device='cuda:0')
param = param.to(torch.float16)
param = param.cpu()

print("success")
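For scale, the repro is small. The sketch below works through the arithmetic using the shape and dtypes from the script. It counts only tensor data and ignores the CUDA context and PyTorch's caching-allocator overhead. Roughly 572 MiB at peak is modest for a 24 GB card like the 4090 mentioned at the end of the thread. So this is not an out-of-memory situation, and in a correctly working setup a plain elementwise cast should never touch memory outside its own buffers.

```python
# Back-of-the-envelope memory use of the repro script (shape and dtypes copied from it).
ROWS, COLS = 150_000, 1_000
BYTES_PER_ELEMENT = {"bfloat16": 2, "float16": 2}
MIB = 2**20


def tensor_mib(rows, cols, dtype):
    """Size in MiB of a dense rows x cols tensor of the given dtype."""
    return rows * cols * BYTES_PER_ELEMENT[dtype] / MIB


src = tensor_mib(ROWS, COLS, "bfloat16")  # tensor created by torch.randn
dst = tensor_mib(ROWS, COLS, "float16")   # result of .to(torch.float16)
# During the cast, the source and the result are both alive on the GPU.
peak = src + dst
print(f"bf16: {src:.1f} MiB, fp16: {dst:.1f} MiB, peak during cast: {peak:.1f} MiB")
```

This prints `bf16: 286.1 MiB, fp16: 286.1 MiB, peak during cast: 572.2 MiB`.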

The error is:

(torch) root@Htzr:~/code# python ./repro_error_2.py
Traceback (most recent call last):
  File "/root/code/./repro_error_2.py", line 5, in <module>
    param = param.cpu()
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
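The message's own hint (CUDA_LAUNCH_BLOCKING=1) is worth following. Kernels launch asynchronously, so a fault is reported at the next synchronizing call. Here, the error may point at `param.cpu()` even if the failing kernel was the cast on the previous line. The variable must be set before the process initializes CUDA. The simplest route is the shell: `CUDA_LAUNCH_BLOCKING=1 python ./repro_error_2.py`. Below is a Python sketch of the same idea. The helper names and the launcher file name are hypothetical.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of `base` (default: os.environ) with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_blocking(script):
    """Run `script` in a fresh interpreter so the variable is set before CUDA initializes."""
    result = subprocess.run([sys.executable, script], env=blocking_env(), check=False)
    return result.returncode


if __name__ == "__main__":
    # Usage (hypothetical file name): python rerun_blocking.py ./repro_error_2.py
    if len(sys.argv) > 1:
        sys.exit(rerun_blocking(sys.argv[1]))
```

With synchronous launches, the traceback should point at the operation whose kernel actually faulted, at the cost of slower execution.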

When I run more complicated code in PyTorch or deploy models with Ollama, I run into the same error from time to time. Here is my conda environment:

(torch) root@Htzr:~/code# conda list
# packages in environment at /root/miniconda3/envs/torch:
#
# Name                      Version          Build               Channel
_libgcc_mutex               0.1              main
_openmp_mutex               5.1              1_gnu
accelerate                  1.12.0           pypi_0              pypi
bitsandbytes                0.49.0           pypi_0              pypi
bzip2                       1.0.8            h5eee18b_6
ca-certificates             2025.12.2        h06a4308_0
certifi                     2025.11.12       pypi_0              pypi
charset-normalizer          3.4.4            pypi_0              pypi
filelock                    3.20.0           pypi_0              pypi
fsspec                      2025.12.0        pypi_0              pypi
hf-xet                      1.2.0            pypi_0              pypi
huggingface-hub             0.36.0           pypi_0              pypi
idna                        3.11             pypi_0              pypi
jinja2                      3.1.6            pypi_0              pypi
ld_impl_linux-64            2.44             h153f514_2
libexpat                    2.7.3            h7354ed3_4
libffi                      3.4.4            h6a678d5_1
libgcc                      15.2.0           h69a1729_7
libgcc-ng                   15.2.0           h166f726_7
libgomp                     15.2.0           h4751f2c_7
libmpdec                    4.0.0            h5eee18b_0
libstdcxx                   15.2.0           h39759b7_7
libstdcxx-ng                15.2.0           hc03a8fd_7
libuuid                     1.41.5           h5eee18b_0
libxcb                      1.17.0           h9b100fa_0
libzlib                     1.3.1            hb25bd0a_0
markupsafe                  3.0.2            pypi_0              pypi
modelscope                  1.33.0           pypi_0              pypi
mpmath                      1.3.0            pypi_0              pypi
ncurses                     6.5              h7934f7d_0
networkx                    3.6.1            pypi_0              pypi
numpy                       2.3.5            pypi_0              pypi
nvidia-cublas-cu12          12.6.4.1         pypi_0              pypi
nvidia-cuda-cupti-cu12      12.6.80          pypi_0              pypi
nvidia-cuda-nvrtc-cu12      12.6.77          pypi_0              pypi
nvidia-cuda-runtime-cu12    12.6.77          pypi_0              pypi
nvidia-cudnn-cu12           9.10.2.21        pypi_0              pypi
nvidia-cufft-cu12           11.3.0.4         pypi_0              pypi
nvidia-cufile-cu12          1.11.1.6         pypi_0              pypi
nvidia-curand-cu12          10.3.7.77        pypi_0              pypi
nvidia-cusolver-cu12        11.7.1.2         pypi_0              pypi
nvidia-cusparse-cu12        12.5.4.2         pypi_0              pypi
nvidia-cusparselt-cu12      0.7.1            pypi_0              pypi
nvidia-nccl-cu12            2.27.5           pypi_0              pypi
nvidia-nvjitlink-cu12       12.6.85          pypi_0              pypi
nvidia-nvshmem-cu12         3.3.20           pypi_0              pypi
nvidia-nvtx-cu12            12.6.77          pypi_0              pypi
openssl                     3.0.18           hd6dcaed_0
packaging                   25.0             pypi_0              pypi
pillow                      12.0.0           pypi_0              pypi
pip                         25.3             pyhc872135_0
psutil                      7.2.1            pypi_0              pypi
pthread-stubs               0.3              h0ce48e5_1
python                      3.13.11          hcf712cf_100_cp313
python_abi                  3.13             3_cp313
pyyaml                      6.0.3            pypi_0              pypi
readline                    8.3              hc2a1206_0
regex                       2025.11.3        pypi_0              pypi
requests                    2.32.5           pypi_0              pypi
safetensors                 0.7.0            pypi_0              pypi
setuptools                  80.9.0           py313h06a4308_0
sqlite                      3.51.0           h2a70700_0
sympy                       1.14.0           pypi_0              pypi
tk                          8.6.15           h54e0aa7_0
tokenizers                  0.22.1           pypi_0              pypi
torch                       2.9.1+cu126      pypi_0              pypi
torchvision                 0.24.1+cu126     pypi_0              pypi
tqdm                        4.67.1           pypi_0              pypi
transformers                4.57.3           pypi_0              pypi
triton                      3.5.1            pypi_0              pypi
typing-extensions           4.15.0           pypi_0              pypi
tzdata                      2025b            h04d1e81_0
urllib3                     2.6.2            pypi_0              pypi
wheel                       0.45.1           py313h06a4308_0
xorg-libx11                 1.8.12           h9b100fa_1
xorg-libxau                 1.0.12           h9b100fa_0
xorg-libxdmcp               1.1.5            h9b100fa_0
xorg-xorgproto              2024.1           h5eee18b_1
xz                          5.6.4            h5eee18b_1
zlib                        1.3.1            hb25bd0a_0
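The list above shows `torch 2.9.1+cu126` and the pip-installed CUDA 12.6 runtime wheels (`nvidia-*-cu12`). It does not show the NVIDIA driver version or the GPU model, which are the first things to check for this error. PyTorch ships `python -m torch.utils.collect_env` for a full report. Below is a smaller sketch that uses only standard `torch.cuda` calls and the `nvidia-smi --query-gpu` flags. It assumes `nvidia-smi` is on PATH and skips it otherwise. `gpu_report` is a hypothetical helper name.

```python
import platform
import shutil
import subprocess


def gpu_report():
    """Collect version info that `conda list` does not show (driver, GPU model)."""
    info = {"python": platform.python_version()}
    try:
        import torch
        info["torch"] = torch.__version__
        info["torch_cuda_build"] = torch.version.cuda  # CUDA version the wheel was built with
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
            info["compute_capability"] = torch.cuda.get_device_capability(0)
    except ImportError:
        info["torch"] = None
    if shutil.which("nvidia-smi"):
        # The driver version comes from the system install, not from the pip/conda packages.
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
        info["nvidia_smi"] = out.stdout.strip()
    return info


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```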

Any ideas? Thanks

Problem solved: I contacted customer support and got the 4090 exchanged for another one.
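Since a replacement card fixed it, the fault was most likely in the hardware. Before contacting support, one rough way to gather evidence is to repeat the failing cast many times and compare each GPU result with a CPU reference.

The sketch below does this with two hypothetical helpers, `run_trials` and `gpu_cast_trial`. The comparison assumes the CPU and GPU bf16-to-fp16 casts round identically (both should use round-to-nearest-even), so treat a mismatch as a hint, not proof. A clean run also does not prove the card is healthy.

```python
def run_trials(trial, n):
    """Run trial(i) for i in range(n), stopping at the first exception.

    Returns (completed, mismatches, error). An illegal-address error is sticky:
    the process's CUDA context is unusable afterwards, so further trials are pointless.
    """
    mismatches = 0
    for i in range(n):
        try:
            ok = trial(i)
        except Exception as exc:  # e.g. torch.AcceleratorError from a faulting kernel
            return i, mismatches, exc
        if not ok:
            mismatches += 1
    return n, mismatches, None


def gpu_cast_trial(i, rows=150_000, cols=1_000):
    """One iteration: cast the same bf16 data to fp16 on the CPU and on GPU 0, then compare."""
    import torch  # imported lazily so run_trials stays usable without torch

    gen = torch.Generator().manual_seed(i)  # CPU generator: reproducible input per trial
    src = torch.randn((rows, cols), generator=gen).to(torch.bfloat16)
    ref = src.to(torch.float16)                     # CPU reference cast
    out = src.to("cuda:0").to(torch.float16).cpu()  # same cast on the GPU
    return torch.equal(out, ref)


if __name__ == "__main__":
    try:
        import torch
        has_gpu = torch.cuda.is_available()
    except ImportError:
        has_gpu = False
    if has_gpu:
        done, bad, err = run_trials(gpu_cast_trial, 50)
        print(f"completed={done} mismatches={bad} error={err!r}")
    else:
        print("PyTorch with CUDA not available; skipping GPU trials")
```

Keeping the trial loop separate from the GPU work lets the same harness wrap other suspect operations.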