PyTorch not able to use CUDA. Error: cudaErrorUnknown: unknown error

I’m having a problem using CUDA with PyTorch under Python 3. I’m using a TX2 with JetPack 3.3 and Python 3.5. The CUDA version is 9.0.

When I use PyTorch, I get the following error:

sudo python3
>>> import torch
>>> print(torch.__version__)
1.1.0a0+fa0bfa0
>>> print('CUDA available: ' + str(torch.cuda.is_available()))
CUDA available: True
>>> a = torch.cuda.FloatTensor(2).zero_()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: unknown error
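
For anyone trying to reproduce this, the interactive session above can be condensed into a small script that collects the same facts in one go. This is only a sketch: `cuda_smoke_test` is a hypothetical helper name, and it assumes a PyTorch build recent enough to accept `device="cuda"` in `torch.zeros`. It catches the failure rather than crashing, so the output can be pasted into a reply.

```python
import sys


def cuda_smoke_test():
    """Collect basic facts about the PyTorch/CUDA setup without raising."""
    report = {"python": sys.version.split()[0]}
    try:
        import torch
    except (ImportError, OSError) as exc:
        # PyTorch is missing, or one of its shared libraries could not be loaded.
        report["torch_import_error"] = str(exc)
        return report

    report["torch_version"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        try:
            # Small GPU allocation, similar to the one that fails in the session above.
            torch.zeros(2, device="cuda")
            report["allocation"] = "ok"
        except RuntimeError as exc:
            report["allocation"] = "failed: {}".format(exc)
    return report


if __name__ == "__main__":
    for key, value in sorted(cuda_smoke_test().items()):
        print("{:>18}: {}".format(key, value))
```

Note that `torch.cuda.is_available()` returning True does not guarantee that an allocation succeeds, which is exactly what the session above shows.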

I followed the instructions at the link below to install PyTorch and got no errors.
https://gist.github.com/dusty-nv/ef2b372301c00c0a9d3203e42fd83426

Can anyone help me with it? Thanks!

Hi,

Could you check whether this helps?

sudo rm -rf ~/.nv
sudo reboot
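
For context, ~/.nv is where CUDA keeps its JIT compute cache by default (under ~/.nv/ComputeCache). Since the sessions in this thread run Python under sudo, it may be worth looking at who owns that directory before deleting it. This is only a guess at a contributing factor, not a confirmed cause, and `describe_cache_dir` is a hypothetical helper name. A minimal sketch:

```python
import os


def describe_cache_dir(path=os.path.expanduser("~/.nv")):
    """Report whether `path` exists, who owns it, and whether we can write to it."""
    if not os.path.isdir(path):
        return {"path": path, "exists": False}
    info = os.stat(path)
    return {
        "path": path,
        "exists": True,
        "owner_uid": info.st_uid,      # uid 0 means the directory belongs to root
        "current_uid": os.getuid(),    # uid of the user running this script
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    print(describe_cache_dir())
```

Running it once as the normal user and once with sudo shows whether both resolve to the same directory, which depends on how sudo is configured to handle HOME.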

Thanks.

Hi AastaLLL,

I tried it, but it doesn’t work.

I also noticed that a similar unknown error appears at random with CuPy. CuPy worked fine after I rebooted the system. However, once torch fails, CuPy also fails with the same error.

$ sudo python3
[sudo] password for nvidia: 
Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> a = torch.cuda.FloatTensor(2).zero_()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: unknown error
>>> import cupy
>>> x_gpu = cupy.array([1, 2, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/cupy/creation/from_data.py", line 41, in array
    return core.array(obj, dtype, copy, order, subok, ndmin)
  File "cupy/core/core.pyx", line 2350, in cupy.core.core.array
  File "cupy/core/core.pyx", line 2384, in cupy.core.core.array
  File "cupy/core/core.pyx", line 151, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 517, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1065, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1086, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 900, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 921, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 680, in cupy.cuda.memory._try_malloc
  File "cupy/cuda/memory.pyx", line 677, in cupy.cuda.memory._try_malloc
  File "cupy/cuda/memory.pyx", line 869, in cupy.cuda.memory.SingleDeviceMemoryPool._alloc
  File "cupy/cuda/memory.pyx", line 472, in cupy.cuda.memory._malloc
  File "cupy/cuda/memory.pyx", line 473, in cupy.cuda.memory._malloc
  File "cupy/cuda/memory.pyx", line 77, in cupy.cuda.memory.Memory.__init__
  File "cupy/cuda/runtime.pyx", line 213, in cupy.cuda.runtime.malloc
  File "cupy/cuda/runtime.pyx", line 136, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorUnknown: unknown error

I’ve run into the same problem. Any suggestions? Thanks.

Hello, I have the same problem even after running the steps below:

sudo rm -rf ~/.nv
sudo reboot

Any solution?
Thanks

Hi,

The most common cause is an incompatible CUDA driver/library.
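
One rough way to look for such a mismatch is to compare the CUDA version PyTorch was built against (`torch.version.cuda`) with the toolkit installed on the device. The sketch below assumes the toolkit writes its version to /usr/local/cuda/version.txt, which is the case for CUDA 9.0 but may not hold for other releases. The function names are hypothetical, and the check covers the toolkit only, not the driver that ships with the JetPack image.

```python
import re


def major_minor(version_text):
    """Extract (major, minor) from strings such as 'CUDA Version 9.0.252' or '10.0'."""
    match = re.search(r"(\d+)\.(\d+)", version_text or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_toolkit_version(path="/usr/local/cuda/version.txt"):
    """Read the toolkit version file; return None if it is not there (assumed location)."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def cuda_versions_match(built_for, installed):
    """True only when both versions parse and agree on major.minor."""
    a, b = major_minor(built_for), major_minor(installed)
    return a is not None and a == b
```

With PyTorch importable, `cuda_versions_match(torch.version.cuda, installed_toolkit_version())` gives a quick yes/no. A False result means either a mismatch or that one of the two versions could not be determined.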

We recently built PyTorch for JetPack 4.2 with Xavier/TX2/Nano support.
Would you mind reflashing the device and giving the package a try?
https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/

Thanks.

I’m running a TX2 on JetPack 3.3 with CUDA 9.0.

After building from source, I have the same issue as described in #1.

The whl file mentioned above requires CUDA 10, as I get the following error on import: ‘ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory’
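
To see which CUDA runtime the dynamic loader can actually find, a check along these lines may help. It is a sketch using `ctypes` from the standard library; `can_load` is a hypothetical helper, and the two sonames are simply the ones implied by the CUDA versions mentioned in this thread.

```python
import ctypes


def can_load(library_name):
    """Return (True, None) if the dynamic loader finds the library, else (False, reason)."""
    try:
        ctypes.CDLL(library_name)
        return True, None
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Sonames assumed from the CUDA 9.0 and CUDA 10.0 versions discussed above.
    for name in ("libcudart.so.9.0", "libcudart.so.10.0"):
        ok, reason = can_load(name)
        print(name, "->", "found" if ok else "missing ({})".format(reason))
```

A library reported as missing may still exist on disk (for example under /usr/local/cuda/lib64) but be outside the loader's search path, so the result reflects what a Python process would see rather than what is installed.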

I built a whl file on the TX2 using the steps mentioned in #6. (I had to create an 8 GB swap file and install ninja with apt-get instead of pip.) I still run into the same issue as #1. My whl file is at https://ufile.io/bkjam.

Hi,

For JetPack 3.3, you will need to build it from source yourself.
Would you mind following our instructions and trying again?
https://devtalk.nvidia.com/default/topic/1041716/jetson-agx-xavier/pytorch-install-problem-solved-/post/5284747/#5284747

Thanks.