PyTorch can't transfer anything to the GPU

Environment

Windows build number: Version 2004 (Build 20150.1000)
Distribution version: Ubuntu 20.04
With WSL2
Linux Kernel: 4.19.121-microsoft-standard
Python: 3.5.4
PyTorch: 1.2
Geforce Driver: 455.41

Steps to reproduce

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tensor = torch.zeros(1, 1, 10)
tensor.to(device)

torch.cuda.is_available() returns True.

Error message:
CUDA error: unknown error
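A few extra diagnostics that can help narrow down where it fails (a sketch; the printed values will of course differ per setup):

```python
import torch

# What PyTorch was built against and what it can see at runtime.
print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version the wheel was built with (None on CPU-only builds)
print(torch.cuda.is_available())  # whether PyTorch can reach a CUDA device
if torch.cuda.is_available():
    print(torch.cuda.device_count())
    print(torch.cuda.get_device_name(0))
```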


Thanks for reaching out.
Could you run dxdiag on the Windows host system and attach the results here? (There is a button to export it as a text file.)

Thanks!

Here is the DxDiag :)

DxDiag.txt (100.1 KB)

Thank you,

From your DxDiag, nothing looks wrong on the host OS.

Could you verify a couple of things:

  • No native Linux display driver was accidentally installed on your distro (this can happen if you installed the CUDA Toolkit from the wrong .deb file, or if it was pulled in as a dependency somehow)
  • You can run some basic CUDA samples (if you install a toolkit just for the samples, make sure to install it via the runfile rather than the .deb package, and deselect the native display driver from the list of installable components)

If a display driver did get installed accidentally, you can purge those packages and it should work again.
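One way to check for such packages (a sketch, assuming an apt-based distro like Ubuntu; the exact package names may differ):

```shell
# List any NVIDIA driver packages installed inside the distro. On WSL2 the
# display driver should live only on the Windows host, so ideally this finds
# nothing.
if dpkg -l 2>/dev/null | grep -iq 'nvidia-driver'; then
    echo "native driver packages found - consider: sudo apt-get purge 'nvidia-*'"
else
    echo "no native driver packages found"
fi
```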

Thanks,

How can I check if a native Linux driver got installed? (I’m pretty sure I didn’t install one)

CUDA Samples of CUDA 11 Toolkit (Note that my PyTorch Version uses CUDA 10):
1_Utilities:
deviceQuery passed,
UnifiedMemoryPerf:

Printing Average of 20 measurements in (ms)
Size_KB  UMhint UMhntAs  UMeasy   0Copy MemCopy CpAsync CpHpglk CpPglAs
4         1.000 198.365   0.511   0.474   2.494   1.040   1.717   0.583
16        1.147 199.833   0.430   0.500   2.337   1.526   1.956   0.691
64        1.433 198.690   0.604   0.667   2.464   1.371   1.786   0.615
256       1.441 198.518   1.058   0.884   3.118   2.114   2.121   0.915
1024      4.032 201.392   3.260   3.449   5.500   3.990   4.317   1.826
4096     23.034 198.562  22.467  22.360  14.293  13.620   8.145   6.446
16384   189.794 298.311 192.925 181.818  63.798  64.018  33.766  31.546

topologyQuery result:

GPU0 <-> CPU:
  * Atomic Supported: no

When I update my torch and torchvision versions with
pip3.5 install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
the code below runs without an error, but it doesn't actually move anything to the GPU either.

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tensor = torch.rand(1, 1, 10)
tensor.to(device)

tensor_two = tensor + tensor

runs, and torch.cuda.is_available() still returns True. But if I inspect tensor_two at the end, its device attribute is device(type='cpu').

You should use
tensor = torch.rand(1, 1, 10).to(device)
because .to() is not an in-place operation; it returns a new tensor and leaves the original where it was.
https://pytorch.org/docs/stable/tensors.html#torch.Tensor.to
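To illustrate the non-in-place behaviour, a small sketch (variable names are just for illustration):

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

t = torch.rand(1, 1, 10)
moved = t.to(device)  # .to() returns a tensor on the target device...
# ...while t itself is left where it was (on a CUDA machine: still on the CPU).

t = t.to(device)  # idiomatic fix: rebind the name to the returned tensor
```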

Thanks @vvodan

This is now working with

PyTorch: 1.5.1 (CUDA 10.1)
Python: 3.5.9

instead of my previous setup

PyTorch: 1.2 (CUDA 10)
Python: 3.5.4

I would still consider it a problem, though, because I would like to work with PyTorch 1.2, since that is the version we use at my workplace.