PyTorch utilizes CPU instead of GPU

When I try to use CUDA for training a neural network, or just for a simple calculation, PyTorch utilizes the CPU instead of the GPU.

Python 3.8.3 (default, Jun 25 2020, 23:21:14)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>>> device
device(type='cuda', index=0)
>>> tensor = torch.rand(1, 1, 10).to(device)
>>> tensor
tensor([[[0.1126, 0.1737, 0.9678, 0.8833, 0.6923, 0.2118, 0.9874, 0.9397,
          0.4831, 0.4274]]], device='cuda:0')
>>> tensor_two = tensor + tensor
>>> tensor_two
tensor([[[0.2252, 0.3474, 1.9356, 1.7666, 1.3847, 0.4236, 1.9747, 1.8794,
          0.9661, 0.8549]]], device='cuda:0')
>>> while True:
...   tensor_two = tensor + tensor
...

nvidia-smi output under Windows:

Sun Jun 28 10:46:36 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.41       Driver Version: 455.41       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070   WDDM  | 00000000:01:00.0  On |                  N/A |
| 44%   39C    P2    33W / 151W |   1394MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A         4      C   Insufficient Permissions        N/A      |
|    0   N/A  N/A      1268    C+G   ...lPanel\SystemSettings.exe    N/A      |
|    0   N/A  N/A      1468    C+G   C:\Windows\System32\dwm.exe     N/A      |
|    0   N/A  N/A      3784    C+G   ...bbwe\Microsoft.Photos.exe    N/A      |
|    0   N/A  N/A      8844    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A      8892    C+G   ...artMenuExperienceHost.exe    N/A      |
|    0   N/A  N/A     10380    C+G   ...ropbox\Client\Dropbox.exe    N/A      |
|    0   N/A  N/A     14504    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     15984    C+G   ...8bbwe\WindowsTerminal.exe    N/A      |
|    0   N/A  N/A     19380    C+G   ...2txyewy\TextInputHost.exe    N/A      |
|    0   N/A  N/A     28380    C+G   ...b3d8bbwe\WinStore.App.exe    N/A      |
+-----------------------------------------------------------------------------+

WSL version:
Linux version 4.19.121-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Fri Jun 19 21:06:10 UTC 2020

Ubuntu 20.04 LTS

The same code on an Ubuntu PC:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   46C    P2    63W / 250W |    663MiB / 11178MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   39C    P8    11W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   43C    P8    12W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
|  0%   43C    P8    13W / 250W |     12MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                           
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       942      G   /usr/lib/xorg/Xorg                            25MiB |
|    0      1239      G   /usr/bin/gnome-shell                          57MiB |
|    0      5591      C   ...vlad/.pyenv/versions/pytorch/bin/python   567MiB |
+-----------------------------------------------------------------------------+

Same thing here. I use a Surface Book 2, Linux kernel 4.19.121, Ubuntu and Miniconda.
While the GPU is detected by PyTorch, it is not used during training.
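
One way to check this independently of Task Manager or nvidia-smi is to time the same workload on both devices. The following is a minimal sketch, not a definitive benchmark: the helper name `time_matmul`, the matrix size, and the repeat count are illustrative assumptions, and only the CPU is timed when CUDA is unavailable. CUDA kernels launch asynchronously, so the sketch calls `torch.cuda.synchronize()` before reading the clock.

```python
import time

import torch


def time_matmul(device, size=1024, repeats=10):
    """Return the average seconds per (size x size) matrix multiply on `device`."""
    device = torch.device(device)
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)

    # Warm-up run so one-time initialization cost is not measured.
    torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize(device)

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    # Wait for queued GPU kernels to finish before stopping the clock.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print(f"cpu : {time_matmul('cpu'):.4f} s per matmul")
    if torch.cuda.is_available():
        print(f"cuda: {time_matmul('cuda:0'):.4f} s per matmul")
    else:
        print("CUDA is not available; only the CPU was timed.")
```

If the `cuda` figure is not clearly lower than the `cpu` figure for a matrix this large, the work is probably not being accelerated, whatever the monitoring tools report.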

The TensorFlow Docker container doesn’t utilize the GPU either.

Train on 60000 samples
Epoch 1/10
60000/60000 [==============================] - 39s 642us/sample - loss: 0.4938 - accuracy: 0.8272
Epoch 2/10
60000/60000 [==============================] - 27s 447us/sample - loss: 0.3753 - accuracy: 0.8636
Epoch 3/10
60000/60000 [==============================] - 26s 436us/sample - loss: 0.3360 - accuracy: 0.8761
Epoch 4/10
60000/60000 [==============================] - 27s 458us/sample - loss: 0.3117 - accuracy: 0.8863
Epoch 5/10
60000/60000 [==============================] - 33s 547us/sample - loss: 0.2945 - accuracy: 0.8915
Epoch 6/10
60000/60000 [==============================] - 34s 571us/sample - loss: 0.2815 - accuracy: 0.8951
Epoch 7/10
60000/60000 [==============================] - 37s 615us/sample - loss: 0.2700 - accuracy: 0.8998
Epoch 8/10
60000/60000 [==============================] - 38s 626us/sample - loss: 0.2591 - accuracy: 0.9039
Epoch 9/10
60000/60000 [==============================] - 39s 643us/sample - loss: 0.2498 - accuracy: 0.9059
Epoch 10/10
60000/60000 [==============================] - 36s 595us/sample - loss: 0.2404 - accuracy: 0.9105

[W 18:12:36.300 NotebookApp] Notebook tensorflow-tutorials/classification.ipynb is not trusted
[I 18:12:38.682 NotebookApp] Kernel started: 9eb86799-5559-4d34-ba36-7edd178525c4
2020-07-09 18:12:53.102083: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-07-09 18:12:53.103647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-07-09 18:13:51.119733: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-09 18:13:51.220018: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.220439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.683GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2020-07-09 18:13:51.220576: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-09 18:13:51.220709: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-09 18:13:51.222465: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-09 18:13:51.223247: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-09 18:13:51.225612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-09 18:13:51.228846: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-09 18:13:51.229013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-09 18:13:51.230293: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.231493: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.231930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-07-09 18:13:51.240228: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3410010000 Hz
2020-07-09 18:13:51.241999: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56b3460 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-09 18:13:51.242058: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-09 18:13:51.689421: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.689939: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56465a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-09 18:13:51.689973: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2020-07-09 18:13:51.691056: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.691691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.683GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2020-07-09 18:13:51.691838: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-09 18:13:51.691882: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-07-09 18:13:51.691952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-07-09 18:13:51.691996: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-07-09 18:13:51.692096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-07-09 18:13:51.692196: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-07-09 18:13:51.692287: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-07-09 18:13:51.695324: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.697101: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:51.697587: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-07-09 18:13:51.697738: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-07-09 18:13:52.656161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-09 18:13:52.656231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-07-09 18:13:52.656282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-07-09 18:13:52.657540: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:52.657991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2020-07-09 18:13:52.659207: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2020-07-09 18:13:52.660162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6835 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-07-09 18:14:10.154011: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 6.67G (7167590400 bytes) from device: CUDA_ERROR_UNKNOWN: unknown error
2020-07-09 18:14:25.532223: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 6.01G (6450831360 bytes) from device: CUDA_ERROR_UNKNOWN: unknown error
[I 18:14:38.780 NotebookApp] Saving file at /tensorflow-tutorials/classification.ipynb
2020-07-09 18:14:39.436606: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 5.41G (5805748224 bytes) from device: CUDA_ERROR_UNKNOWN: unknown error
2020-07-09 18:15:30.640365: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10

TensorFlow Docker containers should work. Would you mind describing the exact container you tried to run (the actual docker command line) and how you determined that it wasn’t utilizing the GPU?

I ran the commands from the WSL user guide.
Docker command line:
docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter

CPU/GPU utilization was monitored with Task Manager and nvidia-smi under Windows.
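
For reference, the nvidia-smi side of that monitoring can be scripted. This is a rough sketch that assumes `nvidia-smi` is on the PATH; the function names `parse_gpu_csv` and `sample_gpu` are made up for illustration. It relies on nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options, and it maps fields that are not plain numbers (such as `[N/A]`) to `None` rather than guessing a value.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


def _to_int(field):
    """Return the field as an int, or None if it is not a plain number."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_csv(text):
    """Parse 'index, util, mem_used, mem_total' CSV lines into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        index, util, used, total = (_to_int(f) for f in fields)
        rows.append({
            "gpu": index,
            "util_percent": util,
            "mem_used_mib": used,
            "mem_total_mib": total,
        })
    return rows


def sample_gpu():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    try:
        for row in sample_gpu():
            print(row)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query nvidia-smi: {exc}")
```

Calling `sample_gpu()` in a loop while training runs gives a utilization trace that can be compared between the WSL machine and the native Ubuntu one.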

I just ran the above Docker command and checked my GPU usage while running the samples; it stayed at 0% throughout:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.12       Driver Version: 465.12       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3080   WDDM  | 00000000:40:00.0 Off |                  N/A |
|  0%   60C    P2   111W / 340W |   9764MiB / 10240MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I’m hoping to be able to use PyTorch with CUDA on Windows in the future.

Off-topic: when I run the Python commands @vvodan posted in the first post, I get the same results. PyTorch can see the GPU and enables CUDA without actually using it, even though I’m running Python directly from Anaconda rather than inside a Docker container. Does this mean I have access to the GPU without running Docker? If not, do I have to use a prebuilt Docker image to eventually speed up ML?

–Chris