WSL2 / RTX 3070: errors running CUDA samples and containers

CPU : AMD RYZEN 5 5600X
GPU : Gigabyte RTX 3070 EAGLE 8G
OS : Windows 10 Insider Preview version 21296.1010
Ubuntu 20.04
kernel : Linux version 5.4.72-microsoft-standard-WSL2
cuda-toolkit : 11.2
NVIDIA driver: 465.21 (windows only)

Description of the issues

I followed the installation guide provided by NVIDIA (NVIDIA GPU Accelerated Computing on WSL 2, in the CUDA on WSL 12.3 documentation), but I run into issues when running CUDA applications and containers that use my GPU.

  • When I run the BlackScholes sample (./BlackScholes, located in /usr/local/cuda/samples/4_Finance/BlackScholes),
    I get this error:

[./BlackScholes] - Starting…
CUDA error at …/…/common/inc/helper_cuda.h:779 code=1(cudaErrorInvalidValue) “cudaGetDeviceCount(&device_count)”

I ran other samples as well:

[./binomialOptions] - Starting…
CUDA error at …/…/common/inc/helper_cuda.h:779 code=1(cudaErrorInvalidValue) “cudaGetDeviceCount(&device_count)”

./quasirandomGenerator Starting…
Allocating GPU memory…
CUDA error at quasirandomGenerator.cpp:69 code=1(cudaErrorInvalidValue) “cudaMalloc((void **)&d_Output, QRNG_DIMENSIONS * N * sizeof(float))”
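
To separate "the driver library is not reachable from WSL" from "the samples' CUDA runtime fails", a small probe like the one below could be run inside Ubuntu. It is only a sketch: it assumes the usual WSL 2 layout, where the GPU is exposed as /dev/dxg and the Windows driver maps libcuda.so.1 into /usr/lib/wsl/lib, and it calls the driver API functions cuInit and cuDeviceGetCount through ctypes. The helper names probe_cuda_driver and wsl_gpu_paths are made up for this example.

```python
import ctypes
import os


def probe_cuda_driver(libname="libcuda.so.1"):
    """Ask the CUDA driver API how many devices it sees.

    Returns (status, device_count). status is "load-failed" when the
    library cannot be opened, "ok" on success, or a string carrying the
    CUresult error code otherwise.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return ("load-failed", None)

    # CUresult cuInit(unsigned int Flags); 0 means CUDA_SUCCESS.
    rc = lib.cuInit(0)
    if rc != 0:
        return ("error %d in cuInit" % rc, None)

    # CUresult cuDeviceGetCount(int *count)
    count = ctypes.c_int(0)
    rc = lib.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return ("error %d in cuDeviceGetCount" % rc, None)
    return ("ok", count.value)


def wsl_gpu_paths(root="/"):
    """Report whether the paths assumed for a WSL 2 GPU setup exist."""
    paths = ["dev/dxg", "usr/lib/wsl/lib/libcuda.so.1"]
    return {"/" + p: os.path.exists(os.path.join(root, p)) for p in paths}


if __name__ == "__main__":
    print(wsl_gpu_paths())
    print(probe_cuda_driver())
```

If the probe reports "ok" with one device while the samples still fail in cudaGetDeviceCount, the problem is more likely on the toolkit/runtime side than in the WSL driver mapping.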

However, I am able to use my GPU with the 3 example containers from the guide, although I still get some errors (see the logs below) in the Jupyter and deep learning framework containers:

- Simple CUDA Containers:

> Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
>         -fullscreen       (run n-body simulation in fullscreen mode)
>         -fp64             (use double precision floating point values for simulation)
>         -hostmem          (stores simulation data in host memory)
>         -benchmark        (run benchmark to measure performance)
>         -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
>         -device=<d>       (where d=0,1,2.... for the CUDA device to use)
>         -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
>         -compare          (compares simulation results running once on the default GPU and once on the CPU)
>         -cpu              (run n-body simulation on the CPU)
>         -tipsy=<file.bin> (load a tipsy model file for simulation)
> 
> NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> 
> > Windowed mode
> > Simulation data stored in video memory
> > Single precision floating point simulation
> > 1 Devices used for simulation
> MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
> GPU Device 0: "GeForce RTX 3070" with compute capability 8.6
> 
> > Compute 8.6 CUDA device: [GeForce RTX 3070]
> 47104 bodies, total time for 10 iterations: 40.226 ms
> = 551.583 billion interactions per second
> = 11031.663 single-precision GFLOP/s at 20 flops per interaction

Everything seems OK here.

- Jupyter Notebooks container, notebook /tensorflow-tutorials/regression:

2021-01-27 08:28:27.465606: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2021-01-27 08:28:27.475910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2021-01-27 08:28:59.155517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-01-27 08:28:59.26358: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:28:59.262495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:2b:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-27 08:28:59.262531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-01-27 08:28:59.262564: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-01-27 08:28:59.273384: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-01-27 08:28:59.275782: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-01-27 08:28:59.294179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-01-27 08:28:59.296734: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-01-27 08:28:59.296773: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-27 08:28:59.297064: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:28:59.297358: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:28:59.297501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-01-27 08:28:59.297891: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-01-27 08:28:59.304705: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3700005000 Hz
2021-01-27 08:28:59.305926: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x60c95c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-27 08:28:59.305947: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-27 08:28:59.459839: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:28:59.460073: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x68d9800 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-27 08:28:59.460112: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 3070, Compute Capability 8.6
2021-01-27 08:28:59.460417: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:28:59.460571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:2b:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-27 08:28:59.460628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-01-27 08:28:59.460658: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-01-27 08:28:59.460696: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-01-27 08:28:59.460729: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-01-27 08:28:59.460750: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-01-27 08:28:59.460769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-01-27 08:28:59.460784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-27 08:28:59.460987: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:28:59.461317: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:28:59.461435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-01-27 08:28:59.461946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[I 08:29:51.726 NotebookApp] Saving file at /tensorflow-tutorials/regression.ipynb
2021-01-27 08:31:49.088439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-27 08:31:49.088468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2021-01-27 08:31:49.088484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-01-27 08:31:49.089324: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:31:49.089428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2021-01-27 08:31:49.089727: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:31:49.089887: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6711 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:2b:00.0, compute capability: 8.6)

NB: I get the same errors in other notebooks.
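
For context on the repeated NUMA message: the log itself says that TensorFlow could not open /sys/bus/pci/devices/0000:2b:00.0/numa_node and then defaults to node 0. The sketch below mimics that lookup in plain Python. It is not TensorFlow's code; the function name read_numa_node and the choice to treat a negative value as 0 are assumptions made for illustration.

```python
import os


def read_numa_node(pci_bus_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node of a PCI device, falling back to 0.

    The fallback covers a missing or unreadable sysfs file (as in the
    log above) and a negative value, which Linux reports when the
    device has no NUMA affinity.
    """
    path = os.path.join(sysfs_root, pci_bus_id, "numa_node")
    try:
        with open(path) as f:
            node = int(f.read().strip())
    except (OSError, ValueError):
        return 0
    return node if node >= 0 else 0


if __name__ == "__main__":
    # PCI bus ID taken from the log above.
    print(read_numa_node("0000:2b:00.0"))
```

On a single-GPU desktop the fallback to node 0 should not change anything, which suggests these lines are noise rather than the cause of a failure.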

- Deep learning framework container :

2021-01-27 08:51:09.067071: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2021-01-27 08:51:09.502477: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2021-01-27 08:51:09.502945: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
PY 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0]
TF 2.1.0
Script arguments:
  --image_width=224
  --image_height=224
  --distort_color=False
  --momentum=0.9
  --loss_scale=128.0
  --image_format=channels_last
  --data_dir=None
  --data_idx_dir=None
  --batch_size=256
  --num_iter=300
  --iter_unit=batch
  --log_dir=None
  --export_dir=None
  --tensorboard_dir=None
  --display_every=10
  --precision=fp16
  --dali_mode=None
  --use_xla=False
  --predict=False
2021-01-27 08:51:09.914723: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-01-27 08:51:10.021077: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:51:10.021197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:2b:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-27 08:51:10.021227: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2021-01-27 08:51:10.021267: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-01-27 08:51:10.022271: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-01-27 08:51:10.022433: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-01-27 08:51:10.023287: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-01-27 08:51:10.023878: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-01-27 08:51:10.023917: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-27 08:51:10.024119: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:51:10.024419: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:51:10.024539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-01-27 08:51:10.041583: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3700005000 Hz
2021-01-27 08:51:10.044334: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b77f50 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-27 08:51:10.044357: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-27 08:51:10.185680: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:51:10.185958: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b254a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-01-27 08:51:10.185970: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 3070, Compute Capability 8.6
2021-01-27 08:51:10.186294: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:51:10.186483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:2b:00.0 name: GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-01-27 08:51:10.186529: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2021-01-27 08:51:10.186565: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-01-27 08:51:10.186635: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-01-27 08:51:10.186692: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-01-27 08:51:10.186735: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-01-27 08:51:10.186749: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-01-27 08:51:10.186760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-27 08:51:10.187069: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:51:10.187394: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:51:10.187524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-01-27 08:51:10.187573: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2021-01-27 08:57:53.794035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-27 08:57:53.794077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2021-01-27 08:57:53.794095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-01-27 08:57:53.794968: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:57:53.795118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1324] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2021-01-27 08:57:53.795577: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:2b:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-01-27 08:57:53.795758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6507 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3070, pci bus id: 0000:2b:00.0, compute capability: 8.6)
WARNING:tensorflow:Expected a shuffled dataset but input dataset `x` is not shuffled. Please invoke `shuffle()` on input dataset.
2021-01-27 08:58:01.869126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-01-27 08:59:23.191813: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-27 09:12:42.968472: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal   : Value 'sm_86' is not defined for option 'gpu-name'

Relying on driver to perform ptx compilation. This message will be only logged once.
2021-01-27 09:12:55.253071: W tensorflow/core/common_runtime/bfc_allocator.cc:424] Allocator (GPU_0_bfc) ran out of memory trying to allocate 196.00MiB (rounded to 205520896).  Current allocation summary follows.
2021-01-27 09:12:55.253141: I tensorflow/core/common_runtime/bfc_allocator.cc:894] BFCAllocator dump for GPU_0_bfc
2021-01-27 09:12:55.253166: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (256):   Total Chunks: 392, Chunks in use: 388. 98.0KiB allocated for chunks. 97.0KiB in use in bin. 15.0KiB client-requested in use in bin.
2021-01-27 09:12:55.253185: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (512):   Total Chunks: 49, Chunks in use: 48. 24.8KiB allocated for chunks. 24.0KiB in use in bin. 24.0KiB client-requested in use in bin.
2021-01-27 09:12:55.253203: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1024):  Total Chunks: 109, Chunks in use: 107. 110.8KiB allocated for chunks. 108.0KiB in use in bin. 107.0KiB client-requested in use in bin.
2021-01-27 09:12:55.253226: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2048):  Total Chunks: 72, Chunks in use: 72. 144.0KiB allocated for chunks. 144.0KiB in use in bin. 144.0KiB client-requested in use in bin.
2021-01-27 09:12:55.253234: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4096):  Total Chunks: 46, Chunks in use: 46. 186.5KiB allocated for chunks. 186.5KiB in use in bin. 183.6KiB client-requested in use in bin.
2021-01-27 09:12:55.253240: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8192):  Total Chunks: 25, Chunks in use: 25. 200.0KiB allocated for chunks. 200.0KiB in use in bin. 200.0KiB client-requested in use in bin.
2021-01-27 09:12:55.253246: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16384):         Total Chunks: 5, Chunks in use: 4. 83.0KiB allocated for chunks. 64.0KiB in use in bin. 64.0KiB client-requested in use in bin.
2021-01-27 09:12:55.253262: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (32768):         Total Chunks: 11, Chunks in use: 10. 395.8KiB allocated for chunks. 339.0KiB in use in bin. 339.0KiB client-requested in use in bin.
2021-01-27 09:12:55.253270: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (65536):         Total Chunks: 29, Chunks in use: 24. 1.86MiB allocated for chunks. 1.55MiB in use in bin. 1.52MiB client-requested in use in bin.
2021-01-27 09:12:55.253292: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (131072):        Total Chunks: 23, Chunks in use: 19. 3.13MiB allocated for chunks. 2.59MiB in use in bin. 2.50MiB client-requested in use in bin.
2021-01-27 09:12:55.253305: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (262144):        Total Chunks: 38, Chunks in use: 31. 10.53MiB allocated for chunks. 8.78MiB in use in bin. 7.66MiB client-requested in use in bin.
2021-01-27 09:12:55.253332: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (524288):        Total Chunks: 38, Chunks in use: 32. 20.50MiB allocated for chunks. 17.25MiB in use in bin. 16.81MiB client-requested in use in bin.
2021-01-27 09:12:55.253343: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (1048576):       Total Chunks: 54, Chunks in use: 43. 56.70MiB allocated for chunks. 45.70MiB in use in bin. 43.75MiB client-requested in use in bin.
2021-01-27 09:12:55.253355: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (2097152):       Total Chunks: 40, Chunks in use: 32. 90.66MiB allocated for chunks. 73.16MiB in use in bin. 70.41MiB client-requested in use in bin.
2021-01-27 09:12:55.253382: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (4194304):       Total Chunks: 32, Chunks in use: 26. 153.20MiB allocated for chunks. 124.25MiB in use in bin. 119.00MiB client-requested in use in bin.
2021-01-27 09:12:55.253409: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (8388608):       Total Chunks: 18, Chunks in use: 14. 157.02MiB allocated for chunks. 122.00MiB in use in bin. 122.00MiB client-requested in use in bin.
2021-01-27 09:12:55.253440: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (16777216):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-01-27 09:12:55.253460: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (33554432):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-01-27 09:12:55.253471: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (67108864):      Total Chunks: 12, Chunks in use: 12. 1.10GiB allocated for chunks. 1.10GiB in use in bin. 1.10GiB client-requested in use in bin.
2021-01-27 09:12:55.253481: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (134217728):     Total Chunks: 6, Chunks in use: 6. 1.07GiB allocated for chunks. 1.07GiB in use in bin. 833.00MiB client-requested in use in bin.
2021-01-27 09:12:55.253507: I tensorflow/core/common_runtime/bfc_allocator.cc:901] Bin (268435456):     Total Chunks: 10, Chunks in use: 10. 3.70GiB allocated for chunks. 3.70GiB in use in bin. 3.64GiB client-requested in use in bin.
2021-01-27 09:12:55.253536: I tensorflow/core/common_runtime/bfc_allocator.cc:917] Bin for 196.00MiB was 128.00MiB, Chunk State:
2021-01-27 09:12:55.253546: I tensorflow/core/common_runtime/bfc_allocator.cc:930] Next region of size 1048576
2021-01-27 09:12:55.253563: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 716140000 of size 1280 next 1
2021-01-27 09:12:55.253587: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 716140500 of size 256 next 3
2021-01-27 09:12:55.253597: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 716140600 of size 256 next 2
2021-01-27 09:12:55.253604: I tensorflow/core/common_runtime/bfc_allocator.cc:950] InUse at 716140700 of size 256 next 6
...
2021-01-27 09:12:55.267099: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 1 Chunks of size 269023744 totalling 256.56MiB
2021-01-27 09:12:55.267105: I tensorflow/core/common_runtime/bfc_allocator.cc:958] 9 Chunks of size 411041792 totalling 3.45GiB
2021-01-27 09:12:55.267117: I tensorflow/core/common_runtime/bfc_allocator.cc:962] Sum Total of in-use chunks: 6.26GiB
2021-01-27 09:12:55.267122: I tensorflow/core/common_runtime/bfc_allocator.cc:964] total_region_allocated_bytes_: 6823672320 memory_limit_: 6823672546 available bytes: 226 curr_region_allocation_bytes_: 8589934592
2021-01-27 09:12:55.267129: I tensorflow/core/common_runtime/bfc_allocator.cc:970] Stats:
Limit:                  6823672546
InUse:                  6720489984
MaxInUse:               6722587136
NumAllocs:                    2498
MaxAllocSize:            427819008

2021-01-27 09:12:55.267168: W tensorflow/core/common_runtime/bfc_allocator.cc:429] ****************************************************************************************************
2021-01-27 09:12:55.267202: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at fused_batch_norm_op.cc:1186 : Resource exhausted: OOM when allocating tensor with shape[256,56,56,128] and type half on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2021-01-27 09:12:55.267235: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[256,56,56,128] and type half on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node bn3a_branch2a/cond/then/_220/FusedBatchNormV3}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "cnn/resnet.py", line 50, in <module>
    nvutils.train(resnet50, args)
  File "/workspace/nvidia-examples/cnn/nvutils/runner.py", line 216, in train
    initial_epoch=initial_epoch, **valid_params)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 694, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 1123, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3727, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1551, in __call__
    return self._call_impl(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1591, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError:  OOM when allocating tensor with shape[256,56,56,128] and type half on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node bn3a_branch2a/cond/then/_220/FusedBatchNormV3}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_keras_scratch_graph_20352]

Function call stack:
keras_scratch_graph 

NB: the run took about 24 minutes.
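
The numbers in the allocator dump can be checked by hand. The sketch below recomputes the size of the tensor that failed to allocate (shape [256,56,56,128], type half, so 2 bytes per element) and compares it with what was left under the allocator limit. All values are copied from the log above; the helper names are made up for this example.

```python
from math import prod

BYTES_PER_HALF = 2          # fp16
MIB = 1024 * 1024


def tensor_bytes(shape, itemsize):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * itemsize


# Values copied from the OOM report.
shape = (256, 56, 56, 128)  # leading 256 is --batch_size
limit = 6823672546          # "Limit" in the allocator stats
in_use = 6720489984         # "InUse" in the allocator stats

needed = tensor_bytes(shape, BYTES_PER_HALF)
free = limit - in_use

if __name__ == "__main__":
    print("needed: %d bytes (%.2f MiB)" % (needed, needed / MIB))
    print("free:   %d bytes (%.2f MiB)" % (free, free / MIB))
    print("fits:  ", needed <= free)
```

The request is 205520896 bytes (196 MiB), which matches the log, while only about 98 MiB remained under the 6.35 GiB limit. Since the leading dimension is the batch size, activation memory scales linearly with --batch_size, so a smaller batch is the obvious thing to try on an 8 GB card.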

Does anyone have an idea about these errors and warnings?
For example, I saw that the CUDA version in the deep learning framework container is 10.2. Does this version of CUDA support the RTX 3xxx series?
If not, are the containers unusable for GPUs with the Ampere architecture?
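
To make the version question concrete, here is a small lookup of the first CUDA toolkit release that can compile natively for a given compute capability. The table reflects my understanding of the release history (sm_86 arriving with CUDA 11.1) and should be checked against the CUDA release notes; the function name can_compile_natively is made up for this example.

```python
# First CUDA toolkit release whose compiler knows each architecture.
# Assumed from the CUDA release history; verify against release notes.
MIN_TOOLKIT_FOR_SM = {
    (7, 0): (9, 0),    # Volta
    (7, 5): (10, 0),   # Turing
    (8, 0): (11, 0),   # Ampere (A100)
    (8, 6): (11, 1),   # Ampere (RTX 30 series)
}


def can_compile_natively(compute_capability, toolkit_version):
    """True if the toolkit's ptxas knows the given sm_XY target."""
    needed = MIN_TOOLKIT_FOR_SM.get(compute_capability)
    if needed is None:
        raise KeyError("unknown compute capability %r" % (compute_capability,))
    return toolkit_version >= needed


if __name__ == "__main__":
    rtx_3070 = (8, 6)
    print("container, CUDA 10.2:", can_compile_natively(rtx_3070, (10, 2)))
    print("host, CUDA 11.2:     ", can_compile_natively(rtx_3070, (11, 2)))
```

This would be consistent with the ptxas message in the log ("Value 'sm_86' is not defined for option 'gpu-name'") followed by "Relying on driver to perform ptx compilation": the 10.2 toolchain in the container does not know sm_86, so the driver compiles the PTX at run time instead.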

Thanks for your help,

Léonard
