CUDA Memory Management problem introduced in 11.2

I updated my server to Ubuntu 20.04 last week, which forced an update from CUDA 10.2 to CUDA 11.2. That update apparently introduced a new bug in memory management. To sum it up: I use multiple models for CNN-based neural networks, and because of this problem I deactivated cuDNN, so I'm talking ONLY about CUDA 11.2-1. This version seems to allocate random amounts of memory to the layers O_O

For example:

  1. Model 1: 736x736 neural network, 104 layers
  2. Model 2: 320x320 neural network, 73 layers

Yesterday, Model 1 allocated 3.7 GB of memory and Model 2 allocated 4.1 GB. Under CUDA 10.2, Model 2 allocated about 1 GB. I debugged my code for more than 5 hours and cannot find any error on my part.
I tested it again today and got different memory numbers with exactly the same code; I also get different numbers on an RTX 3090 than on an RTX 2080.
I will post detailed information later; I wanted to check these new effects first. For now, it looks like 11.2 cannot be trusted for my kind of application :(.
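For anyone trying to reproduce the numbers: a straightforward way to compare runs is `nvidia-smi`'s query mode (this assumes the standard NVIDIA driver utilities are installed and a GPU is present; the log filename is just an example):

```shell
# Total/used device memory per GPU (nvidia-smi ships with the driver)
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv

# Per-process breakdown, to see what a single model run allocates
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# Poll once per second while the model loads, appending to a log file
nvidia-smi --query-gpu=timestamp,memory.used --format=csv,noheader -l 1 >> mem.log
```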

In the meantime, you should be able to install CUDA 10.2 alongside the Ubuntu apt-based install. Just use the runfile method (and skip the driver installation).
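Roughly like this (a sketch: the exact runfile name depends on which installer you download from the CUDA Toolkit archive page; `--silent`, `--toolkit`, and `--toolkitpath` are the runfile's non-interactive install options):

```shell
# Runfile downloaded from the CUDA Toolkit 10.2 archive page;
# your exact filename may differ.
# Install only the toolkit into its own versioned directory,
# skipping the bundled driver so the apt-installed one stays in place.
sudo sh cuda_10.2.89_440.33.01_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-10.2
```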

If you want to switch between the two, just change where the /usr/local/cuda symlink points.
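For example (assuming both toolkits live under /usr/local in the usual versioned directories):

```shell
# Point /usr/local/cuda at the 10.2 toolkit
sudo ln -sfn /usr/local/cuda-10.2 /usr/local/cuda

# ...and back to 11.2 when needed:
# sudo ln -sfn /usr/local/cuda-11.2 /usr/local/cuda

# Verify which toolkit is active
readlink /usr/local/cuda
nvcc --version
```

`-n` matters here: without it, `ln -sf` would follow the existing symlink and create the new link *inside* the old target directory instead of replacing the link itself.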

One point to note: CUDA 10.2 doesn't have native support for the 30xx GPUs, so those depend on JIT compilation. I'm not experienced with CUDA DL approaches, so I can't comment on that specifically.