I updated my server to Ubuntu 20.04 last week, which forced me to move from CUDA 10.2 to CUDA 11.2 in the same step. That update apparently introduced a new bug in memory management. To sum it up: I run multiple CNN-based models, and because of these problems I deactivated cuDNN, so I'm talking ONLY about CUDA 11.2-1. This version seems to allocate random amounts of memory to the layers O_O
For example:
- Model 1: 736x736 neural network, 104 layers
- Model 2: 320x320 neural network, 73 layers
Yesterday, Model 1 allocated 3.7 GB of memory and Model 2 allocated 4.1 GB. With CUDA 10.2, Model 2 allocated only about 1 GB. I debugged my code for more than 5 hours and cannot find any error on my part.
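For anyone who wants to compare numbers across CUDA versions, one version-independent way to record them is to snapshot per-process GPU memory (as reported by the driver) before and after loading each model. A minimal sketch, assuming the standard `nvidia-smi` query flags; the function names here are mine, not from any library:

```python
import subprocess

def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' CSV lines, as printed by:
    nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits
    Returns a dict mapping pid -> used memory in MiB."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid, mib = (field.strip() for field in line.split(","))
        usage[int(pid)] = int(mib)
    return usage

def gpu_memory_by_pid():
    """Query per-process GPU memory via nvidia-smi (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)
```

Calling `gpu_memory_by_pid()` right before and right after each model load, and diffing the entry for your own PID, gives a per-model allocation figure you can compare between 10.2 and 11.2 (and between the 3090 and the 2080).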
I tested it again today and this time I got different memory numbers with exactly the same code; the numbers also differ between an RTX 3090 and an RTX 2080.
I will post more detailed information later; I wanted to check these new effects first. But to me this looks like 11.2 cannot be trusted for my kind of application :(.