I’ve explained case 2. All other cases depend on what else occupies VRAM at the time, so they have to be judged case by case. For example, it’s unclear how much memory your CUDA simulation might allocate concurrently, etc. (You can have nvidia-smi.exe dump memory statistics while your app is running to analyze that.)
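As a sketch of that monitoring approach, something like the following shell helper samples VRAM usage into a CSV once per second while your app runs (the query fields come from `nvidia-smi --help-query-gpu`; `vram_log.csv` and the function name are just placeholders):

```shell
# Hypothetical helper: log timestamped VRAM statistics once per second.
# Stop it with Ctrl+C when your app has finished.
log_vram() {
  nvidia-smi --query-gpu=timestamp,memory.used,memory.total,memory.free \
             --format=csv -l 1 > vram_log.csv
}
```

On Windows, `nvidia-smi.exe` typically lives in `C:\Windows\System32` or the driver install directory; the same query flags apply.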
Yes, there can be fragmentation. The GPU can also address pinned host memory over PCI-E to some extent, so overall workloads can exceed the installed VRAM, and yes, there is also the possibility of swapping allocations to make them resident.
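A minimal CUDA sketch of how an application could react to that: query the free VRAM with `cudaMemGetInfo` before a big allocation, and fall back to managed (unified) memory, which can oversubscribe VRAM by paging over PCI-E on Pascal and newer GPUs. The 512 MiB request size is only a placeholder; error checking is trimmed for brevity:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("VRAM: %zu MiB free of %zu MiB total\n",
           freeBytes >> 20, totalBytes >> 20);

    const size_t request = 512ull << 20;  // placeholder: 512 MiB buffer
    void* buf = nullptr;
    if (request < freeBytes) {
        cudaMalloc(&buf, request);        // fits into device memory
    } else {
        // Oversubscription: pages migrate between host and device on
        // demand over PCI-E (slower than resident VRAM).
        cudaMallocManaged(&buf, request);
    }
    cudaFree(buf);
    return 0;
}
```

Note that `cudaMemGetInfo` reports free memory at one instant only; fragmentation or other processes can still make a single large `cudaMalloc` of that size fail.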
But in the end, a bigger board that fits the application’s requirements is the only viable solution once the driver aborts with a “Request for more GPU memory than is available” message.