Since cudaHostAlloc() is a thin wrapper around the appropriate OS API calls, you are at the mercy of whatever limits the OS imposes. Pinning 5-10 GB may be allowed on a system with plenty of system memory, but I doubt it will work if the system has only 8 GB.
I am not aware of any OS calls that would tell you upfront how much memory is pinnable. The amount may well depend on the allocation sizes, due to fragmentation of the memory space.
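Absent such a query, one pragmatic option is to attempt the allocation and fall back to a smaller size when it fails. Below is a minimal sketch of that idea, not a definitive implementation. The names `PinnedProbe`, `alloc_largest`, and `try_pinned` are made up for illustration, and the halving strategy is an assumption about what a reasonable fallback looks like; the only real CUDA runtime calls used are cudaHostAlloc() and cudaGetLastError(). The CUDA-specific part is guarded so that the probing logic also compiles with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Callback type: returns a non-null pointer on success, nullptr on failure.
using TryAlloc = std::function<void*(std::size_t)>;

// Result of a probe: the buffer that was obtained and its size in bytes.
struct PinnedProbe {
    void*       ptr;
    std::size_t bytes;
};

// Hypothetical helper: try `want` bytes first, halving the request after each
// failure, and give up once the request drops below `min_bytes`.
// Returns {nullptr, 0} if no attempt succeeded.
PinnedProbe alloc_largest(const TryAlloc& try_alloc,
                          std::size_t want, std::size_t min_bytes) {
    assert(min_bytes > 0);
    for (std::size_t n = want; n >= min_bytes; n /= 2) {
        if (void* p = try_alloc(n)) {
            return {p, n};
        }
    }
    return {nullptr, 0};
}

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Attempt to allocate `bytes` of pinned host memory (only built with nvcc).
void* try_pinned(std::size_t bytes) {
    void* p = nullptr;
    cudaError_t err = cudaHostAlloc(&p, bytes, cudaHostAllocDefault);
    if (err != cudaSuccess) {
        cudaGetLastError();  // reset the error state so later calls start clean
        return nullptr;
    }
    return p;
}
#endif
```

When built with nvcc, a call such as `alloc_largest(try_pinned, 8ull << 30, 64ull << 20)` would try 8 GB first and stop at 64 MB; the returned pointer should eventually be released with cudaFreeHost(). Note that a successful probe only tells you what was pinnable at that moment; it does not guarantee a later allocation of the same size will succeed.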
I would be somewhat surprised if there is a real dependency on the driver version, unless the driver changes the sequence of OS calls it performs between versions (this could change between major versions, but seems unlikely otherwise). I would expect the limit to depend on the state of the OS, so the amount of pinnable memory may differ between an uptime of 5 minutes and an uptime of 400 days on the same system.
On Linux there are system debug utilities (strace? dtrace?) that should give you visibility into the calls the driver makes to pin the memory (I think mmap() on Linux, but I could be wrong; it’s been many years since I looked at it). With that knowledge you could then ask in an OS-specific support forum whether there is any way of knowing a priori whether the call will succeed.