I have 2 Titan Xp GPUs in my PC and I use Python 2.7 and TensorFlow 1.12.0 for training on Ubuntu 16.04. Sometimes I need both GPUs' memory (24 GB total), and this used to work fine. However, for some reason I had to completely wipe the PC and reinstall everything. Now TensorFlow only allocates memory on one of the GPUs and never uses both: it fully uses one GPU's memory and gives me an OOM error while the other GPU's memory allocation stays at 0. CUDA version is 9.0, cuDNN is 7.4. "./bin/ppc64le/linux/release/deviceQuery" reports PASS.
My code is the same as before the reinstall.
When I set os.environ["CUDA_VISIBLE_DEVICES"] = "0" (or "0,1", or omit the line entirely), it runs on gpu:0 without any problem.
When I set os.environ["CUDA_VISIBLE_DEVICES"] = "1" or "1,0", it runs on gpu:1 perfectly.
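For reference, here is a minimal sketch of how I select the devices. I am assuming the variable has to be set before TensorFlow is imported (my understanding is that setting it afterwards has no effect, because the process has already enumerated the GPUs); the tf.device example in the comments is just an illustration, not my actual model code.

```python
import os

# Assumption: this must run before `import tensorflow`, otherwise the
# device list is already fixed for the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# As far as I know, TF 1.x places ops on /gpu:0 by default even when both
# GPUs are visible; to use the second GPU, ops must be placed explicitly:
#
#   import tensorflow as tf
#   with tf.device("/gpu:1"):
#       ...  # build part of the graph here
#
# Passing log_device_placement=True in tf.ConfigProto shows where each
# op actually lands.

print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> 0,1
```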
Is this a CUDA problem? How can I get TensorFlow to allocate memory across multiple GPUs?