Strange behavior of Titan V

qchenao · June 16, 2019, 4:14am

When I rerun the code from https://github.com/traveller59/second.pytorch unchanged, I can set the batch size to 6 on my 2080 Ti. But when I switch to a Titan V, even a batch size of 1 can run out of CUDA memory. With mixed precision training enabled, the Titan V can handle a batch size of 1 at most. I tested both the Titan V and the 2080 Ti with the PyTorch MNIST example; both worked fine and memory consumption was normal. So there must be some operation in https://github.com/traveller59/second.pytorch that causes huge memory consumption on the Titan V but not on the 2080 Ti. Does anyone have an idea what could cause this difference in behavior between the 2080 Ti and the Titan V?
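
One way to narrow this down is to measure the peak memory of a single step on each card and compare the numbers. The sketch below is only an illustration, not code from the repository: `peak_memory_mib` and `toy_step` are hypothetical names, and the toy matrix multiply stands in for one real training step. It assumes a PyTorch version that provides `torch.cuda.reset_peak_memory_stats`, and it only sees memory that goes through PyTorch's own allocator, so allocations made directly by a custom CUDA extension would not show up.

```python
import importlib.util


def peak_memory_mib(step_fn, device_index=0):
    """Run step_fn once on the given GPU and return the peak memory
    allocated by PyTorch in MiB. Returns None if PyTorch or CUDA is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    device = torch.device("cuda", device_index)
    torch.cuda.empty_cache()                    # release cached blocks from earlier work
    torch.cuda.reset_peak_memory_stats(device)  # start the peak counter from the current usage
    step_fn(device)
    torch.cuda.synchronize(device)              # wait for queued kernels to finish
    return torch.cuda.max_memory_allocated(device) / 2**20


def toy_step(device):
    """Hypothetical stand-in for one training step (forward and backward)."""
    import torch

    x = torch.randn(1024, 1024, device=device, requires_grad=True)
    (x @ x).sum().backward()


if __name__ == "__main__":
    print("peak MiB:", peak_memory_mib(toy_step))
```

Replacing `toy_step` with one forward and backward pass of the real model, and running the same script once per card (for example by selecting the GPU with `CUDA_VISIBLE_DEVICES`), should show whether the difference already appears inside PyTorch's allocator.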

thomas.p.16 · August 25, 2019, 4:10pm

I had a similar issue, but I don’t know what fixed it. For other reasons I upgraded from Ubuntu 16.04 to 18.04, and in the process I went with the latest version of everything. After this my Titan V problems seemed to go away. My symptoms were extremely slow training and at least 8x the memory usage compared to my Titan Xp. Sorry I can’t be more help, but maybe upgrade and update everything and see if that helps. Also note that at the time, and maybe still, PyTorch installs its own CUDA (something like v9), and if you are doing something like mixed precision I believe you need v10. Good luck.
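
Before upgrading everything, it may help to check which CUDA version the installed PyTorch build actually ships with. The following is a minimal sketch; `describe_environment` is a hypothetical helper name, while the attributes it reads (`torch.version.cuda`, `torch.backends.cudnn.version()`, `torch.cuda.get_device_properties`) are standard PyTorch calls.

```python
import importlib.util


def describe_environment():
    """Collect the PyTorch, CUDA and GPU details relevant to this kind of issue.
    Returns {"torch": None} if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}
    import torch

    info = {
        "torch": torch.__version__,
        # CUDA version PyTorch was built with (None for CPU-only builds);
        # this can differ from the system-wide CUDA toolkit.
        "cuda_build": torch.version.cuda,
        "cudnn": torch.backends.cudnn.version(),
        "devices": [],
    }
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            info["devices"].append({
                "name": props.name,
                "capability": f"{props.major}.{props.minor}",
                "total_memory_gib": round(props.total_memory / 2**30, 2),
            })
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

A Titan V should report compute capability 7.0 and a 2080 Ti 7.5. Comparing `cuda_build` between a working and a non-working setup is a quick way to see whether the bundled CUDA version differs.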