Some LLMs require a large amount of GPU memory. I am considering a K80 card, which has two GPU modules. Could someone please clarify whether the 24 GB of RAM is shared between the GPUs, or is it dedicated RAM divided between them? In other words, will I be able to use the entire 24 GB with one GPU?
See this previous thread:
Note that CUDA 12.x removed support for sm_35 and sm_37 (the K80 is compute capability 3.7), meaning the Tesla K80 is no longer supported; you would be limited to CUDA 11.x or earlier.
Note also that this is a passively cooled card. Unless you run this in a server enclosure, you will need to construct your own cooling solution.
There are aftermarket cooling solutions for this card. But the 24 GB is not a shared pool: each of the two GPUs has its own dedicated 12 GB, so a single GPU can only address 12 GB. That limits its usefulness for large models anyway.
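If you want to verify this yourself, here is a minimal sketch using the CUDA runtime API (assuming a toolkit that still supports sm_37, i.e. CUDA 11.x or earlier). On a K80 you should see two separate devices, each reporting roughly 12 GB rather than one device with 24 GB:

```cpp
// Enumerate the GPUs the CUDA runtime sees and print each one's name,
// compute capability, and dedicated memory. A K80 appears as two devices,
// each with its own ~12 GB of memory.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, sm_%d%d, %.1f GB dedicated memory\n",
               i, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

The same information is available from nvidia-smi, which also lists the two GPUs of a K80 as separate devices with separate memory.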