Run LLM in K80

Some LLMs require large amounts of GPU memory. I am considering a K80 card, which has two GPU modules. Could someone please clarify whether the 24 GB of RAM is shared between the GPUs, or whether it is dedicated RAM split between them? I.e., will I be able to use the entire 24 GB with one GPU?

This is what ChatGPT says. Is this correct?

See this previous thread:

Note that CUDA 12.x removed support for sm_35 and sm_37, meaning the Tesla K80 is no longer supported by current CUDA toolkits.

Note also that this is a passively cooled card. Unless you run it in a server enclosure with forced airflow, you will need to construct your own cooling solution.

There are cooling solutions for this card. But since each GPU can only address its own 12 GB of RAM, this is not very useful anyway.
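To see why the 12 GB per-GPU limit matters, here is a rough back-of-the-envelope sketch in Python. The 7B parameter count is a hypothetical example, and the figures ignore activation and KV-cache overhead, so real usage will be higher:

```python
# Rough check of whether a model's weights alone fit in one K80 GPU's 12 GB.
# Hypothetical model size for illustration; overhead (activations, KV cache,
# CUDA context) is ignored, so this is a lower bound on actual memory use.

GIB = 1024 ** 3
PER_GPU_LIMIT = 12 * GIB  # each of the K80's two GPUs has its own 12 GB

def weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Memory needed just to store the weights."""
    return n_params * bits_per_param // 8

n_params = 7_000_000_000  # hypothetical 7B-parameter model

# fp16 weights (16 bits per parameter):
fp16 = weight_bytes(n_params, 16)
print(f"7B fp16: {fp16 / GIB:.1f} GiB, fits in 12 GB: {fp16 <= PER_GPU_LIMIT}")

# The same model quantized to 4 bits per parameter:
q4 = weight_bytes(n_params, 4)
print(f"7B q4:   {q4 / GIB:.1f} GiB, fits in 12 GB: {q4 <= PER_GPU_LIMIT}")
```

So an fp16 7B model (about 13 GiB of weights) already overflows one GPU, while a 4-bit quantization (about 3.3 GiB) fits comfortably; splitting a larger model across both GPUs is possible but adds PCIe transfer overhead.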