I’m curious to know whether I can use four RTX A5500 together […] to fine-tune an LLM?
The short answer to that part is “Yes”. And if programmed correctly, you can also benefit from all the available VRAM across the GPUs.
But I suspect what you have in mind is loading all your LLM parameters into one contiguous chunk of VRAM and letting the model run freely while you fine-tune it. That is not possible.
@MarkusHoHo sorry to bother you again. The NVIDIA page (RTX A5500 Graphics Card | NVIDIA) says that by using the NVIDIA NVLink Bridge, I can get up to 112 gigabytes per second (GB/s) of bandwidth and a combined 48GB of GDDR6 memory to tackle memory-intensive workloads. So I thought at least 48GB of combined memory is achievable. Do you think that won’t be applicable for LLM fine-tuning? I’ll mostly be using PyTorch as my deep learning framework.
Yes, it is applicable. You will be able to utilize all memory for your Deep Learning models. And you will have the bandwidth improvements of NVLink.
PyTorch runs on CUDA, so you can manage GPU memory through it.
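As a quick sanity check (a minimal sketch of my own, not anything specific to this thread), something like this will show whether PyTorch sees all four cards, how much VRAM each one reports, and whether peer-to-peer access (e.g. over the NVLink bridge) is available between them:

```python
# Minimal sketch: list the GPUs PyTorch can see, their VRAM, and
# peer-to-peer access between the cards (e.g. over an NVLink bridge).
import torch

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")

# Check which GPU pairs can access each other's memory directly.
for i in range(torch.cuda.device_count()):
    for j in range(torch.cuda.device_count()):
        if i != j:
            print(f"GPU {i} -> GPU {j} peer access:",
                  torch.cuda.can_device_access_peer(i, j))
```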
The only thing to understand is that multiple GPUs and their VRAM do not simply act as one GPU with a much bigger pool of VRAM. Workloads need to be distributed and memory usage will be batched. But depending on the framework and how your code/model is designed, this might even be transparent to you or require only small adjustments to your code.
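To make that concrete, here is a minimal sketch (my own illustration, assuming a single 4-GPU node) of one way to do this in PyTorch: FullyShardedDataParallel (FSDP) shards the parameters across the GPUs so their combined VRAM can hold a model that would not fit on one card. The model, sizes and hyperparameters below are just placeholders; launch it with `torchrun --nproc_per_node=4 fsdp_sketch.py`.

```python
# Sketch: shard a model's parameters across all GPUs with PyTorch FSDP.
# The nn.Sequential stack is a stand-in for a real LLM.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")          # one process per GPU via torchrun
rank = dist.get_rank()
torch.cuda.set_device(rank)

# Stand-in for an LLM: a stack of large linear layers.
model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)]).cuda()
model = FSDP(model)                      # parameters are sharded across ranks

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One dummy training step; each rank processes its own micro-batch.
inputs = torch.randn(4, 4096, device="cuda")
targets = torch.randn(4, 4096, device="cuda")

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()
optimizer.step()

dist.destroy_process_group()
```

With FSDP each rank holds only a shard of the weights, which is what lets the four cards' VRAM add up for a large model; plain DistributedDataParallel, by contrast, replicates the full model on every GPU and only splits the batches.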
I am not a CUDA or multi-GPU DL expert, but there are plenty of tutorials “out there” that can get you started.