Issues with VRAM allocation while fine-tuning an LLM

Hello. I am running into issues with VRAM allocation while fine-tuning an LLM. My system has two NVIDIA GeForce RTX 5070 GPUs. Both GPUs are recognized and work as expected for graphics rendering, gaming, and LLM inference, but during training VRAM is not being allocated the way I expect.

Some basic info on my environment:

- Framework: PyTorch, using NCCL as the distributed communication backend
- Parallelism: attempting both data parallelism and model parallelism
- Both GPUs show VRAM utilization in nvidia-smi
- Driver version: 570.169
- CUDA version: 12.8
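For reference, here is a minimal, simplified sketch of the data-parallel part of the setup I am attempting (not my exact training script; the model, data, and file name `train_sketch.py` are placeholders), launched with torchrun across both GPUs:

```python
# Minimal DDP sketch (placeholders only), launched with:
#   torchrun --nproc_per_node=2 train_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each worker process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Placeholder layer standing in for the LLM being fine-tuned.
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # Dummy batch; the real script uses a DataLoader with a DistributedSampler.
        x = torch.randn(8, 4096, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Per-GPU memory as seen by PyTorch's caching allocator.
        alloc = torch.cuda.memory_allocated(local_rank) / 2**20
        reserved = torch.cuda.memory_reserved(local_rank) / 2**20
        print(f"rank {dist.get_rank()} step {step}: "
              f"allocated {alloc:.0f} MiB, reserved {reserved:.0f} MiB")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Note that the allocated/reserved numbers reported by PyTorch will be lower than what nvidia-smi shows, since the CUDA context and NCCL buffers live outside the caching allocator.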

Any & all help is appreciated.

you may get more attention posting here: CUDA - NVIDIA Developer Forums
you will need to provide waaay more details though…
