Originally published at: Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time | NVIDIA Technical Blog
We keep seeing LLMs with larger context windows in the news, along with promises that they can hold entire conversation histories, volumes of books, or multiple codebases in view at once. And yet, these models still repeat the same mistakes. We still have to copy and paste the earlier context back into the chat for…
Answering from context-window information in a production setting (the basic scenario: I'm conversing with a chatbot) requires speed. How fast is the TTT process you propose? Can it happen "in the background" during my conversation? Let's assume the chat grows by a few pages every minute during an hour-long session.
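The concurrency pattern behind this question can be sketched without any real model: a background worker consumes new chat chunks from a queue and runs "training steps" on them while the foreground conversation loop stays responsive. This is a toy illustration, not the article's implementation; `BackgroundTTT` and its `_train_step` stand-in are made-up names, and a real system would run a few gradient updates on fast weights instead of the placeholder below.

```python
import threading
import queue
import time

class BackgroundTTT:
    """Toy sketch: train on new chat chunks in the background while the
    foreground chat loop keeps serving. _train_step is a stand-in; a real
    system would run a few gradient steps on the model's fast weights."""

    def __init__(self):
        self.updates = 0                 # completed train steps
        self.pending = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add_context(self, chunk: str):
        # Called from the chat loop: non-blocking, just enqueue the chunk.
        self.pending.put(chunk)

    def _train_step(self, chunk: str):
        # Stand-in for a real gradient update on this chunk of context.
        time.sleep(0.001)                # simulate compute
        self.updates += 1

    def _run(self):
        # Drain the queue until close() signals stop and the queue is empty.
        while not self._stop.is_set() or not self.pending.empty():
            try:
                chunk = self.pending.get(timeout=0.05)
            except queue.Empty:
                continue
            self._train_step(chunk)
            self.pending.task_done()

    def close(self):
        self.pending.join()              # wait for all queued chunks
        self._stop.set()
        self._worker.join()

ttt = BackgroundTTT()
for turn in range(10):                   # the chat loop never blocks on training
    ttt.add_context(f"user/assistant turn {turn}")
ttt.close()
print(ttt.updates)                       # 10
```

Whether this keeps up with "a few pages per minute" then reduces to whether one train step on a chunk is cheaper than the time it took the user to produce it.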
This is great. I've repeatedly seen this in enterprise deployments: users keep asking the same question over and over, and we keep sending in the same context over and over. This would mean the model can actually answer questions without needing to retrieve, unless the question is new.
One angle you didn't write about but that would be crucial for enterprise deployments: would it be possible to "draw" tenant boundaries within the model, so that learnings and data from one customer don't leak into the answers we give to another customer, while still distilling common patterns across customers into the model's weights?
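The isolation-plus-distillation split asked about above can be sketched with a toy store: a shared base holds patterns explicitly distilled across customers, while each tenant gets a private overlay that lookups for other tenants never touch. All names here (`TenantScopedMemory`, `learn`, `answer`) are illustrative assumptions, not an API from the article; in a weights-based system the overlay would correspond to per-tenant adapter weights rather than a dict.

```python
from collections import defaultdict

class TenantScopedMemory:
    """Toy sketch of tenant boundaries: a shared base for patterns
    distilled across customers, plus one private overlay per tenant.
    Answering for one tenant never reads another tenant's overlay."""

    def __init__(self):
        self.base = {}                        # shared, distilled patterns
        self.overlays = defaultdict(dict)     # tenant_id -> private learnings

    def learn(self, tenant_id: str, key: str, value: str, shared: bool = False):
        # shared=True models deliberately distilling a common pattern into
        # the base; everything else stays inside the tenant's own overlay.
        if shared:
            self.base[key] = value
        else:
            self.overlays[tenant_id][key] = value

    def answer(self, tenant_id: str, key: str):
        # The tenant's overlay shadows the shared base; other tenants'
        # overlays are invisible, so nothing can leak across the boundary.
        return self.overlays[tenant_id].get(key, self.base.get(key))

mem = TenantScopedMemory()
mem.learn("acme", "billing cycle", "monthly")
mem.learn("globex", "billing cycle", "annual")
mem.learn("acme", "greeting style", "formal", shared=True)

print(mem.answer("acme", "billing cycle"))    # monthly (private to acme)
print(mem.answer("globex", "billing cycle"))  # annual (private to globex)
print(mem.answer("globex", "greeting style")) # formal (distilled, shared)
print(mem.answer("acme", "globex secret"))    # None (no cross-tenant leakage)
```

The hard part in a real deployment is the `shared=True` decision itself: something has to certify that a pattern is genuinely generic before it crosses the boundary into shared weights.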