SMT Optimizer: Emergent Structural Redundancy Discovery in Transformer Attention Mechanisms

Multitec · May 17, 2026, 4:25am

Hi NVIDIA developer community,

I’m Samuel, an NVIDIA Inception member. I’d like to share two recent works on extreme compression for Edge AI and LLM optimization.

Work 1 — SMT V10: Adaptive Sparse Training [https://doi.org/10.5281/zenodo.20150258] An adaptive optimizer that applies dynamic gradient masking layer-by-layer during training.

Key PoC result: Achieved 93.79% sparsity with only 3.67% accuracy loss vs. Adam (tested on Fashion-MNIST as a baseline proof-of-concept), maintaining stability across 3 random seeds.
Designed with neuromorphic hardware and Edge AI in mind: the practical gain from this level of sparsity would be realized mainly on sparse-native silicon.
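To make "dynamic gradient masking layer-by-layer" concrete for readers, here is a minimal pure-Python sketch. It is not the SMT V10 algorithm; it assumes a simple top-k magnitude rule per layer, and the names `mask_gradients`, `sgd_step`, and `keep_fraction` are hypothetical, chosen only for illustration.

```python
def mask_gradients(grads, keep_fraction):
    """Zero all but the largest-magnitude entries of one layer's gradient.

    Assumption: a top-k magnitude rule stands in for whatever adaptive
    criterion the actual optimizer uses.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not grads:
        return []
    k = max(1, round(len(grads) * keep_fraction))
    # Rank positions by gradient magnitude, largest first.
    ranked = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    keep = set(ranked[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grads)]


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values) if values else 0.0


def sgd_step(weights, grads, lr, keep_fraction):
    """Plain SGD update that only touches the unmasked positions."""
    masked = mask_gradients(grads, keep_fraction)
    return [w - lr * g for w, g in zip(weights, masked)]


# Toy per-layer gradients (made-up numbers), each layer masked independently.
layer_grads = {
    "fc1": [0.5, -0.01, 0.02, -0.9],
    "fc2": [0.003, 0.1, -0.04, 0.0005],
}
for name, grads in layer_grads.items():
    masked = mask_gradients(grads, keep_fraction=0.5)
    print(name, masked, "gradient sparsity:", sparsity(masked))
```

Masking per layer rather than globally keeps every layer receiving some updates, which is one plausible reason to apply the mask layer-by-layer.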

Work 2 — Crystal-SMT: Automatic Attention Bias Discovery [https://doi.org/10.5281/zenodo.20219077] A structural analysis tool that autonomously identifies redundant parameters during the training phase.

Key finding: The optimizer consistently eliminates ~36% of QKV attention biases while strictly preserving weight matrices (across 10 random seeds with ±0.78% variance).
This mirrors a design decision made by hand in state-of-the-art models such as LLaMA and PaLM, which omit bias terms from their linear layers, but here it emerges from gradient dynamics rather than from manual ablation.
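For anyone who wants to check a finding like this on their own checkpoints, a rough way to measure it is sketched below. It assumes parameters are available as flat lists keyed by name, and that Q/K/V projections are identified by the substrings `q_proj`, `k_proj`, and `v_proj`; those names and the tolerance `tol` are assumptions, not Crystal-SMT's actual conventions.

```python
from statistics import mean, pstdev


def zero_fraction(values, tol=1e-8):
    """Fraction of entries whose magnitude is at or below tol."""
    return sum(1 for v in values if abs(v) <= tol) / len(values)


def qkv_summary(params, tol=1e-8):
    """Compare eliminated (near-zero) entries in QKV biases vs. QKV weights.

    `params` maps a parameter name to a flat list of floats. The naming
    scheme (q_proj/k_proj/v_proj, .bias/.weight) is hypothetical.
    """
    groups = {"qkv_bias": [], "qkv_weight": []}
    for name, values in params.items():
        if not any(tag in name for tag in ("q_proj", "k_proj", "v_proj")):
            continue  # skip non-attention parameters
        if name.endswith(".bias"):
            groups["qkv_bias"].extend(values)
        elif name.endswith(".weight"):
            groups["qkv_weight"].extend(values)
    return {g: zero_fraction(v, tol) for g, v in groups.items() if v}


def across_seeds(fractions):
    """Mean and population standard deviation over per-seed fractions."""
    return mean(fractions), pstdev(fractions)


# Toy parameters (made-up numbers) standing in for one trained model.
params = {
    "layer0.attn.q_proj.bias": [0.0, 0.2, 0.0, 0.1],
    "layer0.attn.q_proj.weight": [0.3, -0.2, 0.1, 0.4],
    "layer0.mlp.fc.bias": [0.0, 0.0],
}
print(qkv_summary(params))
```

Running `qkv_summary` once per seed and passing the bias fractions to `across_seeds` gives a mean and spread comparable in form to the figures quoted above.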

I’m looking to connect with engineers working on sparse tensor operations, neuromorphic computing, or LLM compression. Happy to share the code and discuss potential validation strategies on NVIDIA hardware.

Best regards, Samuel David López Armenta
