Abstract
Scaling up model width and depth has delivered strong accuracy, but at rising compute, memory, and energy costs across training and deployment. Classical efficiency tools (pruning, quantization, low-rank factorization) reduce these costs, but they often require calibration data or fine-tuning and hit representation bottlenecks at high compression ratios. A complementary angle is to exploit an empirical regularity of modern training: SGD tends to learn repeated, highly similar channels and heads. We ask: can we reduce true compute while preserving more representational ability and full-width interfaces by explicitly reusing such redundancy, instead of zeroing neurons out? This observation is the key to our approach, which we term model folding: a unifying perspective that decouples interface width from the amount of real computation.
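The core idea of reusing duplicated channels rather than pruning them can be illustrated on a tiny two-layer MLP. The sketch below is our own illustrative example, not the paper's implementation: when two hidden channels are nearly identical, only one is actually computed, and the next layer's weights are rerouted so its full-width interface is preserved exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 8, 4, 3

# Two-layer ReLU MLP (no biases, for brevity) with a duplicated channel:
# hidden channel 3 is an almost exact copy of hidden channel 1.
W1 = rng.standard_normal((d_hidden, d_in))
W1[3] = W1[1] + 1e-6 * rng.standard_normal(d_in)
W2 = rng.standard_normal((d_out, d_hidden))

def forward(x):
    h = np.maximum(W1 @ x, 0.0)
    return W2 @ h

# "Fold" the duplicate: drop row 3 from W1, and route channel 3's
# outgoing weights onto channel 1 in W2. This is valid because the two
# channels' activations are nearly identical, so one computed value can
# serve both positions of the original full-width interface.
keep = [0, 1, 2]                  # channels that are actually computed
W1_f = W1[keep]
W2_f = W2[:, keep].copy()
W2_f[:, 1] += W2[:, 3]            # reuse channel 1's activation for channel 3

def forward_folded(x):
    h = np.maximum(W1_f @ x, 0.0)  # 3 channels computed instead of 4
    return W2_f @ h

x = rng.standard_normal(d_in)
print(np.allclose(forward(x), forward_folded(x), atol=1e-4))  # True
```

Note the contrast with pruning: zeroing channel 3 would discard its contribution to the output, while folding transfers that contribution onto its near-duplicate, so real computation drops without narrowing the layer's interface.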
| Original language | English |
|---|---|
| Publication status | Published - 28 Oct 2025 |
Keywords
- Machine Learning
- LLMs
- model compression
Fields of Expertise
- Information, Communication & Computing
Fingerprint
Dive into the research topics of 'Model Folding: A Unified Approach to Post-training Compression and Efficient Pre-training'.
Projects
- 1 Active
FLUID-AI - Immersive intelligence transfer
Saukh, O. (Project manager on research unit)
1/04/25 → 31/03/27
Project: Research project