Model Folding: A Unified Approach to Post-training Compression and Efficient Pre-training

Research output: Contribution to conference › Poster › peer review

Abstract

Scaling up model width and depth has delivered strong accuracy, but at rising compute, memory, and energy costs across training and deployment. Classical efficiency tools (pruning, quantization, low-rank factorization) reduce these costs but often require calibration data or fine-tuning, or face representation bottlenecks at high compression ratios. A complementary angle is to exploit an empirical regularity of modern training: SGD tends to learn repeated, highly similar channels and heads. We ask: can we reduce true compute while preserving more representational ability and full-width interfaces by explicitly reusing such redundancy, instead of zeroing neurons out? This observation is the key to our approach, which we term model folding — a unifying perspective that decouples interface width from the amount of real computation.
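The core idea — group near-duplicate channels, compute only one representative per group, and recover the full-width interface by broadcasting — can be illustrated with a minimal sketch. All names here are hypothetical, and the simple cosine-similarity grouping below is an illustrative stand-in, not the authors' actual folding algorithm:

```python
import numpy as np

def fold_layer(W, n_clusters):
    """Group similar output channels (rows of W) and return one
    representative weight vector per group plus an index map back
    to the full interface width. (Illustrative sketch only.)"""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    S = Wn @ Wn.T  # pairwise cosine similarity between channels
    # Greedy farthest-point initialization: start at channel 0, then
    # repeatedly add the channel least similar to the chosen ones.
    idx = [0]
    for _ in range(n_clusters - 1):
        idx.append(int(np.argmin(S[:, idx].max(axis=1))))
    # Assign every channel to its most similar representative.
    assign = np.argmax(S[:, idx], axis=1)
    # Fold: each representative is the mean of its cluster's members.
    reps = []
    for k in range(n_clusters):
        members = W[assign == k]
        reps.append(members.mean(axis=0) if len(members) else W[idx[k]])
    return np.stack(reps), assign

def folded_forward(x, reps, assign):
    """Compute only the representative channels (the real compute),
    then broadcast back to the full-width interface."""
    h = x @ reps.T        # shape (batch, n_clusters)
    return h[:, assign]   # shape (batch, full_width)
```

When channels are exact duplicates, the folded forward pass reproduces the full layer's output while performing only `n_clusters` dot products per input instead of one per interface channel — the sense in which interface width is decoupled from real computation.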
Original language: English
Publication status: Published - 28 Oct 2025

Keywords

  • Machine Learning
  • LLMs
  • model compression

Fields of Expertise

  • Information, Communication & Computing
