[CoIn] Paper Review | GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints (Ainslie et al., 2023)

Abstract.