[CoIn] Paper Review | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Introduction.