[CoIn] Paper Review | Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (Raposo et al., 2024)

Abstract.