https://arxiv.org/abs/2203.02155 Training language models to follow instructions with human feedback. "Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not ali.." Abstract. Making a language model bigger does not, ..
2025.01.13 - [[Deep daiv.]/[Deep daiv.] NLP] - [Deep daiv.] NLP, Paper Review - A Survey on LLM-as-a-Judge
https://arxiv.org/abs/2411.15594 A Survey on LLM-as-a-Judge. "Accurate and consistent evaluation is crucial for decision-making across numerous fields, yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large Language Models (LLMs) have achieved remarkable success across diverse d.." Abstract. This paper observes that accurate, consistent evaluation is essential for decision-making across many fields, yet subjectivity, variability, and problems of scale make it ..
https://arxiv.org/abs/2305.14314 QLoRA: Efficient Finetuning of Quantized LLMs. "We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quan.." Abstract. QLoRA, as proposed in this paper, can finetune a model at the 65B-parameter scale (e.g., LLaMA ..
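The 4-bit storage idea mentioned in the abstract above can be illustrated with a toy blockwise absmax quantizer. This is a conceptual sketch only: actual QLoRA uses the NormalFloat (NF4) data type with double quantization rather than the uniform grid below, and every name and hyperparameter here is illustrative.

```python
import numpy as np

def quantize_blockwise(w, block=64, levels=16):
    """Quantize to `levels` codes (4-bit = 16) with one absmax scale per block.
    Conceptual sketch of blockwise low-bit storage, NOT the NF4 grid."""
    w = w.reshape(-1, block)
    absmax = np.abs(w).max(axis=1, keepdims=True)       # one scale per block
    scaled = w / absmax                                  # now in [-1, 1]
    codes = np.round((scaled + 1) / 2 * (levels - 1))    # integer codes 0..15
    return codes.astype(np.uint8), absmax

def dequantize_blockwise(codes, absmax, levels=16):
    """Map codes back to floats; each block is rescaled by its absmax."""
    scaled = codes.astype(np.float32) / (levels - 1) * 2 - 1
    return scaled * absmax

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)
codes, scales = quantize_blockwise(w)
w_hat = dequantize_blockwise(codes, scales).reshape(w.shape)
print(np.abs(w - w_hat).max())  # small per-element reconstruction error
```

The per-block scale is what keeps the error small: each block of 64 weights is normalized by its own maximum before rounding, so outliers in one block do not destroy precision elsewhere.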
What is an Optimizer? An optimizer is the procedure or algorithm that updates a model's parameters (weights, biases, etc.) so that a machine learning or deep learning model minimizes (or maximizes) a given objective function; for example, it changes the parameters in the direction that reduces the model's loss function. The optimizer is central to learning: training is ultimately the process of searching for the optimum, so the choice of optimizer can substantially change training speed, convergence stability, and final performance. A wide variety of optimizers exist, with gradient descent as the baseline and many variant algorithms (Momentum, Adam, RMSProp, etc.) ..
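The update rules described above can be sketched in a few lines. The function names and hyperparameter values below are illustrative, not tied to any framework; the example minimizes a toy quadratic loss with plain gradient descent and the Momentum variant.

```python
def sgd_step(w, grad, lr=0.01):
    """Plain gradient descent: move the parameter against the gradient."""
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """Momentum: accumulate a decaying sum of past gradients, then step."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

# Minimize the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    grad = 2 * (w - 3)
    w, v = momentum_step(w, grad, v, lr=0.05)

print(w)  # converges toward the optimum w = 3
```

Adam and RMSProp follow the same pattern but additionally rescale each step by a running estimate of the gradient's magnitude, which is why they often converge more stably on badly scaled problems.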
https://arxiv.org/abs/2106.09685 LoRA: Low-Rank Adaptation of Large Language Models. "An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes le.."
2025.01.05 - [[Deep daiv.]/[Deep daiv.] NLP] - [De..
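LoRA's core idea, as an alternative to the full fine-tuning the abstract calls infeasible, is to freeze the pretrained weight W and learn only a low-rank update BA with rank r much smaller than the layer dimensions. A minimal NumPy sketch (all shapes, the scaling factor, and the variable names are illustrative):

```python
import numpy as np

d, k, r = 64, 64, 4                   # layer dims and LoRA rank (illustrative)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))           # frozen pretrained weight (not trained)
A = rng.normal(size=(r, k)) * 0.01    # trainable down-projection, small init
B = np.zeros((d, r))                  # trainable up-projection, zero init

def lora_forward(x, alpha=8):
    """Adapted layer output: W x plus the scaled low-rank update B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(k,))
# With B initialized to zero, the update is zero and the adapted layer
# reproduces the frozen pretrained layer exactly.
assert np.allclose(lora_forward(x), W @ x)
```

Only A and B are trained, so here the trainable parameter count is 2 * r * d = 512 versus 4096 for W, which is the source of LoRA's memory savings.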
https://arxiv.org/abs/2005.00247 AdapterFusion: Non-Destructive Task Composition for Transfer Learning. "Sequential fine-tuning and multi-task learning are methods aiming to incorporate knowledge from multiple tasks; however, they suffer from catastrophic forgetting and difficulties in dataset balancing. To address these shortcomings, we propose AdapterFusion.."
2025.01.04 - [[Deep daiv.]/[Dee..
Prompt? 2024.11.29 - [[Deep daiv.]/[Deep daiv.] NLP] - [Deep daiv.] NLP, Paper Review - Language Models are Few-Shot Learners (GPT-3)
https://arxiv.org/abs/2005.14165 Language Models are Few-Shot Learners. "Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed ..
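GPT-3's few-shot setting, where the model conditions on a handful of in-context demonstrations instead of receiving gradient updates, can be sketched as a plain prompt template. The sentiment task and the example reviews below are invented purely for illustration:

```python
# Assemble a few-shot prompt in the GPT-3 style: a task description,
# a few demonstrations, then the query the model should complete.
examples = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I want those two hours of my life back.", "negative"),
    ("A solid cast wasted on a lazy script.", "negative"),
]

def build_few_shot_prompt(query, examples, task="Classify the sentiment."):
    lines = [task, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model continues from here
    return "\n".join(lines)

prompt = build_few_shot_prompt("Surprisingly moving and beautifully shot.", examples)
print(prompt)
```

The point of the template is that the demonstrations alone define the task; no parameters are updated, which is what "few-shot learning" means in the GPT-3 paper.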