Tag: reading-list
All the articles with the tag "reading-list".
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally develops numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, the authors introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, they open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
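As a rough illustration of how one of the open-sourced distilled checkpoints might be tried out, here is a minimal sketch using Hugging Face transformers. The repository id `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` and the generation settings are assumptions for illustration; check the DeepSeek-R1 release for exact model names.

```python
# Minimal sketch: querying a DeepSeek-R1 distilled model via Hugging Face transformers.
# The model id below is an assumption; consult the DeepSeek-R1 release for exact names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning-tuned models tend to emit a long chain of thought before the final
# answer, so leave generous headroom for new tokens.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```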
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
IntellAgent, an open-source multi-agent framework, is presented as a novel solution for comprehensively evaluating conversational AI systems. It addresses the limitations of existing methods by automating the generation of diverse, realistic, policy-driven scenarios using a graph-based policy model. The framework simulates interactions between user and chatbot agents, providing fine-grained performance diagnostics and actionable insights for optimization. IntellAgent's modular design promotes reproducibility and collaboration, bridging the gap between research and deployment. Its effectiveness is demonstrated through experiments comparing its results to established benchmarks such as τ-bench.
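The snippet below is not IntellAgent's actual API; it is a toy sketch of the core idea as the abstract describes it: sample a policy-driven scenario, let a simulated user agent and the chatbot under test alternate turns, and record which policies each reply violated. `user_agent`, `chatbot`, and `check_policies` are hypothetical LLM-backed callables.

```python
# Toy sketch of policy-driven user/chatbot simulation (not the IntellAgent API).
# `user_agent`, `chatbot`, and `check_policies` are hypothetical callables that
# would wrap LLM calls in a real evaluation harness.
from dataclasses import dataclass, field

@dataclass
class Scenario:
    policies: list[str]               # policies the chatbot must respect
    opening_message: str              # first simulated user turn
    violations: list[str] = field(default_factory=list)

def run_episode(scenario, user_agent, chatbot, check_policies, max_turns=6):
    """Alternate simulated-user and chatbot turns, logging policy violations."""
    history = [("user", scenario.opening_message)]
    for _ in range(max_turns):
        reply = chatbot(history)                    # chatbot under test responds
        history.append(("assistant", reply))
        scenario.violations += check_policies(reply, scenario.policies)
        follow_up = user_agent(history, scenario)   # simulated user advances the scenario
        if follow_up is None:                       # user agent ends the conversation
            break
        history.append(("user", follow_up))
    return history, scenario.violations
```

Per-turn violation logs like these are what would feed the fine-grained diagnostics the framework aims to provide, rather than a single pass/fail score per conversation.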
PromptWizard: The future of prompt optimization through feedback-driven self-evolving prompts
This document reviews the key concepts and findings from two sources related to PromptWizard, a prompt optimization framework developed by Microsoft Research. These sources highlight the limitations of existing prompt optimization techniques, particularly for closed-source Large Language Models (LLMs), and introduce PromptWizard as a novel approach that iteratively refines prompts using feedback.
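To make the feedback-driven refinement idea concrete, here is a minimal sketch of a generic critique-and-rewrite loop, not PromptWizard's actual implementation. `score`, `critique_prompt`, and `mutate_prompt` are hypothetical LLM-backed helpers, and `dev_set` stands for a small set of (input, expected output) pairs.

```python
# Minimal sketch of a feedback-driven prompt-refinement loop (not PromptWizard's code).
# `score`, `critique_prompt`, and `mutate_prompt` are hypothetical helpers that would
# call an LLM; `dev_set` is a small list of (input, expected_output) pairs.
def optimize_prompt(seed_prompt, dev_set, score, critique_prompt, mutate_prompt, rounds=5):
    best_prompt, best_score = seed_prompt, score(seed_prompt, dev_set)
    for _ in range(rounds):
        # Ask a critic model where the current prompt fails on the dev set...
        feedback = critique_prompt(best_prompt, dev_set)
        # ...then rewrite the prompt in light of that feedback, keeping it only if it improves.
        candidate = mutate_prompt(best_prompt, feedback)
        candidate_score = score(candidate, dev_set)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt
```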