Tag: reading-list
All the articles with the tag "reading-list".
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally develops numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, the authors introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, they open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
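As a rough illustration of how one of the open-sourced distilled checkpoints might be tried out, here is a minimal sketch using Hugging Face transformers. The repository id `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` and the generation settings are assumptions for illustration; check the DeepSeek-R1 release for exact model names.

```python
# Minimal sketch: querying a DeepSeek-R1 distilled model via Hugging Face transformers.
# The model id below is an assumption; consult the DeepSeek-R1 release for exact names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning-tuned models tend to emit a long chain of thought before the final
# answer, so leave generous headroom for new tokens.
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```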
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
IntellAgent, an open-source multi-agent framework, is presented as a novel solution for comprehensively evaluating conversational AI systems. It addresses the limitations of existing methods by automating the generation of diverse, realistic, policy-driven scenarios using a graph-based policy model. The framework simulates interactions between user and chatbot agents, providing fine-grained performance diagnostics and actionable insights for optimization. IntellAgent's modular design promotes reproducibility and collaboration, bridging the gap between research and deployment. Its effectiveness is demonstrated through experiments comparing its results to established benchmarks such as τ-bench.
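The snippet below is not IntellAgent's actual API; it is a toy sketch of the core idea as the abstract describes it: sample a policy-driven scenario, let a simulated user agent and the chatbot under test alternate turns, and record which policies each reply violated. `user_agent`, `chatbot`, and `check_policies` are hypothetical LLM-backed callables.

```python
# Toy sketch of policy-driven user/chatbot simulation (not the IntellAgent API).
# `user_agent`, `chatbot`, and `check_policies` are hypothetical callables that
# would wrap LLM calls in a real evaluation harness.
from dataclasses import dataclass, field

@dataclass
class Scenario:
    policies: list[str]               # policies the chatbot must respect
    opening_message: str              # first simulated user turn
    violations: list[str] = field(default_factory=list)

def run_episode(scenario, user_agent, chatbot, check_policies, max_turns=6):
    """Alternate simulated-user and chatbot turns, logging policy violations."""
    history = [("user", scenario.opening_message)]
    for _ in range(max_turns):
        reply = chatbot(history)                    # chatbot under test responds
        history.append(("assistant", reply))
        scenario.violations += check_policies(reply, scenario.policies)
        follow_up = user_agent(history, scenario)   # simulated user advances the scenario
        if follow_up is None:                       # user agent ends the conversation
            break
        history.append(("user", follow_up))
    return history, scenario.violations
```

Per-turn violation logs like these are what would feed the fine-grained diagnostics the framework aims to provide, rather than a single pass/fail score per conversation.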
PromptWizard: The future of prompt optimization through feedback-driven self-evolving prompts
This document reviews the key concepts and findings from two sources related to PromptWizard, a prompt optimization framework developed by Microsoft Research. These sources highlight the limitations of existing prompt optimization techniques, particularly for closed-source Large Language Models (LLMs), and introduce PromptWizard as a novel approach that iteratively refines prompts using feedback.
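To make the feedback-driven refinement idea concrete, here is a minimal sketch of a generic critique-and-rewrite loop, not PromptWizard's actual implementation. `score`, `critique_prompt`, and `mutate_prompt` are hypothetical LLM-backed helpers, and `dev_set` stands for a small set of (input, expected output) pairs.

```python
# Minimal sketch of a feedback-driven prompt-refinement loop (not PromptWizard's code).
# `score`, `critique_prompt`, and `mutate_prompt` are hypothetical helpers that would
# call an LLM; `dev_set` is a small list of (input, expected_output) pairs.
def optimize_prompt(seed_prompt, dev_set, score, critique_prompt, mutate_prompt, rounds=5):
    best_prompt, best_score = seed_prompt, score(seed_prompt, dev_set)
    for _ in range(rounds):
        # Ask a critic model where the current prompt fails on the dev set...
        feedback = critique_prompt(best_prompt, dev_set)
        # ...then rewrite the prompt in light of that feedback, keeping it only if it improves.
        candidate = mutate_prompt(best_prompt, feedback)
        candidate_score = score(candidate, dev_set)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt
```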