This content mostly features papers relating to AI Engineering. The inspiration, and many of the papers, have been selected from Latent Space's article, The 2025 AI Engineer Reading List. Papers have been uploaded to Google's NotebookLM from which a Briefing document is generated, which serves as the content of the post. RSS Feed An audio overview is also created, which serves as a podcast. RSS Podcast Feed
Recent Posts
How much do language models memorize?
Published:Explores the concept of memorization in large language models (LLMs), introducing a novel method to quantify it and distinguish it from generalization. The authors define model capacity and investigate its relationship with dataset size, training dynamics, and membership inference.
The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
Published:This paper investigates the impact of Generative AI (GenAI) tools on critical thinking skills and practices among knowledge workers. Through a survey of 319 participants who shared 936 real-world examples of using GenAI in their work, the study explores when and how critical thinking is enacted and how GenAI affects the effort involved.
Executable Code Actions Elicit Better LLM Agents
Published:This briefing document summarizes the key findings and contributions of the paper "Executable Code Actions Elicit Better LLM Agents." The paper introduces CodeAct, a novel approach that consolidates Large Language Model (LLM) agent actions into a unified action space using executable Python code. By integrating LLMs with a Python interpreter, CodeAct allows for dynamic action revision and the emission of new actions based on real-time feedback from the environment.
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Published:This paper addresses a critical vulnerability in modern Large Language Models (LLMs): their susceptibility to prompt injection attacks, jailbreaks, and system prompt extractions. The authors argue that this stems from the lack of a clear instruction hierarchy, where LLMs treat instructions from application developers (system messages) with the same priority as those from potentially malicious users or third-party sources.