AI Study

This content mostly features papers relating to AI Engineering. The inspiration, and many of the papers, have been selected from Latent Space's article, The 2025 AI Engineer Reading List. Papers have been uploaded to Google's NotebookLM from which a Briefing document is generated, which serves as the content of the post. RSS Feed An audio overview is also created, which serves as a podcast. RSS Podcast Feed

Recent Posts

How much do language models memorize?
Published:Jun 14, 2025
Explores the concept of memorization in large language models (LLMs), introducing a novel method to quantify it and distinguish it from generalization. The authors define model capacity and investigate its relationship with dataset size, training dynamics, and membership inference.
The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers
Published:Mar 30, 2025
This paper investigates the impact of Generative AI (GenAI) tools on critical thinking skills and practices among knowledge workers. Through a survey of 319 participants who shared 936 real-world examples of using GenAI in their work, the study explores when and how critical thinking is enacted and how GenAI affects the effort involved.
Executable Code Actions Elicit Better LLM Agents
Published:Mar 23, 2025
This briefing document summarizes the key findings and contributions of the paper "Executable Code Actions Elicit Better LLM Agents." The paper introduces CodeAct, a novel approach that consolidates Large Language Model (LLM) agent actions into a unified action space using executable Python code. By integrating LLMs with a Python interpreter, CodeAct allows for dynamic action revision and the emission of new actions based on real-time feedback from the environment.
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Published:Mar 15, 2025
This paper addresses a critical vulnerability in modern Large Language Models (LLMs): their susceptibility to prompt injection attacks, jailbreaks, and system prompt extractions. The authors argue that this stems from the lack of a clear instruction hierarchy, where LLMs treat instructions from application developers (system messages) with the same priority as those from potentially malicious users or third-party sources.

All Posts

Recent Posts

How much do language models memorize?

The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers

Executable Code Actions Elicit Better LLM Agents

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions