Posts
All the articles I've posted.
Magma: A Foundation Model for Multimodal AI Agents
Magma is a multimodal agentic AI model that generates text from input text and images. The model is designed for research purposes, aimed at sharing knowledge and accelerating research in multimodal AI, in particular multimodal agentic AI. Its main innovations are two techniques, Set-of-Mark and Trace-of-Mark, and the use of a large amount of unlabeled video data to learn spatial-temporal grounding and planning.
Retrieval Augmented Generation or Long-Context LLMs
This document summarizes the findings of a comprehensive study comparing Retrieval Augmented Generation (RAG) and Long-Context (LC) Large Language Models (LLMs) for processing lengthy contexts. The study benchmarks both approaches across various public datasets using recent LLMs (Gemini-1.5-Pro, GPT-4O, and GPT-3.5-Turbo). The key finding is that LC models, when resourced sufficiently, generally outperform RAG in average performance. However, RAG retains a significant cost advantage because it sends a much shorter input to the LLM. Based on these observations, the study introduces SELF-ROUTE, a method that routes each query to either RAG or LC based on model self-reflection, significantly reducing computational cost while maintaining performance comparable to LC. The findings provide guidance for building long-context applications that use both RAG and LC.
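The routing idea behind SELF-ROUTE can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the `retrieve` and `llm` callables, the prompt wording, and the `unanswerable` sentinel are all hypothetical stand-ins. It assumes that "self-reflection" means the model is first asked to answer from retrieved passages and to decline when they are insufficient, with the full long context used only as a fallback.

```python
from typing import Callable, List, Tuple

# Hypothetical sentinel the model is asked to emit when the passages fall short.
UNANSWERABLE = "unanswerable"


def self_route(
    query: str,
    document: str,
    retrieve: Callable[[str, str], List[str]],  # hypothetical retriever
    llm: Callable[[str], str],                  # hypothetical model call
) -> Tuple[str, str]:
    """Return (answer, route), where route is "rag" or "lc"."""
    # Step 1: cheap attempt using only the retrieved passages.
    chunks = retrieve(query, document)
    rag_prompt = (
        "Answer the question using only the passages below. "
        f"If they are insufficient, reply exactly '{UNANSWERABLE}'.\n\n"
        + "\n\n".join(chunks)
        + f"\n\nQuestion: {query}"
    )
    answer = llm(rag_prompt)
    if UNANSWERABLE not in answer.strip().lower():
        return answer, "rag"

    # Step 2: the model declined, so pay for the full long-context call.
    lc_prompt = f"{document}\n\nQuestion: {query}"
    return llm(lc_prompt), "lc"
```

Under these assumptions, the cost saving comes from the share of queries that stop at step 1, since only declined queries send the full document to the model.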
International AI Safety Report 2025
The purpose of this report is to help create a shared international understanding of risks from advanced AI and how they can be mitigated. To achieve this, this report focuses on general-purpose AI – or AI that can perform a wide variety of tasks – since this type of AI has advanced particularly rapidly in recent years and has been deployed widely by technology companies for a range of consumer and business purposes. The report synthesises the state of scientific understanding of general-purpose AI, with a focus on understanding and managing its risks.