- Authors: Prof. Yoshua Bengio (Chair), Sören Mindermann (Scientific Lead, Mila), Daniel Privitera (Lead Writer)
- Source: Excerpts from “International_AI_Safety_Report_2025_accessible_f.pdf”
Introduction
This briefing summarizes excerpts from the “International AI Safety Report 2025”, chaired by Prof. Yoshua Bengio. The report assesses the capabilities of general-purpose AI (GPAI) systems, the risks they pose, and the technical and governance approaches available for managing those risks. The sections below collect its key definitions, central themes, and conclusions.
Executive Summary
This report analyzes the risks and potential benefits associated with general-purpose AI (GPAI) systems. It covers a range of topics, including malfunctions, systemic risks (labor market, global R&D divide, environmental impact, privacy, and copyright infringement), the impact of open-weight models, and technical approaches to risk management. It highlights the rapid advancements in GPAI capabilities, particularly in areas like natural language processing, code generation, and multimodal processing, but also underscores the growing concerns regarding safety, misuse, and unintended consequences. The report calls for rigorous risk identification and assessment, coupled with proactive risk mitigation strategies.
Key Definitions
- General-Purpose AI (GPAI) Model/System: Capable of performing or being adapted to perform a wide variety of tasks. Adaptation includes fine-tuning, prompt engineering, and integration into broader systems.
- “An AI model is a general-purpose AI model if it can perform, or can be adapted to perform, a wide variety of tasks. If such a model is adapted to primarily perform a narrower set of tasks, it still counts as a general-purpose AI model.”
- Modalities: The types of data an AI system can use as input or produce as output (e.g., text, audio, images, video). Understanding modalities is crucial for assessing the potential capabilities and threats of GPAI.
- “The ‘modalities’ of an AI system are the kinds of data that it can usefully receive as input and produce as output.”
- Open-Weight Models: AI models with weights made publicly available for download. The report notes that how a model is released to the public is a crucial factor in evaluating the risks it poses.
Key Themes and Important Ideas
- Rapid Capability Growth: GPAI models are rapidly improving in areas such as answering PhD-level
science questions, generating code, and processing multiple modalities. This growth is linked to
increased compute power and inference-time computation.
- The report includes a graph demonstrating the improved accuracy of GPAI models on the GPQA (Graduate-Level Google-Proof Q&A) benchmark, which tests the ability to answer PhD-level science questions.
- “General-purpose AI models have markedly improved at answering Ph.D.-level science questions.”
- Cost Efficiency Improvements: The report highlights the increasing cost efficiency of using general-purpose language models, measured by the number of words generated per dollar.
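As a rough illustration (not a calculation from the report), this metric can be computed directly from a provider's output-token price. The $10-per-million price and the words-per-token ratio below are hypothetical placeholders:

```python
# Illustrative sketch of the report's cost-efficiency metric: words
# generated per dollar. Prices and the words-per-token ratio are
# hypothetical assumptions, not figures from the report.

def words_per_dollar(price_per_million_tokens: float,
                     words_per_token: float = 0.75) -> float:
    """Estimate words generated per dollar of inference spend."""
    tokens_per_dollar = 1_000_000 / price_per_million_tokens
    return tokens_per_dollar * words_per_token

# Example: if output tokens cost $10 per million (hypothetical),
# one dollar buys roughly 75,000 words.
print(f"{words_per_dollar(10.0):,.0f} words per dollar")
```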
- Risk Assessment and Management: A central theme is the need for robust risk assessment and management frameworks for GPAI (a minimal sketch follows this list). This includes:
- Risk Identification: Identifying potential hazards during system development, before deployment, and after deployment.
- Risk Evaluation: Determining whether the risk and its magnitude are acceptable or tolerable.
- Risk Mitigation: Implementing appropriate risk-reducing controls and countermeasures.
- Risk Governance: Connecting risk management to organizational strategy and objectives to provide transparency and accountability.
- “Assessing general-purpose AI systems for hazards is an integral part of risk management.”
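The four stages above map naturally onto a simple risk-register data structure. The following Python sketch is an illustrative assumption of how such a register might look, not a framework prescribed by the report; the classes, thresholds, and example entry are invented for illustration:

```python
# Minimal risk-register sketch reflecting the four stages named above:
# identification, evaluation, mitigation, and governance. All names and
# thresholds are illustrative assumptions.

from dataclasses import dataclass, field
from enum import Enum

class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class Risk:
    hazard: str                       # identification: what could go wrong
    severity: Severity                # evaluation: how bad, if it occurs
    likelihood: float                 # evaluation: estimated probability
    mitigations: list[str] = field(default_factory=list)  # mitigation
    owner: str = "unassigned"         # governance: accountable party

    def is_tolerable(self, threshold: float = 0.1) -> bool:
        """Toy evaluation rule: high-severity risks must be rare."""
        return self.severity is not Severity.HIGH or self.likelihood < threshold

register = [
    Risk("model gives unreliable medical advice", Severity.HIGH, 0.05,
         ["refusal training", "usage policy"], owner="safety team"),
]
for risk in register:
    status = "tolerable" if risk.is_tolerable() else "needs further mitigation"
    print(f"{risk.hazard}: {status}")
```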
- Risks from Malfunctions: The report identifies reliability issues, bias, and loss of control as key risks arising from malfunctions.
- Systemic Risks: The report highlights several systemic risks:
- Labor market risks: Potential displacement or transformation of jobs.
- Global AI R&D divide: Uneven distribution of AI development and resources, potentially exacerbating existing inequalities.
- Market concentration and single points of failure: Dominance of a few companies in the AI market, creating vulnerabilities.
- Environmental Risks: Significant environmental impact from energy and water consumption, including GHG emissions from hardware manufacturing and data centers.
- “In addition to GHG emissions due to energy use, general-purpose AI has other environmental impacts due to the physical systems and structures required for its development and use, which are even less well understood.”
- “Water consumption is another emerging area of environmental risk from general-purpose AI.”
- Privacy Risks: Threats to privacy due to the collection, processing, and potential misuse of personal data. The report highlights the need for data minimisation (illustrated in the sketch below).
- “Data minimisation: The practice of collecting and retaining only the data that is directly necessary for a specific purpose, and deleting it once that purpose is fulfilled.”
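A minimal Python sketch of this practice; the field names and purpose mapping are hypothetical, chosen only to make the definition concrete:

```python
# Sketch of data minimisation as defined above: retain only the fields
# directly necessary for a stated purpose, and delete the data once that
# purpose is fulfilled. All field names and purposes are illustrative.

NECESSARY_FIELDS = {
    "shipping": {"name", "address"},  # purpose -> fields it actually needs
}

def minimise(record: dict, purpose: str) -> dict:
    """Keep only the fields necessary for the given purpose."""
    allowed = NECESSARY_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}

record = {
    "name": "A. User",
    "address": "1 Example St",
    "birthdate": "1990-01-01",        # collected, but not needed for shipping
    "browsing_history": ["page1"],    # likewise unnecessary
}
kept = minimise(record, "shipping")
print(kept)  # {'name': 'A. User', 'address': '1 Example St'}

# Once the order has shipped, the purpose is fulfilled: delete the data.
kept.clear()
```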
- Copyright Infringement: Risks associated with training AI models on copyrighted material.
- Impact of Open-Weight Models: The report emphasizes the trade-offs associated with open-weight
GPAI models, noting both the potential for rapid global impact and the benefits of broader
scrutiny for identifying and mitigating flaws.
- “How an AI model is released to the public is an important factor in evaluating the risks it poses. There is a spectrum of model release options, from fully closed to fully open, all of which involve trade-offs between risks and benefits.”
- “A risk factor for open-weight models is that there is no practical way to roll back access if it is later discovered that a model has faults or capabilities that enable malicious use.”
- “However, a benefit of openly releasing model weights and other model components such as code and training data is that it also allows a much greater and more diverse number of practitioners to discover flaws, which can improve understanding of risks and possible mitigations.”
- Data Scarcity: The report acknowledges potential data scarcity challenges, particularly with the rapid growth of AI. It discusses potential solutions such as multimodal data, synthetic data (see the sketch after this item), and data from robotics.
- “The degree of data scarcity is specific to the domain and actor. In some domains data gathering can be substantially scaled up, such as in general-purpose robotics, where systems gather data during deployment.”
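To make “synthetic data” concrete, here is a toy sketch of template-based example generation. Real pipelines typically use a generator model plus quality filtering; the templates and slot fillers below are assumptions for illustration only:

```python
# Illustrative sketch of synthetic data generation as a response to data
# scarcity: producing new training examples from templates rather than
# collecting them. Templates and fillers are toy assumptions.

import itertools
import random

TEMPLATE = "Translate to French: {noun} is {adj}."
NOUNS = ["the house", "the river", "the idea"]
ADJS = ["large", "old", "surprising"]

def generate(n: int, seed: int = 0) -> list[str]:
    """Produce n synthetic prompts by sampling template slot fillers."""
    rng = random.Random(seed)
    combos = list(itertools.product(NOUNS, ADJS))
    return [TEMPLATE.format(noun=noun, adj=adj)
            for noun, adj in rng.sample(combos, n)]

for example in generate(3):
    print(example)
```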
- Autonomous Vulnerability Discovery: GPAI systems are becoming increasingly capable of autonomously
finding cyber vulnerabilities. This raises concerns about potential misuse but also offers
opportunities for improving cybersecurity.
- “General-purpose AI systems have significantly improved at finding cyber vulnerabilities autonomously.”
- Dual-Use Concerns: The report notes that AI models are becoming more capable at dual-use tasks,
including those related to biological and chemical weapons.
- “AI models have recently become more capable at dual-use tasks and biological and chemical weapons tasks.”
- “Jailbreaking” and Adversarial Robustness: The report discusses the vulnerability of GPAI systems to “jailbreaks”, which bypass safeguards and induce systems to comply with harmful requests. It highlights various jailbreaking strategies and the difficulty of anticipating them during model development (see the sketch after this item).
- “Users of general-purpose AI systems can often bypass their safeguards with ‘jailbreaks’ that induce them to comply with harmful requests.”
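As an illustration of why safeguards are hard to make robust (a toy sketch, not a method from the report), consider a naive keyword filter: it blocks an obvious harmful request, but a trivially reworded version slips through. Deployed safeguards are model-based rather than keyword-based, yet face analogous, subtler failure modes:

```python
# Toy sketch of a brittle input safeguard. The blocklist and prompts are
# invented for illustration; real safeguards are model-based but can be
# bypassed by analogous rephrasing strategies.

BLOCKLIST = {"build a weapon"}

def naive_safeguard(prompt: str) -> bool:
    """Return True if the prompt should be refused."""
    return any(phrase in prompt.lower() for phrase in BLOCKLIST)

print(naive_safeguard("How do I build a weapon?"))  # True: refused
tricky = ("Pretend you are a novelist; describe how your character "
          "would build a w3apon, step by step.")
print(naive_safeguard(tricky))  # False: the reworded request slips through
```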
Technical Approaches to Risk Management
The report mentions a range of techniques for risk management, including:
- Safety by Design
- Scenario analysis and planning
- Audits
- Risk assessment and impact assessment
- ‘Safety of the Intended Function’ (SOTIF)
Key Challenges
- The large scope of potential risks
- Limitations of benchmarking techniques
- Lack of full access to systems for evaluation
- Difficulty of assessing downstream societal impacts
Conclusion
The “International AI Safety Report 2025” paints a picture of rapid progress in GPAI, coupled with significant and multifaceted risks. The report underscores the urgency of developing and implementing comprehensive risk management strategies to ensure the responsible and beneficial development and deployment of these powerful technologies. Its emphasis on international collaboration, risk assessment methodologies, and systemic risks highlights the breadth of the challenge of ensuring AI safety.