References

Table of contents

References
1. Trust and Safety Frameworks, Principles and Tools
2. Other Resources

Trust and Safety Frameworks, Principles and Tools

A list of links to AI trust and safety information from governments, corporations, universities, and non-profit institutions, organized alphabetically by organization. Some of the references here are discussed more fully under Exploring AI Trust and Safety.

ACM Europe Technology Policy Committee

Comments in Response to European Commission Call for Evidence Survey on “Artificial Intelligence - Implementing Regulation Establishing a Scientific Panel of Independent Experts” (PDF), prepared by the ACM Europe Technology Policy Committee (November 15, 2024). It is one of many ACM Public Policy Products.

Adobe

The AI inflection point provides Adobe’s recommendations for responsible AI in organizations (published December 2024).

Alignment Forum

https://www.alignmentforum.org/

A forum for researchers to discuss all facets of AI model and system alignment.

Berryville Institute of Machine Learning

An Architectural Risk Analysis of Large Language Models

A comprehensive assessment of risks in LLM-based systems.

EleutherAI

Their definition of alignment

Discussed in What We Mean by Trust and Safety.

lm-evaluation-harness

A popular open-source framework for performing evaluations, including for safety.

European Union

EU AI Act

The first act to regulate AI in the EU. It takes a risk-based approach to regulating AI, with separate rules for more powerful generative AI models. Like the GDPR for data, the EU AI Act is expected to affect AI practices far beyond the EU’s borders.

Google

Responsible Generative AI Toolkit

Google’s developer toolkit for responsible AI.

Securing the AI Software Supply Chain

How Google secures the assets and resources used to develop and use models, datasets, and applications that use them.

Hugging Face

evaluate

Another popular evaluation framework.

IBM

Unitxt

A library for portable evaluator definitions. Integrated with many safety projects and tools, including lm-evaluation-harness.

Responsible AI

IBM’s description of responsible AI, as informed by IBM product offerings and services. In particular, see the Responsible Use Guide (PDF).

Kepler

Sustainability benchmarks, e.g., for estimating carbon footprint. An example of an evaluator that isn’t focused on safety, as we define the term.

International AI Safety Report 2025

The International AI Safety Report 2025 is a report on the state of advanced AI capabilities and risks. Written by 100 AI experts including representatives nominated by 33 countries and intergovernmental organizations, it is the latest annual update to this report that has been published for several years. The 2025 report was published January 29, 2025, just before the Artificial Intelligence Action Summit, February 10 and 11, 2025, in Paris.

MITRE

MITRE Enterprise ATT&CK ontology

A globally-accessible knowledge base of adversary tactics and techniques based on real-world observations.

Common Weakness Enumeration

A community-developed, industry-standard list of common software and hardware weakness types that can lead to vulnerabilities.

MLCommons

MLCommons AI Safety

The work group at MLCommons that defined an influential Taxonomy of Harms (v0.5) as part of its benchmarks project. See also their arXiv paper.

Mozilla Foundation

Accelerating Progress Toward Trustworthy AI

Mozilla’s approachable guide to AI trust and safety. It makes the argument that open innovation for AI is the best way to ensure safety and wide accessibility.

Organization for Economic Co-operation and Development (OECD)

Resources on Artificial Intelligence

Catalogue of Tools & Metrics for Trustworthy AI

Various useful resources on AI safety and accessibility.

Pacific Northwest National Laboratory

Interactive OODA Processes for Operational Joint Human-Machine Decision Making by Blaha and Leslie.

Explores machine vs. human approaches to OODA (Observe, Orient, Decide, Act).

ServiceNow

Responsible AI Guidelines: A Practical Handbook for Human-Centered AI

ServiceNow’s guide to responsible AI, reflecting their experiences providing AI technologies.

Stanford University, Center for Research on Foundation Models (CRFM)

Holistic Evaluation of Language Models (HELM)

An influential platform and tools for general evaluation of AI models and systems.

Stanford University, Human-centered Artificial Intelligence (HAI)

Artificial Intelligence Index, The AI Index Report 2024: Measuring trends in AI

Describes a wide range of trends in AI. In particular, it discusses how there are currently no standards for responsible AI, so builders of models and systems all use different evaluations.

United States Government, Department of Commerce, National Institute of Standards and Technology (NIST)

NIST’s guidance, the Artificial Intelligence Risk Management Framework (AI RMF 1.0), is discussed in depth here. It contains their recommendations for assessing and managing AI risk.

NIST’s Responsibilities Under the October 30, 2023 Executive Order

NIST’s clarification of its roles and responsibilities under the executive order (next reference), including a Request for Information (RFI) to which the AI Alliance responded.

United States Government, Executive Branch

Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence

The view on AI safety of the US Government administration that issued the order on October 30, 2023.

United States Government, Department of State, Bureau of Arms Control, Deterrence, and Stability

Political Declaration on Responsible Military Use of Artificial Intelligence and Autonomy

A State Department “one-page” statement on the responsible use of AI by governments, including militaries.

University of California, Berkeley

AI Risk-Management Standards Profile for General-Purpose AI Systems (GPAIS) and Foundation Models

Guidance on risk assessment and management.

Chatbot Arena

A popular, crowd-sourced platform for gauging the performance of chatbots.

University of Illinois at Urbana-Champaign (UIUC) Secure Learning Lab

AI Secure, Decoding Trust

A Comprehensive Assessment of Trustworthiness in GPT Models.

University of Notre Dame, et al.

Trusted AI (TAI) Frameworks Project

A consortium of universities and United States Department of Defense (DoD) agencies researching the requirements for trustworthy AI. See also the ND Crane GitHub repository.

Liu, Haochen, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil K Jain, and Jiliang Tang, “Trustworthy AI: A Computational Perspective”, ACM Trans. Intell. Syst. Technol., June, 2022.

Other Resources

AI Leaderboards Are No Longer Useful

link

An informative blog post about the difficulties of relying on leaderboards to choose the best performing models or systems. Leaderboards often ignore total cost and rely on benchmarks that have limited scope, among other challenges.

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

link

A comprehensive survey of current challenges for LLMs.

OODA loop

link

A decision-making model in which a loop of four steps (Observe, Orient, Decide, Act) is performed continuously. Originally developed by United States Air Force Colonel John Boyd for combat operations, it has since been applied in other areas, such as industrial applications and project assessment.
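For readers who prefer code, here is a minimal sketch of the four OODA steps as a control loop. It is only an illustration under stated assumptions: the thermostat scenario, the `Room` class, and every function name and parameter are hypothetical and do not come from the linked article.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """Hypothetical environment: a room with a single temperature reading."""
    temperature: float


def observe(room: Room) -> float:
    # Observe: collect raw data from the environment.
    return room.temperature


def orient(reading: float, target: float) -> float:
    # Orient: interpret the observation relative to the goal.
    return reading - target


def decide(error: float, tolerance: float = 0.5) -> str:
    # Decide: choose an action based on the interpreted situation.
    if error > tolerance:
        return "cool"
    if error < -tolerance:
        return "heat"
    return "hold"


def act(room: Room, action: str, step: float = 1.0) -> None:
    # Act: change the environment, which affects the next observation.
    if action == "cool":
        room.temperature -= step
    elif action == "heat":
        room.temperature += step


def run_ooda(room: Room, target: float, max_cycles: int = 20) -> list[str]:
    """Repeat the loop until no further action is needed or cycles run out."""
    history = []
    for _ in range(max_cycles):
        action = decide(orient(observe(room), target))
        history.append(action)
        if action == "hold":
            break
        act(room, action)
    return history


room = Room(temperature=23.0)
history = run_ooda(room, target=21.0)
```

The point of the sketch is that each cycle's action changes what the next cycle observes, which is why the loop is described as being performed continuously rather than once.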

Prompt Engineering

link

Wikipedia overview of techniques to manipulate prompts in order to achieve more desirable responses.

What is retrieval-augmented generation?

link

One of many introductions to the popular RAG pattern for improving alignment, especially with data that is newer than the last training or tuning run for the underlying models.
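As a rough illustration of the RAG pattern, the sketch below retrieves the most relevant text for a question and places it in the prompt, so the model can draw on data it was never trained on. It makes several simplifying assumptions: the keyword-overlap scoring stands in for the embedding-based search that real systems typically use, the sample documents are invented, all function names are hypothetical, and the final call to a model is omitted.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, document: str) -> int:
    # Toy relevance score: number of query tokens that also occur in the document.
    query_counts = Counter(tokenize(query))
    document_tokens = set(tokenize(document))
    return sum(n for token, n in query_counts.items() if token in document_tokens)


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    # Retrieval step: return the k highest-scoring documents.
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    # Augmentation step: put the retrieved text in front of the question.
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
    )


# Invented sample data, standing in for documents newer than the model's training data.
DOCS = [
    "The 2025 release notes were published after the model's last training run.",
    "The cafeteria serves lunch from noon until two.",
]

QUESTION = "When were the release notes published?"
prompt = build_prompt(QUESTION, DOCS)
```

The resulting `prompt` would then be sent to a model for the generation step. Because the context is looked up at query time, updating the document collection changes the answers without retraining or tuning the model.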

Your AI Product Needs Evals

link

An engineer’s guide to various techniques for ensuring alignment of your AI system.
