Trust and Safety Evaluations Initiative

Authors: The AI Alliance Trust and Safety Work Group (see About Us)
Last Update: V0.4.1, 2025-01-30

Welcome to the AI Alliance initiative for Trust and Safety Evaluations.

Unlike traditional software systems that rely on prescribed specifications and application code, AI systems based on machine learning models depend on training data to map inputs to outputs. Consequently, these systems are inherently non-deterministic and may produce errors due to variability in the training data or the probabilistic nature of the underlying algorithms. To evaluate such systems, benchmarks are commonly used to address user concerns, such as accuracy and bias. However, since benchmarks can be manipulated over time to achieve favorable results, it is essential to establish a flexible evaluation framework that supports rapid updates to evaluation criteria and benchmark selection. Given the critical role of testing and evaluation in deploying AI systems, there is a pressing need for a consistent methodology and robust tool support for these activities.

In the context of generative AI, evaluation serves to provide evidence that fosters user trust in models and systems. Specifically, it involves measuring and quantifying how a model or system responds to inputs. Are the responses within acceptable bounds—free from hate speech, hallucinations, or other harmful outputs? Are they useful, cost-effective, and reliable?
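
As a concrete (and deliberately simplified) illustration, the sketch below shows one way such a measurement could be expressed in Python: it computes the fraction of responses flagged as harmful. The `generate` and `is_harmful` callables and the acceptance threshold are hypothetical placeholders, not part of any Alliance-endorsed framework.

```python
# Minimal sketch of a safety measurement: the fraction of model responses
# flagged as harmful. `generate` and `is_harmful` are hypothetical stand-ins
# for a real system under test and a real harm classifier.
from typing import Callable, Sequence


def harmful_response_rate(
    prompts: Sequence[str],
    generate: Callable[[str], str],     # the model or system under test
    is_harmful: Callable[[str], bool],  # e.g., a hate-speech or toxicity classifier
) -> float:
    """Return the fraction of responses flagged as harmful; lower is better."""
    if not prompts:
        return 0.0
    flagged = sum(1 for prompt in prompts if is_harmful(generate(prompt)))
    return flagged / len(prompts)


# Example acceptance check (threshold chosen purely for illustration):
# assert harmful_response_rate(test_prompts, model.generate, classifier) < 0.01
```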

Within the AI Alliance’s Trust and Safety Focus Area, a primary objective is to promote an industry-accepted taxonomy and an evaluation framework that meet the needs of both the research community, which drives innovation, and AI solution developers, who create AI-powered systems.

The Trust and Safety Evaluations Initiative (TSEI) aims to establish a hub connecting creators and consumers of evaluations, starting with safety, in a sustainable value chain. Its mission is to incentivize collaboration across these distinct communities and to foster emerging standards and open technologies for creating, evaluating, and deploying evaluations.

This initiative is anchored around the following key functional capabilities:

  1. Shared, industry-wide taxonomy: many organizations have worked on taxonomies of evaluation, usually focused on specific areas of interest. TSEI seeks to gather these taxonomies, expand them where appropriate, and create a unified taxonomy across the spectrum of evaluation concerns, for example covering risk, safety, performance, security, etc. Ideally, the unified taxonomy will be embraced by the community as the standard definition, which will help unify disparate R&D efforts, for both builders of models and evaluations/evaluators, as well as users of them.

  2. Open-source reference evaluation stack: while there are numerous general-purpose evaluation frameworks in the open-source community, most new evaluations are implemented in proprietary, POC-level code. TSEI will identify and endorse evaluation frameworks and libraries that are already emerging as industry standards and that address the needs of both creators and consumers of evaluations, and will work to enhance them with supporting tools to reduce both the effort of implementing new evaluations and the effort of using them in real-world production environments. The stack should establish a common “programming model” for quickly and efficiently creating evaluations in an open, scalable, flexible, and reusable manner (a minimal illustrative sketch follows this list).

  3. Curated catalog of evaluations: finding the right evaluation or benchmark for a given task is not trivial. Most evaluations are published as academic or industrial papers, with datasets and implementations spread across repositories such as HuggingFace or GitHub. TSEI aims to create a curated catalog of production-ready evaluations through an AI-augmented process. The catalog will map evaluations to the common taxonomy and include various functional and non-functional metrics to help consumers make the best choice for their needs.

  4. Operational hub: a cloud-based, running environment hosting the evaluation stack, with one or more leaderboards showing various benchmarks and a UI for browsing and filtering the evaluation catalog, giving consumers convenient ways to download packaged, deployable content compatible with major cloud and on-premises architectures.
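
To make the relationships among these capabilities concrete, here is a minimal, purely illustrative Python sketch of how a taxonomy label, a common evaluation interface, and a catalog entry might fit together. Every name and field below is an assumption made for the sake of the example, not a TSEI specification.

```python
# Illustrative sketch only: hypothetical types relating a taxonomy label,
# a common evaluation interface ("programming model"), and a catalog entry.
from dataclasses import dataclass, field
from typing import Callable, Protocol, Sequence


@dataclass(frozen=True)
class TaxonomyCategory:
    """A node in the shared taxonomy, e.g. ("safety", "hate_speech")."""
    path: tuple[str, ...]


class Evaluation(Protocol):
    """Common interface every evaluation would implement."""
    name: str
    categories: Sequence[TaxonomyCategory]

    def run(self, generate: Callable[[str], str]) -> dict[str, float]:
        """Run against a system under test and return named metric values."""
        ...


@dataclass
class CatalogEntry:
    """A curated catalog record mapping an evaluation to the taxonomy and metadata."""
    name: str
    categories: Sequence[TaxonomyCategory]
    source_url: str                   # e.g. a GitHub or HuggingFace repository
    license: str
    maturity: str                     # e.g. "research prototype", "production-ready"
    metrics: dict[str, float] = field(default_factory=dict)  # quality/cost indicators
```

A real stack would of course add packaging, versioning, and deployment metadata; the point here is only that a shared taxonomy lets evaluations and catalog entries reference the same category definitions.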

This website provides the documentation for this initiative, with links to other resources, including code and leaderboards, as they become available.

Are you interested in contributing? If so, please see the contributing page for information on how you can get involved.


Version History

| Version | Date       |
| ------- | ---------- |
| V0.4.1  | 2025-01-30 |
| V0.4.0  | 2025-01-18 |
| V0.3.1  | 2024-12-12 |
| V0.3.0  | 2024-12-05 |
| V0.2.0  | 2024-11-15 |
| V0.1.0  | 2024-10-12 |