
Trust and Safety Evaluations Initiative

Authors: The AI Alliance Trust and Safety Work Group
Last Update: V0.3.1, 2024-12-12

Welcome to the AI Alliance initiative for Trust and Safety Evaluations.

Tips:

  1. Use the search box at the top of this page to find specific content.
  2. Capitalized, italicized terms link to a glossary of terms.

Much like other software, generative AI (“GenAI”) Models and the AI Systems that use them need to be trustworthy and useful to their users.

Evaluation aims to provide the evidence for gaining users’ trust in models and systems. More specifically, evaluation refers to measuring and quantifying how a model or system responds to inputs. Are the responses within acceptable bounds, for example free of hate speech and Hallucinations? Are they useful to users? Are they cost-effective?
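As a rough illustration of what an evaluation does, the sketch below scores a model’s responses to a set of prompts against a simple acceptance check. It is a hypothetical example, not part of this initiative’s tooling; the names `evaluate`, `generate`, and `no_banned_phrases` are placeholders for whatever model, system, and criteria you actually use.

```python
# Hypothetical sketch: an evaluation measures and quantifies how a model or
# system responds to inputs. None of these names refer to real tools from
# this initiative.
from typing import Callable, Iterable

def evaluate(generate: Callable[[str], str],
             prompts: Iterable[str],
             is_acceptable: Callable[[str], bool]) -> float:
    """Return the fraction of responses that pass the acceptance check."""
    results = [is_acceptable(generate(p)) for p in prompts]
    return sum(results) / len(results) if results else 0.0

# Toy "acceptable bounds" check: a banned-phrase filter standing in for a
# real hate-speech or hallucination detector.
BANNED_PHRASES = {"example banned phrase"}

def no_banned_phrases(response: str) -> bool:
    return not any(phrase in response.lower() for phrase in BANNED_PHRASES)

# `generate` would normally call a model or AI system; here it is stubbed out.
score = evaluate(lambda prompt: "a harmless response",
                 ["What is 2 + 2?"],
                 no_banned_phrases)
print(f"Acceptable-response rate: {score:.2f}")
```

Real evaluators replace the stubbed model call and the toy check with actual model inference and domain-specific detectors, then aggregate the scores over many prompts.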

Many organizations are working on evaluations for safety, broadly defined, as well as other kinds of measurements. There are also Benchmarks that aggregate sets of evaluations, and Leaderboards that show how models and systems perform against those benchmarks without requiring you to run the benchmarks yourself.

The Trust and Safety Evaluations Initiative addresses several under-served needs:

  1. While very good Taxonomies of evaluation in the areas of risk and harms have emerged, there are other areas of interest where a standard taxonomy, with corresponding evaluations, would be useful. See taxonomy.
  2. Evaluators that implement evaluations in the taxonomy are needed. Some areas are well-covered, while others have no available evaluators. These evaluators can be aggregated into benchmarks. See evaluators.
  3. Leaderboards are needed that provide unique, user-configurable views on different benchmark combinations, which help users focus on the benchmarks most relevant to their needs. See leaderboards.
  4. Users need a reference stack of industry-standard OSS tools for evaluation, especially at Inference time. See Evaluation Platform Reference Stack.

This website provides the documentation for this initiative, with links to other resources, including code and leaderboards, as they become available.

Are you interested in contributing? If so, please see the contributing page for information on how you can get involved.

This site is organized into the following sections:

Additional links:

Version History

Version   Date
V0.3.1    2024-12-12
V0.3.0    2024-12-05
V0.2.0    2024-11-15
V0.1.0    2024-10-12