User Personae and Their Needs

User Personae for Evaluation

A wide range of stakeholders in the AI space can benefit from this initiative:

Model Builders: Teams that need to evaluate their models against desired criteria.
Independent Software Vendors: Companies providing AI-as-a-Service, including evaluations for safety.
AI Application Developers: Builders of AI-enabled applications who need to choose the most effective (or cost-effective) models for their needs. They also need to perform appropriate safety evaluation of their solutions.
Researchers: Those exploring new algorithms and datasets for evaluation.

Collectively, these users would benefit from the following capabilities:

Easily share pre-executed benchmark results, compare them with other available benchmarks, and optionally focus on domain-specific benchmarks, e.g., for industries such as healthcare or finance.
Share datasets and evaluators in a reusable manner.
Easily execute evaluations on selected models, in public leaderboards or private deployments.
Publish evaluation results in a leaderboard.
Share know-how and best practices in an actionable way.
Adopt a reference stack of tools that facilitates the above capabilities.