User Personae and Their Needs
User Personae for Evaluation
A wide range of stakeholders in the AI space can benefit from this initiative:
- Model Builders: Teams that need to evaluate their models against desired criteria.
- Independent Software Vendors: Companies providing AI-as-a-Service, including evaluations for safety.
- AI Application Developers: Builders of AI-enabled applications who need to choose the most effective (or cost-effective) models for their needs. They also need to perform appropriate safety evaluation of their solutions.
- Researchers: People exploring new algorithms and datasets for evaluation.
Shared Needs for All Users
Collectively, these users would benefit from the following capabilities:
- Easily share pre-executed benchmark results, compare them with other available benchmarks, and optionally focus on domain-specific benchmarks, e.g., for industries such as healthcare or finance.
- Share datasets and evaluators in a reusable manner.
- Easily execute evaluations on select models, in public leaderboards or private deployments.
- Publish evaluation results in a leaderboard.
- Share know-how and best practices in an actionable way.
- Adopt a reference stack of tools that facilitates the above capabilities.