Leaderboards

This section describes the leaderboards and related tools that are maintained by this initiative or separately by other AI Alliance members.

The leaderboards report the results of running the evaluators' benchmark suites against various models and against AI systems built on those models.

The other tools help software engineers identify the risks that matter for their use cases and find the evaluators and benchmarks that test for those risks.

Plans for Leaderboards and Other Tools

Planned leaderboards will include leading open-source models, serving both as evaluation targets and as evaluation judges. Initially, we are focusing on Meta’s Llama and IBM’s Granite families of models, with others to follow.

As we fill in the evaluation taxonomy, we will add the corresponding evaluators and benchmarks to the leaderboards, along with search capabilities for finding topics of interest.

Finally, we plan to provide downloadable, deployable configurations of the Evaluation Platform Reference Stack, preconfigured with selected evaluators for quick and easy use.

The child pages listed next describe the leaderboards and other tools that are currently available.