
Leaderboards

This section describes the leaderboards we maintain, which report results from running benchmark suites of evaluators against various models and the AI systems built on them.

These leaderboards will include leading open-source models, serving both as evaluation targets and as evaluation judges. Initially, we are focusing on Meta’s Llama family of models and IBM’s Granite family of models, with others to follow.

Plans for Leaderboards

As we fill in the evaluation taxonomy, we will stand up additional leaderboards for areas of the taxonomy with wide interest, with their evaluators organized into benchmarks.

We will also provide a benchmark catalog for finding and reusing these sets of evaluators.

The child pages listed below describe the implemented leaderboards.


Child Pages