
SafetyBAT Leaderboard

The SafetyBAT Leaderboard on the AI Alliance Hugging Face Community uses BenchBench to rate benchmarks by how closely they agree with a defined Aggregate Benchmark, an enhanced representation that aggregates many of the available benchmarks.

BenchBench is a useful tool for users with the following needs:

  • You have a new benchmark and you want to see if it agrees or disagrees with other known benchmarks.
  • You are looking for a benchmark to run so that you can establish trust in a system or model you want to use. BenchBench helps you find efficient alternatives that provide acceptable coverage and also meet other needs, such as the ability to run the benchmark privately or with less overhead.

The leaderboard reports agreement as the BenchBench Score: the relative agreement (Z Score) of each benchmark with the Aggregate Benchmark.
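To make the idea of a relative agreement score concrete, here is a minimal sketch of standardizing agreement values into z-scores. It assumes each benchmark already has a single agreement value with the Aggregate Benchmark (for example, a rank correlation). The function name `z_scores`, the benchmark names, and the numbers are hypothetical illustrations, not BenchBench's API or real leaderboard data, and the actual BenchBench computation may differ.

```python
from statistics import mean, stdev


def z_scores(agreement):
    """Standardize each benchmark's agreement against the group mean and spread.

    `agreement` maps a benchmark name to its agreement with the aggregate
    (assumed here to be a single number such as a rank correlation).
    """
    values = list(agreement.values())
    if len(values) < 2:
        raise ValueError("need at least two benchmarks to compute a z-score")
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # All benchmarks agree equally, so none stands out.
        return {name: 0.0 for name in agreement}
    return {name: (value - mu) / sigma for name, value in agreement.items()}


# Hypothetical agreement values, not real leaderboard data.
example = {"bench_a": 0.90, "bench_b": 0.70, "bench_c": 0.50}
scores = z_scores(example)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:+.2f}")
```

Under this reading, a positive score means a benchmark agrees with the aggregate more than the typical benchmark in the set, and a negative score means it agrees less.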

Read more about BenchBench in the paper Benchmark Agreement Testing Done Right and the BenchBench repo.

A Guide to Using SafetyBAT

TODO

Working with the SafetyBAT Code

If you are interested in cloning the source code for your own use or contributing to this leaderboard, see the repo’s README.