BlueBench Leaderboard

The BlueBench Leaderboard, hosted in the IBM Research community on Hugging Face, presents results for BlueBench, an easy-to-use suite of benchmarks spanning several domains.

From the description:

BlueBench is an open-source benchmark developed by domain experts to represent required needs of Enterprise users. It is constructed using state-of-the-art benchmarking methodologies to ensure validity, robustness, and efficiency by utilizing unitxt’s abilities for dynamic and flexible text processing. As a dynamic and evolving benchmark, BlueBench currently encompasses diverse domains such as legal, finance, customer support, and news. It also evaluates a range of capabilities, including RAG, pro-social behavior, summarization, and ChatBot performance, with additional tasks and domains to be integrated over time.

Using BlueBench for Your Own Evaluations

BlueBench can be run locally with just a few commands. For the exact steps, see the Reproducibility section at the end of the About tab on the leaderboard page.