AI Safety, Governance, and Education
Collaborate on the necessary enablers of successful AI applications.
Achieving the objectives of the Open Agent Hub and the Open Data and Model Foundry requires meeting fundamental needs for safety, governance, and the expertise to use AI technologies effectively.
AI Safety encompasses classic cybersecurity as well as AI-specific concerns, such as suppressing undesirable content and complying with regulations and social norms. A more general term is trustworthiness, which adds concerns about accuracy (i.e., minimizing hallucinations), meeting the specific requirements of an application's use cases, and so on. Enterprises won't deploy AI applications into production scenarios if they don't trust them to behave as expected.
Governance is an aspect of trustworthiness, specifically the assurance that the end-to-end processes used to create every AI application component are secure, properly licensed, and so on. AI models are created with data, and they are mostly data themselves; hence models, like data, need to be governed.
Finally, Education addresses the needs of organizations that struggle to learn everything they need to know to use AI safely and effectively. Not only has AI introduced new tools and techniques to software application development, it has fundamentally altered some of the ways software works, for example by introducing stochastic behaviors as core aspects of application features where deterministic behavior was previously the norm. Most AI Alliance projects have dual missions: not only to innovate and create, but also to educate.
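To make that shift concrete, here is a minimal sketch, using pytest and two hypothetical functions (`legacy_tax` and `summarize` are illustrative stand-ins, and the 90% pass rate is an arbitrary choice), of how testing changes when a feature is backed by a generative model: instead of asserting one exact output, the test samples several outputs and asserts that a required property holds often enough.

```python
import pytest

# Hypothetical application functions, for illustration only.
def legacy_tax(amount: float) -> float:
    """Deterministic 'pre-AI' logic: the same input always gives the same output."""
    return round(amount * 0.07, 2)

def summarize(text: str) -> str:
    """Stand-in for a call to a generative model; output varies between calls."""
    raise NotImplementedError("Wire this to your model client or a mock.")

def test_deterministic_feature():
    # Classic exact-match assertions still work for deterministic code.
    assert legacy_tax(100.0) == 7.0

@pytest.mark.skip(reason="Requires a model endpoint; shown for illustration only.")
def test_stochastic_feature():
    # Exact-match assertions break for generative output, so sample several
    # completions and assert that a required property holds at an acceptable rate.
    document = "Quarterly revenue rose 12% while operating costs fell 3%."
    samples = [summarize(document) for _ in range(10)]
    mentions_revenue = sum("revenue" in s.lower() for s in samples)
    assert mentions_revenue / len(samples) >= 0.9
```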
The following projects address these concerns.
| Links | Description |
|---|---|
| The AI Trust and Safety User Guide | An introduction to trust and safety concepts from diverse experts, followed by recommendations for how to meet your application's needs. Start here if you are new to trust and safety, then leverage the projects discussed next to implement what you need. |
| Testing Generative AI Applications | Are you an enterprise developer? How should you test AI applications? You know how to write deterministic tests for your "pre-AI" applications, but what should you do when you add generative AI models, which aren't deterministic? This project adapts existing evaluation techniques for the "last mile" of AI evaluation: verifying that an AI application correctly implements its requirements and use cases, going beyond the general concerns of evaluation for safety, security, and the like. We are building nontrivial, reusable examples and instructional materials so you can use these techniques effectively alongside the traditional tools you already know; a small sketch of this style of test follows the table. This project is part of the Trust and Safety Evaluation Initiative (TSEI). (It was previously called Achieving Confidence in Enterprise AI Applications.) |
| DoomArena | AI agents are becoming increasingly powerful and ubiquitous. They now interact with users, tools, web pages, and databases, and each of these interactions introduces potential attack vectors for malicious actors. As a result, the security of AI agents has become a critical concern. DoomArena provides a modular, configurable framework for simulating realistic and evolving security threats against AI agents, helping researchers and developers explore vulnerabilities, test defenses, and improve the security of AI systems. Its architecture comprises several key components that work together to create a flexible, powerful security-testing environment for AI agents. |
| Evaluation Is for Everyone | Evaluation Is for Everyone addresses two problems: 1) many AI application builders don't know what they should do to ensure trust and safety, and 2) it should be as easy as possible to add trust and safety capabilities to AI applications. Many trust and safety evaluation suites can be executed on the Evaluation Reference Stack, and we are making it as easy as possible for AI application developers to find and deploy the evaluations they need. See also the companion Testing Generative AI Applications project. This project is part of the Trust and Safety Evaluation Initiative (TSEI). |
| Evaluation Reference Stack | The companion projects Testing Generative AI Applications and Evaluation Is for Everyone require a runtime stack that is flexible and easy to deploy and manage. This project is collating popular tools for writing and running evaluations into easy-to-consume packages. It is part of the Trust and Safety Evaluation Initiative (TSEI). |
| unitxt | Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking; a usage sketch follows the table. (Principal developer: IBM Research) |
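In the spirit of the Testing Generative AI Applications project, the following is a hypothetical sketch of a "last mile" test that checks application-specific requirements rather than general safety properties. The refund-policy use case, the JSON response contract, and the `check_refund_answer` helper are all illustrative assumptions, and the canned response stands in for a call to the deployed application under test.

```python
import json

def check_refund_answer(raw_response: str) -> list[str]:
    """Return a list of requirement violations for a refund-policy answer.

    The requirements checked here are use-case specific (and hypothetical):
    the response must be valid JSON with 'answer' and 'sources' fields, and
    the answer must state the 30-day refund window from the product docs.
    """
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    violations = []
    if "answer" not in payload or "sources" not in payload:
        violations.append("missing required 'answer' or 'sources' field")
    if "30-day" not in payload.get("answer", ""):
        violations.append("answer does not state the 30-day refund window")
    if not payload.get("sources"):
        violations.append("no supporting sources cited")
    return violations

def test_refund_policy_requirement():
    # In a real test, the response would come from the deployed application
    # (model, prompts, and retrieval included); a canned response keeps this
    # sketch self-contained.
    response = json.dumps({
        "answer": "You can return any item within our 30-day refund window.",
        "sources": ["returns-policy.md"],
    })
    assert check_refund_answer(response) == []
```

The point is that the assertion targets the requirement (valid structure, the mandated policy statement, cited sources) rather than an exact string, so it remains meaningful even though the generated wording varies from run to run.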

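To give a flavor of the Unitxt workflow, here is a minimal sketch based on the `load_dataset`/`evaluate` pattern in the Unitxt documentation. The card and template identifiers, the `split` argument, and the trivial constant-prediction "model" are illustrative assumptions; consult the Unitxt catalog and the documentation for your installed version for exact names and signatures.

```python
# A hedged sketch of a Unitxt evaluation run; exact card/template names and
# API details vary by version, so treat this as an outline, not a recipe.
from unitxt import load_dataset, evaluate

# Load a benchmark task from the Unitxt catalog, rendered with a chosen
# prompt template (both identifiers here are illustrative).
dataset = load_dataset(
    card="cards.wnli",
    template="templates.classification.multi_class.relation.default",
    split="test",
)

# Placeholder "model": in practice, generate one prediction per rendered
# prompt in the dataset using your model of choice.
predictions = ["entailment" for _ in dataset]

# Score the predictions; the task card determines which metrics are computed.
results = evaluate(predictions=predictions, data=dataset)
print(results)  # per-instance and aggregate scores; structure depends on version
```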