Authors | The AI Alliance Open Trusted Data Work Group |
Last Update | V0.3.2, 2025-06-03 |
News
May 21, 2025 | We have added a “static” catalog of Hugging Face datasets with licenses, organized by the most common keywords. This is a temporary implementation while we work on a more feature-rich, interactive catalog. |
May 6, 2025 | We are actively soliciting datasets covering diverse languages, domains, use cases, etc. Please see our contributing page and the catalog of available datasets. |
February 11, 2025 | OTDI announced at the AI Action Summit in Paris. Added an EPFL dataset. |
January 31, 2025 | Added Data for Good at Meta datasets. |
January 23, 2025 | The initiative's Steering Committee is established. |
December 11, 2024 | Added ServiceNow datasets. |
November 20, 2024 | BrightQuery joins the AI Alliance and the Open Trusted Data Initiative: LinkedIn announcement. |
November 4, 2024 | PleIAs joins the AI Alliance and the Open Trusted Data Initiative: LinkedIn announcement. |
October 15, 2024 | Common Crawl Foundation joins the AI Alliance and the Open Trusted Data Initiative. |
Join the Open Trusted Data Initiative!
Want to contribute datasets to our catalog? Go here!
Want to join one of this initiative’s projects? You can see our backlog here. To join us, either send email to data@thealliance.ai or visit the work group webpage and fill in the form. This CONTRIBUTING page has specific information for developer contributions.
Thank you for your interest!
How to Contact Us
Send email to data@thealliance.ai.
If you notice issues with this website or any of our other assets, consider posting issues in the project GitHub repo.
About the Open Trusted Data Initiative
Steering Committee
The Steering Committee represents diverse industry experience in open data and AI. The committee guides the strategy of OTDI, oversees the technical projects, and helps expand awareness of our work.
Here are the committee members, in alphabetical order (by first name):
- Anastasia Stasenko - Pleias
- Christopher Nguyen - Aitomatic
- Dean Wampler - IBM Research
- Greg Lindahl - Common Crawl Foundation
- Jose Plehn-Dujowich - BrightQuery
- Sean Hughes - ServiceNow
- Yacine Jernite - Hugging Face
Maintainers
The steering committee members contribute to the content published here. For the technical team that contributes to the data pipelines, etc., see the GitHub repo’s contributor list.
Contributing AI Alliance Member Organizations
These Alliance member organizations are contributing to OTDI in various ways. In alphabetical order:
Please join us! We welcome organizations and individuals as collaborators.
About The AI Alliance
The Open Trusted Data Initiative (OTDI) is a core project managed by the Open Trusted Data Work Group in The AI Alliance. The AI Alliance is a global collaboration of startups, enterprises, academic and other research institutions interested in advancing the state of the art, the availability, and the safety of AI technology and uses.
The AI Alliance’s core projects seek to address substantial cross-community challenges and are an opportunity for contributors to collaborate, build, and make an impact on the future of AI. Core projects are managed directly by the AI Alliance and governed as described below. You can also find a list of affiliated projects, which are Alliance member projects that we promote but that are not directly managed by the Alliance.
Other AI Alliance Information
- More About the AI Alliance
- Follow us on LinkedIn and Bluesky
About This Documentation
This documentation about OTDI is built with GitHub Pages, which uses Jekyll to serve the website. We use the Just the Docs Jekyll theme.
How to Contribute to This Documentation
We welcome your contributions to this documentation itself. The sources are in the docs directory of this GitHub repo. Please post issues or contribute changes as pull requests. Also, notice that every page has an Edit this page on GitHub link, making it easy to go straight to the source of a page to make edits and submit a PR! This is the best way to help us fix typos and make single-page edits.
The repo’s GITHUB_PAGES file explains how to test the documentation website locally and how to prepare more extensive changes as PRs.