
Contribute to the Future of AI with Open, Trusted Data!

Join the AI Alliance Open Trusted Data Initiative (OTDI). Our mission is to create a comprehensive, crowd-sourced catalog of openly licensed data with clear provenance for AI model training, domain-specific tuning, and related uses.

In our context, Trusted Data means that the provenance and governance of the data are clear and unambiguous. The metadata for each dataset will also clarify its safety and related concerns, such as the potential presence of hate speech and the filtering used to remove such content.

Why Contribute?

  • Collaborate on AI Innovation: Your data helps build more accurate, fair, and versatile AI models. You can also connect with like-minded data scientists, AI researchers, and industry leaders.
  • Transparency & Trust: Every contribution is transparent, with robust data provenance, governance, and trust mechanisms. Bring your expertise to help us improve all aspects of data management.
  • Tailored Contributions: Support domain-specific model tuning to create open foundation models relevant to your industry or domain of interest.
  • Recognition: Get credited for your data contributions, which help the industry move to a more open, end-to-end development process for AI models and applications, with all the traditional benefits of open source software development.

Why Is Trusted Data Important?

A current challenge in AI is the “murky” provenance of many datasets used for training large language models (LLMs). This raises concerns among model developers and users that models could output private, confidential, or copyrighted information that was part of the training dataset, among other concerns. This is one of the reasons that models that allow “open” use rarely publish their training dataset or the full source code for all the filtering and transformation steps used to create that dataset, from initial acquisition to its final form before training. At best, open models describe the data sources and methods used only in general terms.

OTDI aims to address these concerns with an industry-wide effort to gather and process data fully in the open, allowing model developers and users to have full confidence in the provenance and governance of the data they use.

Next Steps

Ready to contribute a dataset? To get started, first review our requirements and prepare a dataset card, then contribute your dataset! Finally, see How We Process Datasets for an overview of the filtering and analysis steps we perform.

More Information

Tip: Use the search box at the top of this page to find specific content.

Authors: The AI Alliance Open Trusted Data Work Group
History: V0.0.3, 2024-09-06
         V0.0.2, 2024-09-01
         V0.0.1, 2024-09-01
