Open Models and Data Projects
The Open Models and Data Projects meet key needs for customized, domain-specific models and datasets while addressing sovereignty and governance concerns.
Project Tapestry
Project Tapestry is a global initiative to build and tune foundation models with full and flexible support for sovereignty concerns.
| Links | Description |
|---|---|
| Project Tapestry | The AI Alliance launched Project Tapestry to build a collaborative foundation for open and sovereign AI. Project Tapestry will be an open-source platform designed to enable globally federated development of frontier open models while preserving sovereignty, local control, and long-term independence. |
Projects for Open Trusted Data and Tooling
Good datasets are essential for building good models and applications. The AI Alliance is cataloging, and in some cases building, datasets that have clear licenses for open use, backed by unambiguous provenance and governance constraints.
| Links | Description |
|---|---|
| The Open, Trusted Data Initiative | Open data has a clear license for use, across a wide range of topic areas, with clear provenance and governance. OTDI seeks to clarify the criteria for openness and catalog the world’s datasets that meet the criteria. |
| SYNTH Initiative | The SYNTH Initiative aims to address the critical gap in open-source AI development by creating a cutting-edge, open-source data corpus for training sovereign AI models and advanced AI agents. This involves curating permissively licensed, high-quality multimodal and multilingual datasets, with a focus on underrepresented languages, and generating synthetic data specifically designed to enhance frontier-level reasoning capabilities in these languages. The ultimate mission is to enable global access to advanced AI reasoning by fostering an inclusive data ecosystem that supports the full training pipeline of sophisticated models and agents. |
| Docling | Docling simplifies document processing, parsing diverse formats (including advanced PDF understanding) and providing seamless integrations with the gen AI ecosystem. Docling is a key tool for the project Parsing PDFs to Build AI Datasets for Science, discussed above. (Principal developer: IBM Research) |
1 The icon indicates an Alliance core project.
Open Models and Tooling for New Domains and Modalities
The AI Alliance is building new models for many domains and modalities at the intersection of research and engineering. Our projects include models for industrial AI, molecular discovery, geospatial, and time series applications.
| Links | Description |
|---|---|
| Open Models | Several AI Alliance work groups are collaborating on the development of domain-specific models. |
| TerraTorch | TerraTorch is a library based on PyTorch Lightning and the TorchGeo domain library for geospatial data. (Principal developer: IBM Research) |
| GEO-Bench | GEO-Bench is a General Earth Observation benchmark for evaluating the performance of large pre-trained models on geospatial data. (Principal developer: ServiceNow) |
