Open Source Assets to support Core AI Projects


IBM Granite Models

Granite models for Language and Code are trained on 12T+ tokens of high-quality, curated data and open sourced under the Apache 2.0 license. They are designed for enterprise tasks in language (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese) and code (generation, explanation, docstring and pseudocode generation, unit test generation, and code fixing).
Granite Guardian Models are a suite of safeguards designed to detect risks in both prompts and responses, supporting safe and responsible use with any large language model.
Granite Embedding Models deliver high-performance sentence-transformer models optimized for retrieval, generating precise embeddings for seamless comparison. Built on ethically sourced datasets and fine-tuned with advanced techniques, these models excel in both academic and enterprise use cases.
Granite Speech is a compact and efficient speech-language model, built on top of IBM's Granite language model and specifically designed for English automatic speech recognition (ASR).
Granite Vision is designed for efficient content extraction from tables, charts, and diagrams, making it a powerful tool for structured data analysis.
The Granite Time Series family includes ultra-compact, open-source models optimized for a variety of time-series tasks, starting at under 1 million parameters: Tiny Time Mixer (TTM) and Time Series Pulse (TSPulse).

Instruct Lab

Instruct Lab is a methodology (with tool support) to enable collaborative model development. This empowers non-technical experts to teach models about their domains and drives improved model performance at a fraction of the cost of pre-training.

Docling

Docling is an efficient open-source toolkit for AI-driven document conversion from various formats (pdf, docx, xlsx, html, etc.) to Markdown, HTML, and lossless JSON, with integrations for LLM frameworks such as LangChain and LlamaIndex.

Data Prep Kit

Data Prep Kit is an open-source toolkit of data preparation recipes for code and language modalities, aimed at fine-tuning, RAG, and instruct-tuning use cases. It supports flexible computing from laptop to cluster scale.

Unitxt

Unitxt is an open-source Python library designed for enterprise-ready LLM evaluation, offering thousands of datasets, metrics, and built-in tools for creating custom benchmarks.

AI Atlas Nexus

AI Atlas Nexus (previously called Risk Atlas Nexus) aims to turn abstract AI risk definitions into actionable workflows that streamline AI governance processes. By connecting fragmented resources, AI Atlas Nexus seeks to fill a critical gap in AI governance, enabling stakeholders to build more robust, transparent, and accountable systems.

Eval Assist

Eval Assist simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other large language models by supporting users in iteratively refining evaluation criteria in a web-based user experience.

AI Attribution

The AI Attribution toolkit helps users describe how AI contributed to their work. It’s an attempt to create a voluntary, detailed attribution standard to make generative AI more transparent.