The Open Agent Hub Projects

Collaborate, experiment, and build production-ready, open-source agents.

The Open Agent Hub is a collaborative community of open-source AI projects and domain-specific work groups that seek to make AI Agents successful in the real world through fast experimentation and distillation of learning into reusable reference patterns and example implementations.

Focusing on domain-specific projects surfaces unique challenges that often generalize to other domains, too.

Our work groups include engineers, AI researchers, and subject matter experts from industry-leading organizations. Here is some of our work so far:

Industrial AI: Domain-specific models for semiconductor process agents (SemiKong - paper) and marine navigation (Llamarine - paper).
Deep Research in Finance, Medical, etc.: Deep Research Agent for Applications implements the widely-applicable agent pattern, deep research, with example applications for finance, medicine, and ArXiv exploration.
Expert Knowledge Graphs: Semiont, a wiki-like knowledge base supporting graph retrieval, where humans and agents co-create knowledge, and Bartlebot, a demonstration AI Agent for the legal domain integrated with Slack.
Geospatial: GeoBench, an Earth observation benchmark for evaluating LLM performance on geospatial data, and TerraTorch, a Python toolkit for fine-tuning Geospatial Foundation Models (GFMs).
Chemistry and Materials: Foundation models for molecular analysis.

We welcome your feedback and help, including suggestions for new projects and domain-specific use cases of importance to you.

The Model Context Protocol (MCP) from Anthropic is quickly becoming an industry standard for communications between models, tools, and data repositories. A competing project with more emphasis on the unique requirements for agents is the Agent2Agent (A2A) protocol. The AI Alliance seeks to advance the application of these protocols and related tools to foster the development of robust distributed AI systems.

Links	Description
MCP (and Beyond) in the Enterprise: A User Guide ¹
repo dashboard issues discussions	Model Context Protocol (MCP) has enormous potential to accelerate AI adoption in enterprises. Alternative protocols and complementary tools are also emerging rapidly. This "living" user guide features chapters written by experts on various aspects of deploying, managing, and using these tools successfully in enterprise settings. It contains the first several chapters with many more coming soon. (Contributions are welcome!)
Context Forge
repo dashboard issues discussions	Context Forge is an AI application management suite, with support for protocols like MCP, A2A, REST, etc. It serves as a central management point for tools, resources, and prompts, as well as observability, security, and access control. (Principal developer: IBM)
Deep Research Agent for Applications
repo dashboard issues discussions	The Deep Research Agent for Applications demonstrates MCP in action for an important, common design pattern, Deep Research Agents. The first example application shows how a financial analyst can use a deep research agent to find, aggregate, and analyze information about a public company (or other potential investment). The second example explores recent medical research on diseases and pharmaceuticals. The third example supports finding and summarizing recent research papers posted to ArXiv. There are many other applications possible. The project is a collaboration of LastMile AI and IBM. It is based on LastMile's MCP Agent. See the recent Alliance blog post on LastMile's lessons learned developing the orchestration feature in MCP Agent for deep research and related use cases. Highly informative!
NLIP Project
ECMA TC-56 GitHub org dashboard other documents	The NLIP project developed an open-source protocol for intelligent agents to communicate with each other and with humans using natural language. NLIP is designed to act as a meta-protocol that allows agents from other ecosystems to communicate with one another, including interfaces with other protocols such as A2A, ACP, AGNTCY, MCP, NANDA, etc. It is now an ECMA standard, TC-56 NLIP, Natural Language Interaction Protocol. The project is developing reference implementations of the protocol and end-points. The MCP (and Beyond) in the Enterprise: A User Guide, discussed above, has a chapter on NLIP.

¹ The icon indicates an Alliance core project.

Agent Development Tools

See also Deep Research Agent for Applications, discussed above.

CUBE - Common Unified Benchmark Environment

See also other safety and evaluation projects in AI Safety, Governance, and Education.

Links	Description
CUBE - Common Unified Benchmark Environment: Standard, Harness, and Registry
Standard: repo issues Harness: repo issues Registry: repo issues	Common Unified Benchmark Environment meets a common need: standardizing how benchmarks are wrapped, so the community can integrate otherwise-incompatible benchmarks uniformly and use them everywhere. It comprises three projects: Standard: The CUBE standard itself. Harness: An open-source framework and research initiative for building and evaluating AI agents. Registry: A community-maintained index of benchmarks that implement the CUBE standard. Any CUBE-compliant evaluation platform or training harness can discover and run registered benchmarks without custom integration. (Principal developer: ServiceNow)
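
As a rough illustration of the idea behind a shared benchmark standard and registry, here is a minimal Python sketch. It is not the CUBE API: the `Benchmark` interface, the `REGISTRY` dictionary, and the functions `register` and `run_all` are hypothetical names invented for this example. It only shows why a uniform wrapper lets one harness run any registered benchmark without custom integration code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol

# An "agent" here is just a function from a task prompt to an answer (assumption).
Agent = Callable[[str], str]


class Benchmark(Protocol):
    """Hypothetical uniform interface that every wrapped benchmark exposes."""

    name: str

    def tasks(self) -> List[str]: ...

    def score(self, task: str, answer: str) -> float: ...


@dataclass
class LookupBenchmark:
    """Toy benchmark: each task has one expected answer string."""

    name: str
    expected: Dict[str, str]

    def tasks(self) -> List[str]:
        return list(self.expected)

    def score(self, task: str, answer: str) -> float:
        return 1.0 if answer.strip() == self.expected[task] else 0.0


# Hypothetical registry: benchmark name -> factory that builds the benchmark.
REGISTRY: Dict[str, Callable[[], Benchmark]] = {}


def register(name: str, factory: Callable[[], Benchmark]) -> None:
    REGISTRY[name] = factory


def run_all(agent: Agent) -> Dict[str, float]:
    """Run the agent on every registered benchmark and return mean scores."""
    results: Dict[str, float] = {}
    for name, factory in REGISTRY.items():
        bench = factory()
        tasks = bench.tasks()
        results[name] = sum(bench.score(t, agent(t)) for t in tasks) / len(tasks)
    return results


register("toy-arithmetic",
         lambda: LookupBenchmark("toy-arithmetic", {"1+1": "2", "2+3": "5"}))
register("toy-capitals",
         lambda: LookupBenchmark("toy-capitals", {"France": "Paris", "Japan": "Tokyo"}))


def toy_agent(task: str) -> str:
    """Stand-in for a real agent; it knows three of the four answers."""
    answers = {"1+1": "2", "2+3": "5", "France": "Paris"}
    return answers.get(task, "unknown")


results = run_all(toy_agent)
print(results)
```

The harness loop in `run_all` never refers to a specific benchmark, which is the property a shared standard is meant to provide. See the Standard, Harness, and Registry repos for the actual interfaces.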

Other Agent Tools and Applications

Links	Description
Testing Generative AI Agent Applications
repo dashboard issues discussions	Are you an enterprise developer? How should you test AI applications? You know how to write deterministic tests for your "pre-AI" applications. What should you do when you add generative AI models, which aren't deterministic? This project adapts existing evaluation techniques for the "last mile" of AI evaluation: verifying that an AI application correctly implements its requirements and use cases, going beyond the general concerns of evaluation for safety, security, etc. We are building nontrivial, reusable examples and instructional materials, so you can use these techniques effectively in combination with the traditional tools you already know. (This project is also discussed under the AI Safety, Governance, and Education projects.)
CUGA - Configurable Generalist Agent
repo issues discussions	CUGA is an open-source generalist agent framework from IBM Research, purpose-built for enterprise automation. Designed for developers, CUGA combines and improves the best of foundational agentic patterns such as ReAct, CodeAct, and Planner-Executor into a modular architecture enabling trustworthy, policy-aware, and composable automation across web interfaces, APIs, and custom enterprise systems.
Dana — The Agent-Native Evolution of AI Development
repo issues blog post	Dana is based on the question, “What if your agents could learn, adapt, and improve themselves in production—without you?” Dana bridges the gap between AI coding assistance and autonomous agents through agent-native programming: native `agent` primitives, context-aware `reason()` calls that adapt output types automatically, self-improving pipelines with compositional `|` (“pipe”) operators, and functions that evolve through POET feedback loops (an automated prompt improvement technique). (Principal developer: Aitomatic)
Gofannon
repo issues discussions	A repository of functions consumable by other agent frameworks.
AllyCat
repo dashboard issues discussions	(Beginner friendly!) Get started with a simple and fun end-to-end RAG application that scrapes your website so you can ask it questions.
The Living Guide to Applying AI
repo issues discussions	Tips from experts on using AI for various applications, including popular design patterns. (Contributions are welcome!)

Knowledge Graphs for Agent Knowledge Bases

A set of projects for building knowledge bases using knowledge graphs.

Links	Description
Semiont
repo dashboard issues discussions	Wiki-like knowledge base supporting graph retrieval, where humans and agents co-create knowledge. Includes an MCP server. See also the companion projects, Proscenium, Lapidarist, and Bartlebot, next.
Proscenium
repo dashboard issues discussions	Collaborative, asynchronous human/agent interactions.
Lapidarist
repo dashboard issues discussions	Document enrichment and knowledge structure (e.g., knowledge graph) extraction and resolution.
Bartlebot
repo issues discussions	Bartlebot is a demonstration of an AI Agent for the legal domain with a Slack integration. It is in early development.

Llama Stack and Llama Stack Agents

The Llama Stack project standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem, integrates with other open-source tools and managed services, and provides APIs for inference, evaluation, agents, MCP, and deployment requirements like observability. It is designed to support both on-premise and cloud deployments. The ecosystem provides many example applications to help developers build and deploy AI applications quickly and effectively.

AI Alliance members are contributing directly to Llama Stack development, as well as building example applications that illustrate its use in various enterprise scenarios. The llama-stack-examples project has two initial example applications, described in the table below. The first app is a simple getting-started chatbot that shows you the basics of creating an app with Llama Stack and how to run it. The second app (in development) is a deep research application, a popular class of AI applications, which will demonstrate Llama Stack support for technologies like agents and MCP. Other examples under consideration will be chosen to cover other common application patterns seen in several industries. Please join us!

Links	Description
Llama Stack
documentation repo issues discussions	The Llama Stack project itself. See also the Llama Stack Python Client.
Llama Stack Example Apps
repo issues	A growing suite of example applications for Llama Stack that demonstrate how to build applications that use the RAG pattern and agents. See also the Llama Stack Demos for OpenShift and Kubernetes.
AI Alliance Llama Stack Example Apps
repo issues discussions	A growing suite of example applications for Llama Stack that demonstrate various stack features and common application patterns: (1) A getting-started chatbot app, which shows how to build and deploy Llama Stack applications. It includes two different UI options and inference with an ollama-hosted Llama 3 model. (2) Jupyter Notebooks that demonstrate several APIs, like the new Responses API (blog post). (3) A deep research app (under development), which illustrates an emerging, common application pattern for AI. The user asks for detailed information about a topic, for example the market performance and financials of a publicly-traded company. Agents then find relevant data from diverse sources. Finally, an LLM digests the retrieved information and prepares a report. This example will demonstrate Llama Stack support for agent-based application development, including the use of protocols like MCP.
CCVec - Common Crawl to Vector Stores
repo issues	Search, analyze, and index Common Crawl data into vector stores for RAG applications, with three interfaces: CLI, Python library, and an MCP server. (Principal developers: Common Crawl Foundation and Meta)
Red Hat Lightspeed
repo docs	An end-to-end system management tool that predicts risks across Red Hat platforms, recommends actions, and tracks costs. Red Hat Lightspeed uses AI-powered package recommendations and planning capabilities to provide targeted guidance on increasing your systems’ day-to-day efficiency. (Principal developer: Red Hat)
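
The deep research pattern described for the AI Alliance Llama Stack Example Apps above (a user asks about a topic, agents gather data from several sources, and an LLM turns the findings into a report) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use the Llama Stack API or MCP, and every name in it (`Finding`, `search_filings`, `search_news`, `gather`, `deep_research`, `fake_llm`) is hypothetical. The search functions and the LLM are stubs that return placeholder text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    source: str
    text: str


# Hypothetical types: a searcher maps a topic to findings, an LLM maps a prompt to text.
Searcher = Callable[[str], List[Finding]]
LLM = Callable[[str], str]


def search_filings(topic: str) -> List[Finding]:
    """Stub for an agent that would query financial filings."""
    return [Finding("filings", f"{topic}: annual revenue figures (placeholder)")]


def search_news(topic: str) -> List[Finding]:
    """Stub for an agent that would query recent news."""
    return [Finding("news", f"{topic}: recent headlines (placeholder)")]


def gather(topic: str, searchers: List[Searcher]) -> List[Finding]:
    """Step 2: collect findings from every source."""
    findings: List[Finding] = []
    for search in searchers:
        findings.extend(search(topic))
    return findings


def deep_research(topic: str, searchers: List[Searcher], llm: LLM) -> str:
    """Steps 1-3: take the user's topic, gather data, ask the LLM for a report."""
    findings = gather(topic, searchers)
    bullet_lines = "\n".join(f"- [{f.source}] {f.text}" for f in findings)
    prompt = f"Write a report on {topic} using these findings:\n{bullet_lines}"
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only counts the findings it was given."""
    count = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"Report based on {count} findings"


report = deep_research("ACME Corp", [search_filings, search_news], fake_llm)
print(report)
```

In a real application, each stub searcher would be replaced by an agent or tool call, for example through MCP, and `fake_llm` by an inference call. The llama-stack-examples repo is the place to look for a working implementation.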

Deployment and Scaling

Deploying and scaling AI systems is critical for cost-effective use of AI. There is a growing diversity of hardware accelerators for AI, not only for servers, but for edge devices, too. Developers want the ability to write AI applications that efficiently and transparently scale across different deployment scenarios, from PoCs and single-node deployments on development laptops and edge devices, up to large-scale clustered deployments supporting many users.

Links	Description
The AI Accelerator Software Ecosystem Guide
repo issues discussions	A guide to the most common AI accelerators and the software stacks they use to integrate with tools you know, like PyTorch. (Contributions are welcome!)