
Testing Agents

So far in this guide, we have focused on testing AI-enabled applications in a generic sense, without addressing the specific concerns of testing Agents and other popular application patterns (like RAG).

The chapters in this section dive into the design and building of agents, in the Building Agents chapter, and into how to test and evaluate them, in the Evaluating Agents chapter.

Agents offer broad and increasingly sophisticated behaviors, working in tandem with LLMs and traditional software services and systems. They also introduce new challenges for testing. Effective design patterns, development techniques, and toolkits for agents are rapidly evolving. We won’t attempt to cover all these topics in depth here, but we will focus on a few promising techniques and tools for building and testing agents. We include a list of additional resources and tools for further investigation in the Tools for Agent Development and Testing chapter.

About Agents

Agents are a broad class of software components with behaviors that are complementary to the capabilities that models themselves provide. They range from relatively simple to very sophisticated.

In contrast, the original, simple healthcare ChatBot application just adds context information to user queries to create prompts and applies custom handling to the responses. It relies heavily on an LLM’s ability to classify the kind of query, e.g., a prescription refill request, and to extract some useful details from the query, such as the prescription in question, if the patient mentions it. For most such use cases, instead of returning the generated text to the user, the application presents a predefined message corresponding to the classification returned, so we have better control over potentially “suboptimal” generated responses. However, it can only handle simple queries and responses. It cannot perform workflows, such as managing appointments.
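The request-handling flow just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the guide's actual ChatBot code: the names `build_prompt`, `stub_classify`, `respond`, `KNOWN_DRUGS`, and `PREDEFINED_MESSAGES`, the labels, and the message texts are all hypothetical, and a keyword-matching stub stands in for the LLM call so the example runs without a model.

```python
from dataclasses import dataclass, field

# Hypothetical context, labels, and canned messages, for illustration only.
CONTEXT = "You are a healthcare assistant. Classify the patient query."
KNOWN_DRUGS = ("lisinopril", "metformin")
PREDEFINED_MESSAGES = {
    "refill": "I have your refill request for {prescription}. "
              "A staff member will confirm it shortly.",
    "other": "I have received your message. "
             "A staff member will follow up with you.",
}


@dataclass
class Classification:
    label: str                                   # kind of query, e.g. "refill"
    details: dict = field(default_factory=dict)  # extracted details, if any


def build_prompt(query: str) -> str:
    """Add context information to the user query to create the prompt."""
    return f"{CONTEXT}\nPatient query: {query}"


def stub_classify(prompt: str) -> Classification:
    """Stand-in for the LLM call (an assumption): simple keyword matching."""
    query = prompt.split("Patient query:", 1)[1].strip().lower()
    if "refill" in query:
        drug = next((d for d in KNOWN_DRUGS if d in query), None)
        return Classification("refill", {"prescription": drug} if drug else {})
    return Classification("other")


def respond(query: str, classify=stub_classify) -> str:
    """Return a predefined message for the classification, not generated text."""
    result = classify(build_prompt(query))
    template = PREDEFINED_MESSAGES.get(result.label, PREDEFINED_MESSAGES["other"])
    prescription = result.details.get("prescription", "your prescription")
    return template.format(prescription=prescription)


print(respond("I need a refill of my metformin."))
```

Because `respond` accepts the classifier as a parameter, a test can pass in a deterministic stub like this one, while an application would supply a function that calls a real model. The fixed mapping from label to message is what keeps responses under control, and it is also why this design cannot carry out multi-step workflows.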

This is where agents come in. They enable complex workflows, research and report preparation, planning, reasoning, and even autonomous action on your behalf, when allowed to do so. Work is ongoing to make agents self-learning, so they can adapt to evolving or new uses without special programming or training. The next chapter will introduce an agent-based ChatBot implementation, called ChatBotAgent.

An example of a more sophisticated agent is the AI Alliance project Deep Research Agent for Applications, which demonstrates an important agent design pattern, Deep Research Agent (see, for example, here and here), with several example applications. (It is built on LastMile AI’s agent framework, MCP Agent.) An example of an even more advanced agent, one that is attracting a lot of attention at this time, is OpenClaw.

Agents are arguably the most rapidly evolving area of the AI ecosystem right now, in part because they are helping to fully realize the potential of AI to transform work and life activities. While we believe the concepts discussed in these chapters will remain relevant for a long time, the specific techniques, tools, and services mentioned will likely change.

What’s Next?

Proceed to the first chapter, Building Agents, followed by Evaluating Agents, and finally the Tools for Agent Development and Testing chapter.

Then review the highlights in each chapter and proceed to Lessons from Systems Testing.