Testing Strategies and Techniques

After discussing architecture and design considerations, we turn to testing strategies and techniques that can be used to create reliable, repeatable tests for generative AI applications.

For educational purposes, we demonstrate techniques using relatively simple tools that are easy for you to try. While these tools are suitable for research and development work, larger teams and more advanced use cases may require more sophisticated tools, which we describe in sections with titles that begin with Other Tools… near the end of each chapter. Also, note that many startups and consulting organizations now offer proprietary tools and services to aid developer testing, but we won’t cover those options.

The end of each chapter has an Experiments to Try section for further exploration.

NOTE: Using a Generative AI Model can mean the model is managed by the application itself behind library APIs, or accessed as a remote service, such as ChatGPT, or through a protocol like MCP. The model may also be part of more advanced design patterns like Agents and RAG. Evaluating models alone is not sufficient, because these surrounding tools can modify prompts and responses. So, just as classic Unit Tests, Integration Tests, and Acceptance Tests cover everything from individual Units to the Components that aggregate them, our AI tests must cover not just model prompts and responses, but also the units and components that use them. Nevertheless, for simplicity, we will often work with models directly.
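
For example, a component that wraps a model can be unit-tested with the model stubbed out, so the test deterministically exercises the component’s prompt construction and response handling, independent of the model itself. The following is a minimal sketch using only Python’s standard `unittest` and `unittest.mock`; the `summarize` function and its `model_client` are hypothetical names for illustration, not part of any real library.

```python
import unittest
from unittest.mock import MagicMock

def summarize(text: str, model_client) -> str:
    """A hypothetical 'component': it builds a prompt, calls the model,
    and post-processes the response."""
    prompt = f"Summarize in one sentence:\n{text}"
    response = model_client.complete(prompt)
    return response.strip()

class SummarizeComponentTest(unittest.TestCase):
    def test_summarize_trims_and_returns_model_output(self):
        # Stub the model so the test is deterministic and repeatable:
        # we are testing the component around the model, not the model.
        client = MagicMock()
        client.complete.return_value = "  A short summary.  "

        result = summarize("Some long document text...", client)

        self.assertEqual(result, "A short summary.")
        # Verify the component constructed the prompt we expected.
        sent_prompt = client.complete.call_args.args[0]
        self.assertIn("Summarize in one sentence:", sent_prompt)

if __name__ == "__main__":
    unittest.main()
```

Tests like this complement, rather than replace, evaluations of the model itself; later chapters cover techniques for testing model behavior directly.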

What’s Next?

Start with Unit Benchmarks. Refer to the Glossary regularly for definitions of terms. See the References for more information.