
Testing Strategies and Techniques

After discussing architecture and design considerations, we turn to testing strategies and techniques that can be used to create reliable, repeatable tests for generative AI applications.

For educational purposes, we demonstrate techniques using relatively simple tools that are easy for you to try. While these tools are suitable for research and development, we also describe more sophisticated alternatives for advanced uses.

The end of each chapter has an Experiments to Try section for further exploration.

NOTE: An application can use a Generative AI Model in several ways: managed by the application itself behind library APIs, accessed as a remote service such as ChatGPT, or reached through a protocol like MCP. The model may also be embedded in more advanced design patterns like Agents and RAG. Evaluating models in isolation is therefore not sufficient, because these surrounding tools can modify prompts and responses. Just as classic Unit Tests, Integration Tests, and Acceptance Tests cover everything from individual Units to the Components that aggregate them, our AI tests must cover not only model prompts and responses, but also the units and components they belong to. Nevertheless, for simplicity, we will often work with models directly.
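To make the distinction concrete, here is a minimal sketch of a unit test that exercises a component wrapping a model rather than the model alone. All names (`PromptTemplater`, `StubModel`, `ask`) are hypothetical, invented for illustration; the point is that the component modifies the prompt, so the test must verify the component's behavior, and a deterministic stub keeps the test repeatable.

```python
class StubModel:
    """Deterministic stand-in for a generative model, so the test is repeatable."""

    def generate(self, prompt: str) -> str:
        # Echo the prompt back, letting the test see exactly what the
        # component actually sent to the model.
        return f"ECHO: {prompt}"


class PromptTemplater:
    """Component under test: rewrites the prompt before calling the model."""

    def __init__(self, model):
        self.model = model

    def ask(self, question: str) -> str:
        prompt = f"Answer concisely: {question}"  # prompt modification
        return self.model.generate(prompt)


def test_component_modifies_prompt():
    component = PromptTemplater(StubModel())
    response = component.ask("What is RAG?")
    # The assertion covers the component's prompt handling,
    # not the quality of any real model's output.
    assert response == "ECHO: Answer concisely: What is RAG?"


test_component_modifies_prompt()
```

The same structure scales up: swap `StubModel` for a real model client and the test becomes an integration test of the component, without changing the component's code.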

What’s Next?

Start with Unit Benchmarks. Refer to the Glossary regularly for definitions of terms. See the References for more information.