The Venerable Principles of Coupling and Cohesion
Real applications, AI-enabled or not, combine many subsystems, usually including web pages for the user experience (UX), database and/or streaming systems for data retrieval and management, various libraries and modules, and calls to external services. Each of these Components can be tested in isolation, and most are deterministic or can be made to behave deterministically for testing.
An AI application adds one or more Generative AI Models invoked through libraries or services. Everything else should be tested in the traditional, deterministic ways. Invocations of the model should be hidden behind an API abstraction that can be replaced at test time with a Test Double. Even in integration and acceptance tests, use a model test double whenever the test isn't exercising the behavior of the model itself.
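As a minimal sketch of this idea, assuming a hypothetical `TextModel` interface with a single `generate` method (not any particular vendor's SDK), the application code below depends only on the abstraction. A test can therefore inject a `FakeModel` double that returns a canned reply and records the prompts it received. `TextModel`, `TicketTriage`, and `FakeModel` are illustrative names, not part of any library.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical abstraction hiding how the generative model is invoked."""

    def generate(self, prompt: str) -> str:
        ...


class TicketTriage:
    """Hypothetical application component that depends only on TextModel."""

    def __init__(self, model: TextModel) -> None:
        self._model = model

    def classify(self, ticket_text: str) -> str:
        prompt = (
            "Classify this support ticket as BUG, BILLING, or OTHER:\n"
            + ticket_text
        )
        raw = self._model.generate(prompt)
        # Deterministic post-processing of the model's reply: this is the
        # application logic we want to test without calling a real model.
        label = raw.strip().upper()
        return label if label in {"BUG", "BILLING"} else "OTHER"


class FakeModel:
    """Test Double: returns a canned reply and records every prompt it sees."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    def generate(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


# Usage in a test: no network, no model, fully deterministic.
fake = FakeModel(reply="  billing\n")
triage = TicketTriage(fake)
print(triage.classify("I was charged twice"))  # prints BILLING
```

Because `TicketTriage` never constructs a model client itself, the same class can be wired to a real model adapter in production and to the fake in any test that isn't exercising model behavior. The recorded `prompts` also let a test verify what the application sent to the model.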
But that still leaves the challenge of testing the model's behaviors, as well as the Integration and Acceptance tests that exercise how other parts of the application respond to model queries and results.