Thursday June 4, 2026 14:10 - 14:50 EEST
The problem: it is too hard to understand and improve GenAI quality, and yet organizations are moving ahead regardless.

For AI engineers it’s hard to:
- Increase accuracy, due to a lack of repeatable and representative testing
- Understand reliability: know how, why, or when an agent will fail

This leads to poor reliability and accuracy, which:
- Increases operational costs and can increase reputational damage
- Erodes user trust, reduces customer engagement, and increases churn
- Reduces business confidence, slowing down AI adoption

In this talk I will discuss the limitations of how we are currently testing AI agents, and why this means we are not adequately ensuring the safety of agentic AI systems. With non-deterministic systems like Generative/Agentic AI, we need to simulate a large number of inputs (millions) and measure the outputs using judge agents to find the statistical success rate. This is a process that is more similar to how we traditionally do load testing than to the simple functional testing we’re using with AI right now. I will explain how you can instead use tools like AgentCore to create orchestration agents that build other types of agents to make this new type of non-deterministic testing possible.

This approach will be for GenAI what traditional automated tests are for deterministic code:
- Auto-generate representative testing material
- Orchestrate tests against real AI endpoints
- Judge outputs (minimum standards, accuracy quantification)
- Improve accuracy and reliability
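As a rough illustration of the "statistical success rate" idea described above, here is a minimal Python sketch. It is not taken from the talk and does not use AgentCore or any real AI endpoint: `generate_inputs`, `stub_agent`, and `judge` are hypothetical stand-ins for an input agent, the system under test, and an LLM-as-judge, and the 10% failure rate is an arbitrary assumption. The pass rate is reported with a Wilson score interval, one common way to put error bars on a proportion.

```python
import math
import random
from typing import List, Tuple


def generate_inputs(n: int, seed: int = 0) -> List[str]:
    """Hypothetical 'input agent': here it only fills in a template."""
    rng = random.Random(seed)
    topics = ["refund", "shipping", "billing", "login"]
    return [f"Question {i} about {rng.choice(topics)}" for i in range(n)]


def stub_agent(prompt: str, rng: random.Random) -> str:
    """Stand-in for a real AI endpoint (assumed to fail about 10% of the time)."""
    return "OK: " + prompt if rng.random() > 0.1 else "ERROR"


def judge(prompt: str, output: str) -> bool:
    """Stand-in for a judge agent: a simple minimum-standard check."""
    return output.startswith("OK") and prompt in output


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate (z = 1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half


def run_suite(n: int, seed: int = 0) -> Tuple[float, float, float]:
    """Generate n inputs, call the agent, judge each output, return (rate, low, high)."""
    rng = random.Random(seed)
    prompts = generate_inputs(n, seed)
    passed = sum(judge(p, stub_agent(p, rng)) for p in prompts)
    low, high = wilson_interval(passed, n)
    return passed / n, low, high


if __name__ == "__main__":
    for n in (100, 10_000):
        rate, low, high = run_suite(n)
        print(f"n={n}: pass rate {rate:.3f}, interval [{low:.3f}, {high:.3f}]")
```

Running the suite at two sample sizes shows why volume matters for non-deterministic systems: the interval around the measured pass rate narrows as the number of simulated inputs grows, which is the sense in which this resembles load testing more than a single functional check.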


Key takeaways:
  1. Current functional testing techniques are inadequate for testing agentic/generative AI systems
  2. What does it mean to use LLM-as-judge agents? What are input agents?
  3. How can you create an AI testing orchestration pipeline for testing AI agents?

Speakers
Adam Sandman

CEO, Inflectra
Adam Sandman was a programmer from the age of 10 and has been working in the IT industry for the past 25 years in areas such as architecture, agile development, testing and project management. Currently Adam is the Founder and CEO of Inflectra Corporation, where he is interested in...
BlackBox Kultuurikatel