Name: I'll Let You Be The Judge? Testing Non-Deterministic AI Systems
Start: 2026-06-04T14:10:00+0300
End: 2026-06-04T14:50:00+0300

I'll Let You Be The Judge? Testing Non-Deterministic AI Systems

Thursday June 4, 2026 14:10 - 14:50 EEST

BlackBox

The problem: it is too hard to understand and improve GenAI quality, and yet organizations are moving ahead regardless. For AI engineers it’s hard to:

Increase accuracy, due to a lack of repeatable and representative testing
Understand reliability: know how, why, or when an agent will fail.

This leads to poor reliability and accuracy, which:

Increases operational costs and can increase reputational damage
Erodes user trust, reduces customer engagement, and increases churn
Reduces business confidence, slowing down AI adoption

In this talk I will discuss the limitations of how we are currently testing AI agents, and why this means we are not adequately ensuring the safety of agentic AI systems. With non-deterministic systems like Generative/Agentic AI, we need to simulate a large number of inputs (millions) and measure the outputs using judge agents to find the statistical success rate. This is a process that is more similar to how we traditionally do load testing than to the simple functional testing we're using with AI right now.
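
To make the idea of a statistical success rate concrete, here is a minimal Python sketch, not taken from the talk. It assumes each judged run yields a simple pass/fail verdict, and it reports a Wilson score interval around the observed pass rate so that the uncertainty shrinks visibly as more inputs are simulated. The function names, the 0.9 pass rate, and the random stand-in for a judge agent are all hypothetical.

```python
import math
import random


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a pass rate (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def simulated_judge(rng, true_pass_rate=0.9):
    """Hypothetical stand-in for a judge agent: returns True when a run passes.

    A real judge would inspect the response of the system under test;
    here the verdict is just a biased coin flip.
    """
    return rng.random() < true_pass_rate


def estimate_success_rate(n_trials, seed=0):
    """Run n_trials simulated judgements and return (rate, low, high)."""
    rng = random.Random(seed)
    passes = sum(simulated_judge(rng) for _ in range(n_trials))
    low, high = wilson_interval(passes, n_trials)
    return passes / n_trials, low, high


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        rate, low, high = estimate_success_rate(n)
        print(f"n={n}: pass rate {rate:.4f}, interval [{low:.4f}, {high:.4f}]")
```

Running the sketch with increasing n shows why volume matters: a handful of functional checks leaves a wide interval, while a load-test-sized run narrows it.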

I will explain how you can instead use tools like AgentCore to create orchestration agents that build other types of agent to make this new type of non-deterministic testing possible. This approach will be for GenAI what traditional automated tests are for deterministic code:

Auto generate representative testing material
Orchestrate tests against real AI endpoints
Judge outputs (minimum standards, accuracy quantification)
Improve accuracy and reliability
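
The four steps above can be sketched as a small pipeline. This is an illustrative Python outline only, with no real agent framework behind it: it does not use AgentCore or any other actual API, and every function below (input_agent, system_under_test, judge_agent, run_pipeline, summarize) is a hypothetical stub. The system under test is hard-coded to fail on one topic so that the summary has something to report.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    prompt: str
    response: str
    passed: bool


def input_agent(n: int, seed: int = 0) -> List[str]:
    """Hypothetical input agent: generates n test prompts.

    A real one would use an LLM to produce representative user inputs.
    """
    rng = random.Random(seed)
    topics = ["refund", "shipping", "password reset"]
    return [f"Question {i} about {rng.choice(topics)}" for i in range(n)]


def system_under_test(prompt: str) -> str:
    """Hypothetical AI endpoint: returns an empty answer for refund questions."""
    if "refund" in prompt:
        return ""
    return f"Answer to: {prompt}"


def judge_agent(prompt: str, response: str) -> bool:
    """Hypothetical judge: the minimum standard here is a non-empty answer."""
    return bool(response.strip())


def run_pipeline(
    n: int,
    generate: Callable[[int], List[str]] = input_agent,
    target: Callable[[str], str] = system_under_test,
    judge: Callable[[str, str], bool] = judge_agent,
) -> List[Verdict]:
    """Orchestrate: generate inputs, call the target, judge each output."""
    verdicts = []
    for prompt in generate(n):
        response = target(prompt)
        verdicts.append(Verdict(prompt, response, judge(prompt, response)))
    return verdicts


def summarize(verdicts: List[Verdict]) -> Dict[str, object]:
    """Aggregate verdicts into a pass rate plus the failing prompts."""
    total = len(verdicts)
    passed = sum(v.passed for v in verdicts)
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "failures": [v.prompt for v in verdicts if not v.passed],
    }


if __name__ == "__main__":
    report = summarize(run_pipeline(50))
    print(report["pass_rate"], report["failures"][:3])
```

The list of failing prompts is what feeds the last step: it points at the inputs where accuracy and reliability need work.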

Key takeaways:

Current functional testing techniques are inadequate for testing agentic/generative AI systems
What does it mean to use LLM-as-Judge agents? What are input agents?
How can you create an AI testing orchestration pipeline for testing AI agents?

Speakers

Adam Sandman

CEO, Inflectra

Adam Sandman was a programmer from the age of 10 and has been working in the IT industry for the past 25 years in areas such as architecture, agile development, testing and project management. Currently Adam is the Founder and CEO of Inflectra Corporation, where he is interested in...

Adam Sandman I'll let You Be The Judge Testing Non Deterministic AI Systems.pptx pdf

I'll let You Be The Judge Testing Non Deterministic AI Systems Supporting Material.docx pdf

BlackBox Kultuurikatel

Track

Difficulty Getting your toes wet, Deep dive

Nordic Testing Days 2026
