Nordic Testing Days 2026: Full Schedule

09:00 EEST

AI-Powered E2E Mobile Testing With Appium And Mobile MCP

Wednesday June 3, 2026 09:00 - 17:00 EEST

Writing mobile tests with Appium can be challenging. Many of us have dealt with unstable selectors, differences between Android and iOS, or slow progress. What if we could use artificial intelligence with Appium? And what exactly is Mobile MCP?

Tutorial Overview

This tutorial is for anyone interested in starting mobile testing with a modern approach. We’ll use Python throughout. Together, we will build an end-to-end framework where AI supports us from the beginning. We will see how tools like Mobile MCP can find elements without needing static IDs or XPath.

We’ll focus on building full end-to-end scenarios. We’ll test whether self-healing really works or is just hype. We’ll also look at where AI can speed up our work, and where we still need the Appium Inspector.

Prerequisites & Setup for Attendees:
Please complete the guideline beforehand: https://github.com/paciadawid/NTD2026_mobile_ai/blob/master/workshop-setup.md

Agenda

AI in Appium: We’ll talk about the main challenges in classic mobile testing and what AI can help solve.
E2E framework architecture: How to set up a project for AI-powered end-to-end testing, going beyond the basic Page Object Pattern.
What is "Mobile MCP"? We’ll give a practical introduction to the tool and show how it works with Appium.
AI vs. Appium Inspector: We’ll see how AI can find elements in an app as they change.
Building an E2E Scenario: Step by step, we’ll create a full test path, like logging in, searching, and adding to the cart.
Intelligent assertions and self-healing: We’ll see how AI helps check the app’s state and what it does when the UI changes.
Results analysis: We’ll look at how AI can help us figure out why an end-to-end test failed.
Summary: AI in mobile - revolution or evolution? We’ll finish with a Q&A session.

Speakers

Dawid Pacia

QA Consultant, PathcingIT

QA and Test Automation Manager as well as mentor and trainer. Tech freak following all the newest technologies (and implementing them on his own). Fan of the Agile approach to project management and products. Supporting companies in transformations toward better quality. Actively...

Wednesday June 3, 2026 09:00 - 17:00 EEST
Puupakusaal Kultuurikatel

Tutorial

Difficulty Like a fish in the sea

09:00 EEST

Beyond Assert(True): Hands-On Testing For LLMs And AI Agents

Wednesday June 3, 2026 09:00 - 17:00 EEST

Terrassi

Traditional software is deterministic: the same input yields the same output. Large Language Models (LLMs) and AI Agents have shattered this rule, introducing an inherently probabilistic paradigm. How do we ensure quality when the ground truth is shifting? This tutorial bridges the gap between traditional QA and AI evaluation. We will move beyond simple prompt testing into validating complex multi-agent systems. Participants will learn to build test oracles that evaluate intent and semantics rather than exact matches, evolving the QA role from a code verifier to an evaluation framework architect.

Target Audience: QA Engineers, SDETs, and Developers working with or transitioning to Generative AI systems.

Learning Objectives
By the end of this workshop, participants will be able to:

Deconstruct AI Architectures: Identify specific testable layers such as Shell (API/UI), Orchestration (Context/Tools), and Inference Core (Probabilistic).
Build Modern Test Oracles: Implement aggregated and property-based oracles using Python to handle non-deterministic outputs.
Validate Multi-Agent Systems: Apply a four-level framework to test communication, delegation, and error propagation between AI agents.
Execute AI Red Teaming: Identify vulnerabilities such as prompt injection, hallucinations, and safety bypasses.
Automate Quality Metrics: Integrate BERTScore and RAG-specific metrics such as Faithfulness and Relevance into CI/CD pipelines.
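
As a rough illustration of the "property-based" and "aggregated" oracles named in the objectives above, here is a minimal Python sketch. It is not taken from the workshop material: `fake_llm`, the JSON shape, and the 90% pass-rate threshold are all assumptions made for the example. The idea is to check properties that every acceptable answer must satisfy, then judge a batch of runs rather than a single exact string.

```python
import json


def fake_llm(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for a non-deterministic model call."""
    phrasings = [
        '{"city": "Tallinn", "answer": "The capital of Estonia is Tallinn."}',
        '{"city": "Tallinn", "answer": "Tallinn is the capital."}',
        '{"city": "tallinn", "answer": "It is Tallinn."}',
    ]
    return phrasings[seed % len(phrasings)]


def property_oracle(raw: str) -> bool:
    """Atomic oracle: properties that any valid answer must satisfy."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and isinstance(data.get("city"), str)
        and data["city"].strip().lower() == "tallinn"
        and isinstance(data.get("answer"), str)
        and 0 < len(data["answer"]) <= 200
    )


def aggregated_oracle(outputs: list[str], min_pass_rate: float = 0.9) -> bool:
    """Aggregated oracle: pass if enough sampled outputs satisfy the properties."""
    passed = sum(property_oracle(output) for output in outputs)
    return passed / len(outputs) >= min_pass_rate


# Sample the (stand-in) model several times and judge the batch as a whole.
outputs = [fake_llm("What is the capital of Estonia?", seed) for seed in range(10)]
batch_ok = aggregated_oracle(outputs)
```

The wording differs between runs, yet the batch passes because the oracle checks structure and meaning-bearing fields instead of comparing against one expected string.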

Prerequisites for Attendees:

Basic knowledge of Python and API fundamentals. Participants must bring a laptop with VS Code and Python installed. Alternatively, Cursor, Claude Code, Codex, OpenCode, or any equivalent tool is also suitable. In any case, make sure that the chosen agent is already installed, configured, and ready for the session.

Workshop Outline

The Paradigm Shift
- Theory: Deterministic vs. probabilistic testing. Agent taxonomy.
- Practice: Environment setup and executing your first fuzzy test.
Oracles & Orchestration
- Theory: Atomic vs. aggregated oracles. Testing the orchestration layer.
- Practice: Writing scripts to validate JSON schemas and output consistency.
Semantic Evaluation
- Theory: RAG metrics such as Faithfulness and Relevance. Introduction to BERTScore.
- Practice: Building an LLM-as-a-Judge evaluator to grade complex answers.
Multi-Agent Testing
- Theory: Inter-agent communication and task delegation loops.
- Practice: Debugging a workflow where a Travel Agent delegates to a Finance Agent.
Red Teaming & Security
- Theory: Prompt injection, mutation testing, and metamorphic testing.
- Practice: Simulated attack scenarios, bypassing safety filters, and implementing guardrail fixes.
QA Strategy & Governance
- Theory: Human-in-the-loop workflows and production monitoring.
- Practice: Designing a full-scale QA strategy for a real-world GenAI product.
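
Metamorphic testing, listed in the outline above, can be sketched in a few lines of Python. This is only an assumed illustration, not the workshop's exercise: `fake_sentiment_model` is a hypothetical stand-in for the system under test, and the chosen transformations are examples of changes that should not alter the result.

```python
def fake_sentiment_model(text: str) -> str:
    """Hypothetical stand-in for the model under test."""
    lowered = text.lower()
    negative_words = ("bad", "terrible", "awful")
    return "negative" if any(word in lowered for word in negative_words) else "positive"


def transformations(text: str) -> list[str]:
    """Input changes that should NOT change the label (the metamorphic relation)."""
    return [text.upper(), "  " + text + "  ", text + " Thanks!"]


def metamorphic_violations(model, source_inputs):
    """Return (original, variant) pairs where the label changed unexpectedly."""
    violations = []
    for text in source_inputs:
        baseline = model(text)
        for variant in transformations(text):
            if model(variant) != baseline:
                violations.append((text, variant))
    return violations


inputs = ["The hotel was terrible.", "Great breakfast and friendly staff."]
violations = metamorphic_violations(fake_sentiment_model, inputs)
```

No expected label is needed for any input: the test only compares the model with itself, which is what makes the technique useful when there is no fixed ground truth.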

Speakers

Tiago Gomes

Lead QA Consultant, Thoughtworks

Tiago Gomes is a passionate technology leader and Lead Consultant at Thoughtworks, dedicated to advancing the industry through hands-on project work and mentorship. With expertise in Software Testing and Project Management, he collaborates with clients to understand their challenges...

Daniel Carvalho

Senior QA Engineer, Hostfully

Daniel Carvalho is a Senior QA Engineer focused on building scalable, data driven quality systems through automation and modern testing strategies. He specializes in Risk Based Testing, Critical Flow Testing, API testing, and quality metrics that enable faster, better informed decisions...

Wednesday June 3, 2026 09:00 - 17:00 EEST
Terrassi Kultuurikatel

Tutorial

Difficulty Like a fish in the sea

09:00 EEST

KEYNOTE: The Irreplaceable 40%

Thursday June 4, 2026 09:00 - 10:00 EEST

BlackBox

Artificial Intelligence is increasingly used to write test code. Today it is estimated that 60% of new test code is written by AI tools, and this number will only continue to rise. Surprisingly, however, testing is now the critical component: someone still needs to ‘test’ the 40% that AI is unable to test itself, and that is where the future of testers lies.

That 40% is not busy work. This is not your typical motivational speech dressed up as another AI testing keynote. What I will talk about in the keynote are the three critical capabilities that are becoming more valuable, not less. The underlying reason is that all of them depend on something that current models lack: the human intent behind engineering quality software for other humans to use.

Engineer the Context Understanding: While AI can greatly amplify the effectiveness of testing, the quality of results is only as good as the input you give it. In 2026, the testers who will rise to the top will not be the ones who came up with the cutest prompt for the AI to answer. Rather, they will be those who can successfully engineer the context of the problem for the AI to solve. We will look at real-world best practices for testing with AI, and I will share an AI Assurance playbook for context engineering that will immediately raise the quality of your AI-infused testing.
Review with Heuristic Judgment: AI-generated test suites look good until they silently fail to test what really matters. Green pipelines are the most dangerous artefact in organisations today. No one asks: what are we not testing? This capability helps organisations audit AI-generated test suites with heuristic judgement. I will explain how to design world-class AI-augmented tests like an experienced navigator reading a map to identify the blank spaces where organisation-specific risks reside.
Orchestrate Trust: Humans trust humans, and they still need to decide when the machine is wrong. That is not a technical skill - it is a quality leadership act. This capability describes the shifting role of quality engineers in trusting AI Assurance oracles. It explores the new role of orchestrating that trust across teams, tools, and stakeholders, and answers the question: “Who is really responsible for quality?”

By the end of this keynote, you will have an assessment of your skills in planning, architecting, designing and leading AI governance, a practical plan of action to implement in the real world, and the assurance that you will be “ready” to become part of the “Irreplaceable 40%”.

Key Takeaways:

Context Engineering is our New Superpower: AI-augmented tests are only as valuable as the context a human provides. Attendees will leave with a repeatable framework for curating system, user, and business context that transforms AI output from "technically correct" into "intent driven tests" - a skill that compounds in value as the importance of testing AI-infused systems grows.

Heuristic Auditing Catches the Real Truth: A passing automated test regression suite is not proof of quality; it is proof that the tests you wrote passed. Attendees will gain practical heuristic-led auditing patterns to interrogate value-driven testing, identify dangerous blind spots, and ask the questions that thinking machines are structurally incapable of asking themselves.

Trust Orchestration is the Quality Engineer's Next Career: The highest-value skill in an AI-augmented team is not technical testing - it is the ability to calibrate, communicate, and own confidence decisions across people, tools, and stakeholders. Attendees will understand how to position themselves as the trust architects and oracles their organisation increasingly needs, turning a perceived threat into their most durable career advantage.

Speakers

Jonathon Wright

Chief AI Officer, Testers.AI

Jonathon Wright is a strategic thought leader and distinguished technology evangelist. He specializes in emerging technologies, artificial intelligence, and automation, and has more than 25 years of international commercial experience within global organizations. As the Chief AI Officer...

Thursday June 4, 2026 09:00 - 10:00 EEST
BlackBox Kultuurikatel

Keynote

Difficulty Like a fish in the sea

11:50 EEST

How To Survive In The AI Jungle: Rethinking Test Strategies For An AI Era

Thursday June 4, 2026 11:50 - 12:30 EEST

BlackBox

Intro: Artificial Intelligence challenges almost every assumption the testing discipline is built on. Traditional testing depends on fixed inputs and predictable logic, but AI systems are adaptive, probabilistic, and context-dependent. That means our classical test cases are no longer stable reference points.

In this 20-minute talk, Nicole van Gijn explores what testing looks like when your system learns, reasons, and occasionally hallucinates. She introduces the AI Quality Grid, a structured framework co-developed with John Kronenberg, that helps define quality attributes, risks, and validation strategies for AI applications. The session bridges theory and practice through concrete examples from a real AI test project, showing how LLM-evals and risk-based thinking can be combined to test prompt robustness, output consistency, and bias control within modern CI/CD pipelines.

Attendees will walk away with a lightweight but actionable structure for AI quality assessment and a new mindset: understanding quality not as a checklist, but as an intelligent, adaptive discipline.

Key Takeaways:

AI systems are rapidly entering production pipelines, yet testing methods lag behind.
Testers and QA leads urgently need practical models to evaluate non-deterministic outputs.
The AI Quality Grid offers a bridge between AI model evaluation (LLM-evals) and classical test strategy, providing testers with new tools and thinking patterns to stay relevant in the AI era.

Speakers

Nicole van Gijn

Thought leader AI Quality, QA company

Nicole van Gijn is Thought leader AI Quality, where she researches how to enhance software quality and test automation for AI applications. She developed the AI Quality Grid, a framework for testing AI-driven systems, and explores how classical QA principles evolve towards risk-based...

Thursday June 4, 2026 11:50 - 12:30 EEST
BlackBox Kultuurikatel

Track

Difficulty Getting your toes wet, Like a fish in the sea

13:30 EEST

Demystifying Continuous Deployment: From Weekly Tension To Daily Confidence

Thursday June 4, 2026 13:30 - 14:10 EEST

BlackBox

Deploying to production shouldn't require a meeting, three approvals, and a prayer. Yet most teams treat every release like launching a rocket - mission control on standby, everyone watching the countdown, but no one wanting to press the red button.

At Sokos Hotels, our web booking system handles thousands of reservations daily, processing millions monthly across 50 hotels all over Finland and Estonia. One critical bug means lost revenue; one outage means thousands of unhappy guests. We were trapped in weekly releases, manual verification, and the kind of Thursday tension that put everyone in the mission control room. We asked ourselves if Continuous Deployment is just a myth.

In less than a year, we broke the cycle. We went from weekly manual releases to deploying seven times per day with higher confidence than ever. The results? 4.5/5.0 customer effort score, 40% higher conversion rates and 4.3+/5.0 overall team happiness. The secret? A testing strategy that made deployment boring for the last 2 years.

Join my talk to find out how we did it!

Your takeaway: A practical, battle-tested roadmap for testing-enabled Continuous Deployment. You'll leave knowing which tests to automate first, how to build confidence without sacrificing speed, and how to prove to skeptics that this isn't just another risky experiment. Real patterns, real failures, real results - ready to implement Monday morning.

Who should attend: QA engineers, test automation engineers, developers, engineering managers, and DevOps practitioners who believe testing should accelerate delivery, not slow it down.

Key Takeaways:

The three signs your team is not ready for Continuous Deployment, and the technical enabler that breaks the testing tension.
How we shifted from “QA signs off on releases” to "Quality is built-in", and why the cultural change was harder than the technical one.
The valuable failure lessons we learned along the way and what they taught us about green pipelines.

Speakers

Quan Dao

Sr. Delivery Lead, SOK

Quan Dao is a strategic Delivery Leader and international speaker with a decade of experience in quality-driven software delivery. Currently at SOK, Finland's largest retailer, he works at the intersection of delivery strategy, technical practice, and people leadership - helping organisations...

Thursday June 4, 2026 13:30 - 14:10 EEST
BlackBox Kultuurikatel

Track

Difficulty Getting your toes wet, Like a fish in the sea

11:50 EEST

Running A Thousand End-To-End Cypress Tests Every Day

Friday June 5, 2026 11:50 - 12:30 EEST

BlackBox

In this talk, I show how we run a lot of full end-to-end Cypress web application tests every day. In addition to running the full data set, we do separate feature test runs based on test tags. We also allow everyone from all teams to trigger the tests right from GitHub Actions UI. This lets every group quickly test their feature before merging into the main branch.

For pull requests, we employ source code analysis based on data test IDs to run the affected tests first for quicker feedback. The software automation team uses the flake test information to chase the sources of the underlying errors to minimize noise and make every passing test run give us confidence in the released code, and every failing test run useful to quickly diagnose the real underlying issue.

The presentation covers test writing, test organization, selecting tests to run based on the source code changes, running tests in different resolutions. I also look into making the tests faster by employing data creation and caching, as well as using API calls to bypass the user interface in some places. Finally, making the tests robust and flake-free and triaging the failed runs is an ongoing activity for the automation team.
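
To make the "run the affected tests first" idea concrete, here is a small Python sketch (Cypress specs themselves are JavaScript; Python is used here only for illustration). It is an assumption about how such an analysis could work, not the speaker's actual tooling: the regular expressions, the `data-test` attribute names, and the spec file names are all hypothetical.

```python
import re

# Attributes such as data-test="add-to-cart" in changed application source.
TEST_ID_PATTERN = re.compile(r'data-(?:test|testid|cy)="([^"]+)"')
# Selectors such as [data-test="add-to-cart"] or [data-test=add-to-cart] in specs.
SELECTOR_PATTERN = re.compile(r'\[data-(?:test|testid|cy)="?([^"\]]+)"?\]')


def ids_in_source(source_text: str) -> set[str]:
    """Collect the test IDs that appear in a changed source file."""
    return set(TEST_ID_PATTERN.findall(source_text))


def ids_in_spec(spec_text: str) -> set[str]:
    """Collect the test IDs that a spec file selects."""
    return set(SELECTOR_PATTERN.findall(spec_text))


def order_specs(changed_sources: list[str], specs: dict[str, str]) -> list[str]:
    """Put specs that touch a changed test ID first, then all remaining specs."""
    changed_ids = set().union(*(ids_in_source(text) for text in changed_sources))
    affected = sorted(name for name, text in specs.items() if ids_in_spec(text) & changed_ids)
    rest = sorted(name for name in specs if name not in affected)
    return affected + rest


changed = ['<button data-test="add-to-cart">Add</button>']
specs = {
    "login.cy.js": "cy.get('[data-test=login-submit]').click()",
    "cart.cy.js": "cy.get('[data-test=\"add-to-cart\"]').click()",
    "checkout.cy.js": "cy.get('[data-test=add-to-cart]').click(); cy.get('[data-test=pay]')",
}
run_order = order_specs(changed, specs)
```

The remaining specs still run after the affected ones, so the ordering speeds up feedback on a pull request without reducing coverage.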

Key takeaways:

How to run 1000s of end-to-end tests quickly
Which tests to run on a pull request
How AI is helping us pick tests to run

Speakers

Gleb Bahmutov

Sr Director of Engineering, Mercari US

Gleb Bahmutov is a JavaScript ninja, image processing expert, and software quality fanatic. During the day Gleb is making the engineers more productive at Mercari US in his position as the Senior Director of Engineering. At night he is fighting software bugs and blogs about it at...

Friday June 5, 2026 11:50 - 12:30 EEST
BlackBox Kultuurikatel

Track

Difficulty Getting your toes wet, Like a fish in the sea

13:30 EEST

ViTO (Visual Test Oracle): How To Use GenAI To Slash Your Code And Test Maintenance By 50%

Friday June 5, 2026 13:30 - 14:10 EEST

D-Saal

Problem Context

Brittle selectors: We spend hours fixing fragile XPaths and CSS selectors just to verify if a button is visible or a chart is correct
Release changes: Automation code that is stable "now" suddenly becomes flaky after the release. The reason is the ever-changing platform. And automation is not always able to cope with it
Code volume: Code analysis in our company showed us that assertion logic is typically five times (5x) larger in code size than action logic, consuming up to three months of dedicated maintenance effort every year

Solution
In this session, I introduce ViTO (Visual Test Oracle), a production-deployed framework that leverages multimodal Generative AI (GenAI). ViTO "sees" the application exactly like a human does. The best part is that, in the end, it's just another block of code that can be embedded inside any framework.

Summary of what's in the talk
I will share:

The logic and algorithm of how we used GenAI to decouple verification from the underlying code, which resulted in a 50% reduction in our assertion codebase.
How we replaced thousands of lines of brittle verification logic with resilient, prompt-driven visual oracles that can handle complex data visualisations and unseen UI faults with zero extra effort. If you are tired of your tests breaking because a div changed, it's time to shift from structural selectors to a visual AI oracle.
The lessons learned from our initiative, and above all, where NOT to use GenAI
Access to the boilerplate code that you can implement within your repo
If time permits, a demo of the framework in action. If short on time, the link to the boilerplate is provided in the slides :)

Who is this for?

QA Architects, Senior SDETs, Automation Engineers, Manual testers looking to transition to GenAI-based testing;
Managers/architects looking for a language-agnostic framework to build GenAI-based assertions
Anyone who wishes to know where to and where NOT to use GenAI in testing
QA professionals looking for a starting point (boilerplate) code to embed GenAI in their automation

Key takeaways:

In-code GenAI: How to implement GenAI directly "in-code" using any programming language
Prompt Engineering for Testers: How to write resilient "Assertion Prompts" that replace complex conditional code and handle visual regression automatically.
Real-World ROI: Evidence-based results from a production environment, showing a 50% reduction in code maintenance and expanded coverage for rich UI components.
Deterministic AI: Practical strategies to control GenAI hallucinations using "concentrated screenshots"
A sneak peek into what's coming in the future in GenAI for test automation

Speakers

Rahul Singh

Staff Software Engineer - AI Solution, Blue Yonder

Rahul is a techie with 16 years of experience, 10 of them in testing and automation, who gradually moved into software development. With a strong focus on problem-solving and innovation, he has concentrated on "tangible" solutions. Most recently, his work involves "meaningful" implementation...

Friday June 5, 2026 13:30 - 14:10 EEST
D-Saal Kultuurikatel

Track

Difficulty Getting your toes wet, Like a fish in the sea

13:30 EEST

When Life Gives You Lemons… Are You Counting Them Or Making Lemonade?

Friday June 5, 2026 13:30 - 14:10 EEST

BlackBox

Teams often rely on test cases executed, bugs reported, and pass rates to measure success. These numbers might look impressive, but do they truly reflect software quality? Vanity metrics can mislead teams, encourage the wrong behaviours, and create a false sense of progress.

This talk introduces a 7-step framework to move beyond superficial KPIs and focus on metrics that drive real value. Inspired by analytical approaches in competitive sports, this model helps teams make better decisions, align testing efforts with business goals, and ensure that data supports meaningful improvements.

Key takeaways:

The risks of vanity metrics and how they can mislead decision-making.
How to design KPIs that focus on value, not just activity.
A practical framework to ensure testing metrics drive meaningful change

Speakers

Chris Armstrong

Manager, Developer Relations, SmartBear

Chris (he/him) is a strategic and context-informed quality engineering leader with nearly two decades of experience helping organisations improve their quality practices. Specialising in strategic test leadership, Chris excels at cross-functional leadership, working across QA, Development...

Friday June 5, 2026 13:30 - 14:10 EEST
BlackBox Kultuurikatel

Track

Difficulty Getting your toes wet, Like a fish in the sea

14:10 EEST

A Missing Input Validation May Be Used for Denial of Service Attacks

Friday June 5, 2026 14:10 - 14:50 EEST

D-Saal

The security impact of missing input validation is usually underestimated.

The presentation explains, with examples, how missing logical limits may lead to denial of service attacks on an application that seems quite secure - no injection or execution vulnerabilities needed.
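
A minimal Python sketch of what a "logical limit" can look like in practice follows. It is an assumed example, not one from the talk: the parameter names `q` and `page_size` and the limits of 256 characters and 100 items are hypothetical. Without such bounds, a single well-formed request asking for millions of rows could tie up the server even though nothing is injected or executed.

```python
import re

MAX_QUERY_LENGTH = 256   # hypothetical limit on the search text
MAX_PAGE_SIZE = 100      # hypothetical limit on rows returned per request


class ValidationError(ValueError):
    """Raised when a request parameter is outside its logical limits."""


def parse_search_request(params: dict) -> dict:
    """Validate size-related inputs before any expensive work is done."""
    query = str(params.get("q", ""))
    if len(query) > MAX_QUERY_LENGTH:
        raise ValidationError("q is too long")

    raw_size = str(params.get("page_size", "20"))
    # Accept only short ASCII digit strings so huge numbers are rejected cheaply.
    if not re.fullmatch(r"[0-9]{1,4}", raw_size):
        raise ValidationError("page_size must be a small positive integer")
    page_size = int(raw_size)
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValidationError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")

    return {"q": query, "page_size": page_size}


accepted = parse_search_request({"q": "hotel", "page_size": "50"})
```

The checks run before the query reaches the database or any rendering code, which is where an unbounded value would otherwise become expensive.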

As the presenter is a co-lead of the OWASP ASVS project, related security requirements are also pointed out.

Not a single word about AI.

Speakers

Elar Lang

Lecturer and Penetration tester, Clarified Security

Elar Lang is a web application security specialist and enthusiast who has been working for more than 14 years in different aspects of web application security. A full-time security tester, training architect, and web application security developer educator (close to 3000 hours of...

Friday June 5, 2026 14:10 - 14:50 EEST
D-Saal Kultuurikatel

Track

Difficulty Like a fish in the sea

14:10 EEST

Lessons Learned From AI-Powered Visual Reasoning Feedback

Friday June 5, 2026 14:10 - 14:50 EEST

BlackBox

Visual testing is supposed to protect QA teams from the familiar “it looks wrong” bug, yet traditional pixel-diff approaches only show that something changed, not whether that change actually matters. As modern interfaces grow more dynamic and design systems become more complex, teams need smarter ways to detect meaningful visual regressions.

This talk presents a practical approach to automated visual bug detection using multimodal LLMs. Drawing on a real-world implementation, it shows how AI models from providers such as OpenAI, Anthropic, and Google can be orchestrated to analyze screenshots and identify issues that pixel-based tools often cannot interpret on their own. These include layout breaks, missing elements, accessibility concerns, color contrast problems, and platform-specific guideline violations.

The session explores how AI-driven visual analysis can move beyond pixel-perfect comparison toward semantic understanding, helping teams distinguish intentional UI changes from genuine defects. It also addresses one of the biggest challenges in visual testing at scale: false positives, demonstrating how agent-based review systems can reduce noise while still surfacing critical issues.

Attendees will leave with practical ideas for using multimodal AI to strengthen visual testing workflows and make automated UI validation more accurate, scalable, and useful.

Key Takeaways:

How to evolve from “pixel diffs” to impact-based automated visual feedback
Patterns that turn image feedback into structured results (what changed, where, severity, why it matters)
Tips for integrating automated LLM-powered visual feedback into existing automated UI test frameworks
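
The "structured results" takeaway above (what changed, where, severity, why it matters) could be represented roughly as follows. This Python sketch is an assumption for illustration only: the JSON field names, the three severity levels, and the sample reply are hypothetical, and no real model provider API is called.

```python
import json
from dataclasses import dataclass

SEVERITIES = ("low", "medium", "high")  # hypothetical severity scale


@dataclass(frozen=True)
class VisualFinding:
    what_changed: str
    where: str
    severity: str
    why_it_matters: str


def parse_findings(model_reply: str) -> list[VisualFinding]:
    """Turn a model's JSON reply into validated, typed findings."""
    findings = []
    for item in json.loads(model_reply):
        finding = VisualFinding(
            what_changed=str(item["what_changed"]),
            where=str(item["where"]),
            severity=str(item["severity"]).lower(),
            why_it_matters=str(item["why_it_matters"]),
        )
        if finding.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {finding.severity}")
        findings.append(finding)
    return findings


def blocking(findings: list[VisualFinding], threshold: str = "high") -> list[VisualFinding]:
    """Keep only findings at or above the severity that should fail a test run."""
    min_rank = SEVERITIES.index(threshold)
    return [f for f in findings if SEVERITIES.index(f.severity) >= min_rank]


# Hypothetical reply, in the shape the prompt would ask the model to return.
sample_reply = json.dumps([
    {"what_changed": "Button text is clipped", "where": "checkout button",
     "severity": "High", "why_it_matters": "Users cannot read the call to action"},
    {"what_changed": "Icon shifted by a few pixels", "where": "footer",
     "severity": "low", "why_it_matters": "Cosmetic only"},
])
findings = parse_findings(sample_reply)
must_fix = blocking(findings)
```

Validating the reply before acting on it means a malformed or unexpected model answer surfaces as a clear error in the test framework rather than as a silently passing check.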

Speakers

Risko Ruus

Principal QA Engineer, Rush Street Interactive

I am a software quality enthusiast with over 20 years of experience in various companies and software projects. I enjoy both developing software and testing it (including test automation). Example applications I have worked on include Nokia smartphones, Skype, and mobile betting...

Friday June 5, 2026 14:10 - 14:50 EEST
BlackBox Kultuurikatel

Track

Difficulty Like a fish in the sea
