At ATxSummit 2026 last week, Singapore’s Minister for Digital Development and Information, Josephine Teo, spoke about the strategic importance of the AI Verify Foundation’s latest assurance case studies as part of Singapore’s broader push toward responsible and large-scale AI adoption.
We are proud that Savos by impress.ai was featured among them.
This is not our first engagement with the AI Verify ecosystem. impress.ai was also part of the original AI compendium of real-world AI applications. As AI has moved into the generative and agentic era, participating in this latest initiative was a natural next step.
The case study, conducted in collaboration with Asenion and the AI Verify Foundation, focused on evaluating and stress-testing core agentic AI capabilities within Savos across areas such as conversational interviewing, resume scoring, bias evaluation, adversarial testing, governance, and enterprise AI assurance.
For us, the value of the exercise went far beyond inclusion in the report.
It provided an opportunity to subject an agentic AI hiring system to the kind of scrutiny that enterprise customers increasingly expect before deploying AI into high-impact workflows.

What Is Savos?

Savos is impress.ai’s agentic AI hiring ally designed to function as an intelligence layer above the hiring stack.
Unlike traditional recruitment platforms that primarily act as systems of record or workflow automation tools, Savos was built specifically for the emerging era of agentic AI — where systems are expected to reason, adapt contextually, conduct conversations, generate evaluations, and actively support decision-making across enterprise hiring workflows.
The case study specifically evaluated three major capabilities within Savos:
Resume Scoring
This module uses Retrieval-Augmented Generation (RAG) and semantic matching techniques to evaluate candidate-job fit by analyzing resumes against role requirements, skills, qualifications, and hiring criteria extracted from job descriptions.
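The retrieval side of this idea can be sketched in a few lines. This is a minimal, hypothetical illustration, not how Savos is implemented: a toy bag-of-words vector (`vectorize`) stands in for a real embedding model, the requirements are assumed to have already been extracted from the job description, and each requirement is matched against its best-supporting resume chunk. The generation step of RAG (for example, an LLM explaining the match) is omitted.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_resume(resume_chunks: list[str], requirements: list[str]) -> dict[str, float]:
    """For each requirement, retrieve the best-matching resume chunk and keep its similarity."""
    chunk_vecs = [vectorize(c) for c in resume_chunks]
    return {
        req: max((cosine(vectorize(req), cv) for cv in chunk_vecs), default=0.0)
        for req in requirements
    }


# Hypothetical resume chunks and extracted requirements.
chunks = ["Built Python data pipelines on Airflow", "Led a team of five analysts"]
reqs = ["python data pipelines", "kubernetes"]
scores = score_resume(chunks, reqs)
fit = sum(scores.values()) / len(scores)  # naive overall fit: mean per-requirement match
```

A production system would presumably use learned embeddings and weight requirements by importance; the simple mean here is only a placeholder.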
ScaleScreen
ScaleScreen is Savos’ agentic conversational interviewing capability. Given a job description, the skills that need to be evaluated, the industry context, and the role requirements, it first creates an intelligent interview plan designed to efficiently explore a candidate’s profile.
It then conducts a dynamic conversation with the candidate, continuously adapting follow-up questions based on each response. Rather than moving through a predefined questionnaire, the system decides what to probe further, what evidence has already been collected, and which areas require deeper evaluation.
This ability to plan, reason, and adapt in real time is what makes ScaleScreen fundamentally different from traditional screening workflows. It also made ScaleScreen one of the most interesting capabilities to evaluate during the assurance process.
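The plan-then-adapt loop described above can be sketched as follows. This is a hypothetical simplification, not ScaleScreen's actual logic: the plan is assumed to be a target number of evidence items per skill, `ask` stands in for generating a question and receiving the candidate's answer, and `judge` stands in for whatever model decides whether an answer provided evidence. After each turn the loop probes whichever skill has the largest remaining evidence gap.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class InterviewPlan:
    """Hypothetical plan: how many pieces of evidence each skill needs."""
    targets: dict[str, int]
    evidence: dict[str, int] = field(default_factory=dict)

    def gap(self, skill: str) -> int:
        return max(self.targets[skill] - self.evidence.get(skill, 0), 0)

    def record(self, skill: str, found: bool) -> None:
        if found:
            self.evidence[skill] = self.evidence.get(skill, 0) + 1

    def next_skill(self) -> Optional[str]:
        """Probe the skill with the largest remaining gap; None once all are covered."""
        open_skills = [s for s in self.targets if self.gap(s) > 0]
        return max(open_skills, key=self.gap) if open_skills else None


def run_interview(
    plan: InterviewPlan,
    ask: Callable[[str], str],
    judge: Callable[[str, str], bool],
    max_turns: int = 10,
) -> list[tuple[str, str]]:
    """Adaptive loop: choose what to probe next from the evidence gathered so far."""
    transcript = []
    for _ in range(max_turns):
        skill = plan.next_skill()
        if skill is None:  # every planned area has enough evidence
            break
        answer = ask(skill)
        plan.record(skill, judge(skill, answer))
        transcript.append((skill, answer))
    return transcript


plan = InterviewPlan(targets={"sql": 2, "communication": 1})
turns = run_interview(plan, ask=lambda s: f"I use {s} daily", judge=lambda s, a: s in a)
# [skill for skill, _ in turns] == ["sql", "sql", "communication"]
```

If an answer yields no evidence, the same skill stays at the top of the queue, which is a crude version of "deciding what to probe further"; `max_turns` keeps the conversation bounded.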
Talent Lens
Talent Lens enables recruiters to apply customizable AI-driven evaluations to resumes, interview transcripts, and candidate data in order to assess qualitative hiring dimensions such as communication skills, cultural alignment, and role-specific competencies.
Together, these capabilities represent a different approach to hiring AI: one in which AI does not simply automate workflows but actively participates in complex decision support.
That level of autonomy creates significant opportunities for enterprises. It also creates significant responsibility.
Why We Participated
Savos was built to solve a practical problem.
Hiring teams are expected to evaluate more candidates, make faster decisions, and deliver a better candidate experience, often without additional resources. Advances in agentic AI have created new opportunities to support these goals in ways that were not possible even a few years ago.
But greater autonomy also raises new questions around reliability, governance, fairness, and accountability.
These questions matter to us because many of our customers operate in environments where hiring decisions are subject to significant scrutiny. Government agencies, financial institutions, healthcare organizations, and large enterprises cannot simply assume AI systems are behaving as intended.
They need confidence.
Participating in this initiative gave us an opportunity to evaluate Savos through that lens and understand where both the technology and the industry still have work to do.
Today’s enterprises want to understand:
- how systems behave under stress,
- how outputs are evaluated,
- how risks are monitored,
- how decisions can be traced,
- how bias is identified,
- how failures are handled,
- and whether governance exists beyond marketing language.
That was one of the biggest reasons we chose to participate in this initiative with Asenion and the AI Verify Foundation.
The Industry Still Underestimates How Difficult Agentic AI Actually Is
One of the easiest mistakes in AI today is assuming that intelligence and reliability scale together. They don’t. In many cases, the opposite happens. As systems become more autonomous, flexible, and context-aware, their operational unpredictability also increases.
And nowhere was that more obvious than during the assurance process.
One of the major capabilities evaluated inside Savos was ScaleScreen: our conversational interviewing system that dynamically interacts with candidates through free-flowing conversations instead of static question trees.
On the surface, conversational AI interviews sound straightforward, until you try to test them rigorously. Traditional software systems are relatively deterministic: given the same input, they generally produce the same output. Agentic conversational systems do not behave that way.
- Context changes behavior.
- Conversation history changes behavior.
- Prompt phrasing changes behavior.
- Candidate responses change behavior.
- Multi-turn interactions create cascading unpredictability.
And suddenly, you are no longer evaluating a simple API response. You are evaluating evolving system behavior across dynamic conversational states. That distinction became painfully clear during the testing process.
The Assurance Process Exposed a Different Kind of Enterprise Readiness Gap
One of the most valuable outcomes of the project was realizing that many AI systems are built to perform, but not necessarily built to be tested.
The distinction sounds subtle, but it becomes important very quickly once AI moves into enterprise environments.
During the assurance process, one of the challenges both teams encountered was evaluating stateful agentic workflows consistently and at scale. Capabilities such as ScaleScreen are designed to adapt dynamically to each candidate, taking into account conversation history, responses, role requirements, and contextual information. This adaptability is precisely what makes agentic systems powerful.
It also makes them significantly harder to evaluate.
Several of the implementation challenges highlighted in the case study stemmed from this reality. Features that worked well in production workflows were not always designed with third-party assurance testing in mind. Creating repeatable testing environments, restarting conversations consistently, generating comparable evidence, and evaluating outcomes across large volumes of interactions required substantial effort from both teams.
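A repeatable test harness addresses exactly those pain points: restart from a clean session, replay a scripted conversation, and emit an evidence record that can be compared across runs. The sketch below assumes a hypothetical `ToyAgent` whose only source of variation is a seeded random generator. Hosted LLMs may not be bit-for-bit reproducible even with fixed settings, so a real harness would likely compare rubric scores or outcomes rather than exact text.

```python
import hashlib
import json
import random
from typing import Callable


class ToyAgent:
    """Hypothetical stand-in for a stateful conversational agent (not Savos' API)."""

    FOLLOW_UPS = ["Can you give an example?", "What was the outcome?", "What would you change?"]

    def __init__(self, seed: int):
        self.rng = random.Random(seed)  # all randomness flows from one seed
        self.history: list[str] = []

    def reply(self, candidate_turn: str) -> str:
        self.history.append(candidate_turn)
        return self.rng.choice(self.FOLLOW_UPS)


def replay(make_agent: Callable[[int], ToyAgent], script: list[str], seed: int) -> dict:
    """Start a fresh session, replay scripted candidate turns, return an evidence record."""
    agent = make_agent(seed)  # clean state: no history carried over from earlier runs
    transcript = [{"candidate": turn, "agent": agent.reply(turn)} for turn in script]
    digest = hashlib.sha256(json.dumps(transcript, sort_keys=True).encode()).hexdigest()
    return {"seed": seed, "transcript": transcript, "sha256": digest}


script = ["I led a migration to the cloud.", "It cut costs by a third."]
first = replay(ToyAgent, script, seed=7)
second = replay(ToyAgent, script, seed=7)
# Same seed and same script produce identical evidence, so the digests match.
```

The digest gives auditors a cheap way to confirm that two runs of the same scenario produced the same transcript, and the stored seed lets a third party regenerate it.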
As enterprises adopt increasingly autonomous AI systems, they need confidence that the system can be independently tested, audited, challenged, governed, and improved over time. Enterprise readiness increasingly includes assurance readiness.
The project reinforced a lesson that will influence how we continue building Savos going forward: testability cannot be treated as something that happens after a product is built. It needs to be considered during the design process itself.
Just as security, observability, and scalability eventually became core architectural requirements for enterprise software, we believe testability and assurance readiness will become core architectural requirements for enterprise AI systems.
The Safety vs UX Tradeoff Is Real
One theme that surfaced repeatedly throughout the project was the relationship between safety and user experience.
In theory, making a system safer can be straightforward. The easiest approach is often to increase restrictions and refusals.
In practice, hiring conversations are more nuanced than that.
Candidates do not interact with AI systems as security researchers. They interact with them as job seekers trying to present their experiences, clarify information, and communicate naturally.
The challenge becomes finding the right balance.
A system that is overly permissive may expose unnecessary risk. A system that is overly restrictive may create a frustrating candidate experience.
Finding that balance requires product judgment, governance decisions, and a deep understanding of how people actually interact with AI systems.
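The tradeoff can be made concrete by sweeping a refusal threshold. This sketch assumes each conversational turn already has a risk score from some safety classifier (hypothetical; the scores below are illustrative, not measured). Lowering the threshold catches more adversarial turns but refuses more genuine candidates; raising it does the reverse.

```python
def tradeoff(benign: list[float], adversarial: list[float], threshold: float) -> tuple[float, float]:
    """Return (over-refusal rate on benign turns, miss rate on adversarial turns).

    A turn is refused when its risk score is at or above `threshold`.
    """
    over_refusal = sum(r >= threshold for r in benign) / len(benign)
    missed = sum(r < threshold for r in adversarial) / len(adversarial)
    return over_refusal, missed


# Hypothetical risk scores (illustrative values only).
benign = [0.1, 0.2, 0.6, 0.3]
adversarial = [0.9, 0.7, 0.4, 0.8]
sweep = {t: tradeoff(benign, adversarial, t) for t in (0.35, 0.5, 0.75)}
# {0.35: (0.25, 0.0), 0.5: (0.25, 0.25), 0.75: (0.0, 0.5)}
```

The code does not choose the threshold; it only makes the cost of each choice visible, which is where the product and governance judgment comes in.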
What This Means For Our Customers
For our customers, especially those operating in enterprise and regulated hiring environments, this initiative matters for multiple reasons.
First, it reinforces that the systems we are deploying are being developed with governance, assurance, and operational reliability in mind from the ground up — not added retroactively because the market suddenly started caring about AI risk.
The case study evaluated Savos against frameworks and practices aligned with IMDA’s Starter Kit for Testing LLM Applications, the EU AI Act, ISO/IEC 42001, the NIST AI Risk Management Framework, NYC Local Law 144 bias audit requirements, the OWASP Top 10 for LLM Applications, and TAFEP’s Guidelines.
These are the same standards, frameworks, and governance considerations that enterprises increasingly need to navigate as AI adoption scales.
The project also reinforced that getting an AI workflow running is relatively straightforward today. Building one that is enterprise-ready is not.
The assurance process required us to evaluate Savos for bias, adversarial resilience, governance controls, auditability, traceability, testability, and compliance alignment. It also required the ability to generate evidence, repeat evaluations consistently, and demonstrate how the system behaves under different scenarios.
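As one concrete example of a bias metric, the bias audit rules under NYC Local Law 144 define an impact ratio for tools that score candidates: a category's scoring rate (the share of its candidates scoring above the median of the whole sample) divided by the scoring rate of the highest-scoring category. The sketch below computes that ratio; the category labels and scores are hypothetical, and a real audit also covers intersectional categories and must be performed by an independent auditor.

```python
from statistics import median


def scoring_rate_impact_ratios(scores_by_category: dict[str, list[float]]) -> dict[str, float]:
    """Impact ratio per category using the scoring-rate method.

    Scoring rate = share of a category's candidates scoring above the median of the
    whole sample; impact ratio = category scoring rate / highest category scoring rate.
    """
    all_scores = [s for scores in scores_by_category.values() for s in scores]
    cutoff = median(all_scores)
    rates = {
        cat: sum(s > cutoff for s in scores) / len(scores)
        for cat, scores in scores_by_category.items()
        if scores
    }
    best = max(rates.values())
    return {cat: (rate / best if best else 0.0) for cat, rate in rates.items()}


# Hypothetical resume-scoring outputs for two demographic categories.
ratios = scoring_rate_impact_ratios({"A": [0.9, 0.8, 0.3, 0.2], "B": [0.7, 0.1, 0.1, 0.3]})
# {"A": 1.0, "B": 0.5}
```

The law requires the ratios to be calculated and published rather than setting a pass/fail threshold, so interpreting a value like 0.5 remains a governance decision.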
As more organizations explore building AI internally, many are discovering that the real challenge is not getting a model to work but making the resulting system reliable, governable, auditable, and trustworthy enough for production use.
That is where enterprise readiness is built, and that is ultimately what this initiative helped validate for Savos.
Continuous Assurance Will Matter More Than One-Time Validation
One of the strongest conclusions we came away with is that AI assurance cannot be treated as a one-time certification-style exercise. These systems evolve continuously.
Models change.
Prompts change.
Threat vectors change.
User behavior changes.
Enterprise policies change.
Which means assurance itself has to become continuous.
And this will eventually become one of the defining characteristics separating experimental AI products from mature enterprise AI systems.
The future winners in enterprise AI probably will not be the companies shipping the most features the fastest. They will be the organizations capable of continuously governing, evaluating, and improving autonomous systems operating under real-world complexity.
That requires much deeper operational maturity than most of the industry currently talks about.
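One way to make assurance continuous is to rerun a fixed evaluation suite whenever a model, prompt, or policy changes, and to block the release if any tracked metric regresses beyond an agreed tolerance. A minimal sketch, where the metric names, baselines, and tolerances are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    higher_is_better: bool
    tolerance: float  # maximum allowed regression versus the baseline


def regressions(baseline: dict[str, float], current: dict[str, float], checks: list[Check]) -> list[str]:
    """Return a description of every check that regressed beyond its tolerance."""
    failed = []
    for c in checks:
        if c.name not in current:
            failed.append(f"{c.name}: missing from current run")
            continue
        delta = current[c.name] - baseline[c.name]
        if not c.higher_is_better:
            delta = -delta  # normalize so that negative always means "got worse"
        if delta < -c.tolerance:
            failed.append(f"{c.name}: {baseline[c.name]:.3f} -> {current[c.name]:.3f}")
    return failed


checks = [
    Check("min_impact_ratio", higher_is_better=True, tolerance=0.1),
    Check("jailbreak_success_rate", higher_is_better=False, tolerance=0.05),
]
baseline = {"min_impact_ratio": 0.9, "jailbreak_success_rate": 0.02}
current = {"min_impact_ratio": 0.85, "jailbreak_success_rate": 0.10}
failures = regressions(baseline, current, checks)
# ["jailbreak_success_rate: 0.020 -> 0.100"]
```

Wiring a gate like this into the release pipeline turns assurance from a one-off report into a check that runs every time the system changes.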
Looking Ahead
Participating in the AI Verify Foundation’s case studies alongside Asenion was a milestone for our team but, more importantly, a valuable learning experience.
The process challenged us technically, operationally, and philosophically. It pushed us to think more deeply about how enterprise AI systems should be designed, tested, governed, and continuously improved as autonomy increases.
Above all, it reinforced our belief that responsible AI development is not separate from enterprise innovation.
It is a prerequisite for it.
As we continue building Savos and the next generation of agentic AI capabilities for enterprise hiring, we remain committed to building systems that are not only intelligent and scalable, but also trustworthy, governable, and operationally resilient.
Because in the long run, the organizations that succeed with AI will not simply be the ones deploying the most advanced systems. They will be the ones enterprises trust enough to deploy at scale.
Read the Full Case Study
Explore the full AI Verify Foundation case study to see how Savos was tested for AI assurance, governance, bias, adversarial resilience, and enterprise readiness.