Intent Resolution & Intelligent Safety — an AI orchestration architecture that reasons about what you mean, not just what you say.
Inspired by Scale AI’s research on Defensive Refusal Bias (ICLR 2026). Designed by TalaStar as an original solution to the intent-safety alignment problem.
The Problem
Scale AI’s 2026 research (published at ICLR) analysed 2,390 real-world prompts from the National Collegiate Cyber Defense Competition. The study found that safety-aligned LLMs refuse legitimate defensive requests at 2.72× the rate of neutral requests, and that explicit authorization claims actually increase refusal rates.
2.72×
Higher Refusal Rate
For prompts with security-sensitive terminology, regardless of defensive intent
43.8%
System Hardening Refused
The most critical defensive task sees the highest refusal rate
50%
Auth + Keywords = Max Refusal
Authorization signals backfire — models treat them as jailbreak attempts
Attacker (Refused ✔)
“How do I exploit this vulnerability to gain access?”
Correctly refused — offensive intent detected.
Defender (Refused ✘)
“How do I exploit this vulnerability to patch it before attackers do?”
Incorrectly refused — same vocabulary, opposite intent.
Source: “Defensive Refusal Bias” — Scale AI, ICLR 2026 Workshop Paper
The Solution
A multi-layered orchestration system that reasons about intent, authorization, and context — not just keywords.
Understand what the user actually means
Instead of pattern-matching keywords to a harm database, IRIS analyses the semantic intent behind every request. A defender asking 'how does this persistence mechanism work?' is understood as defensive analysis — not an attack attempt.
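To make the idea concrete, here is a minimal sketch of intent-first evaluation. Everything in it is hypothetical: `IntentAssessment`, `assess_intent`, and the goal-phrase lists are illustrative names, and the substring heuristic merely stands in for the semantic (LLM-based) judgment a real system would use.

```python
from dataclasses import dataclass

@dataclass
class IntentAssessment:
    label: str         # "defensive", "offensive", or "ambiguous"
    confidence: float  # 0.0 - 1.0
    evidence: list     # phrases supporting the label

# Toy stand-in for a semantic model: the point is that the verdict
# hinges on the stated *goal* clause, not on shared security keywords.
DEFENSIVE_GOALS = ("patch", "harden", "protect", "detect", "before attackers")
OFFENSIVE_GOALS = ("gain access", "bypass", "exfiltrate", "evade detection")

def assess_intent(prompt: str) -> IntentAssessment:
    text = prompt.lower()
    d = [g for g in DEFENSIVE_GOALS if g in text]
    o = [g for g in OFFENSIVE_GOALS if g in text]
    if d and not o:
        return IntentAssessment("defensive", min(1.0, 0.5 + 0.2 * len(d)), d)
    if o and not d:
        return IntentAssessment("offensive", min(1.0, 0.5 + 0.2 * len(o)), o)
    return IntentAssessment("ambiguous", 0.3, d + o)
```

On the two prompts from the attacker/defender example above, this distinguishes “…to patch it before attackers do” (defensive) from “…to gain access” (offensive), even though both contain “exploit this vulnerability”.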
Verify who is asking and why
Current LLMs treat authorization claims as jailbreak signals. IRIS inverts this: authorization is a first-class safety concept. Role-based context, audit trails, and explicit permission chains reduce refusals for legitimate users while strengthening protection against actual misuse.
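One way to model authorization as a first-class signal is sketched below. The names (`AuthorizationContext`, `authorization_weight`) and the specific weight are assumptions for illustration; the key property is that verified authorization reduces refusal pressure, while an unverified claim is merely neutral, never treated as a jailbreak signal.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorizationContext:
    role: str                       # e.g. "blue_team_analyst"
    verified: bool                  # checked against SSO / engagement records, not self-claimed
    permitted_domains: set = field(default_factory=set)
    audit_id: str = ""              # ties the decision to an audit trail

def authorization_weight(ctx: AuthorizationContext, domain: str) -> float:
    """Contribution of authorization to the refusal decision.
    Negative values are evidence of legitimacy; unverified claims
    contribute nothing rather than counting against the user."""
    if ctx.verified and domain in ctx.permitted_domains:
        return -0.3
    return 0.0
```

The asymmetry is deliberate: claiming authorization can only help (when verified) or do nothing (when not), which removes the incentive for models to punish the claim itself.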
Build a conversation-wide understanding
Single-turn keyword matching fails because defenders and attackers use identical vocabulary. IRIS maintains a rolling context window that accumulates evidence of intent across the entire interaction — not just the current prompt.
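A rolling accumulation of intent evidence could look like the following sketch. `ConversationIntentTracker` and its recency weighting are hypothetical choices, not a specified IRIS component; the point is that one ambiguous turn is judged against the whole interaction rather than in isolation.

```python
from collections import deque

class ConversationIntentTracker:
    """Accumulates per-turn intent scores (+1.0 defensive .. -1.0 offensive)
    over a rolling window, so a single ambiguous prompt inherits the
    evidence built up across the conversation."""
    def __init__(self, window: int = 10):
        self.scores = deque(maxlen=window)

    def observe(self, turn_score: float) -> None:
        self.scores.append(turn_score)

    def accumulated_intent(self) -> float:
        if not self.scores:
            return 0.0
        # Weight recent turns more heavily than early ones.
        weights = range(1, len(self.scores) + 1)
        return sum(w * s for w, s in zip(weights, self.scores)) / sum(weights)
```

After two clearly defensive turns, a third ambiguous turn (score 0.0) still yields a positive accumulated intent, where single-turn evaluation would see only the ambiguity.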
Route to the right specialist model
Healthcare queries route through clinical safety guardrails. Cybersecurity queries route through defensive-aware evaluation. Financial queries route through regulatory compliance checks. Each domain has its own intent vocabulary.
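The routing step can be sketched as a registry plus a domain classifier. The route names and vocabulary lists below are placeholders invented for this example; a production classifier would be a model, not substring counts.

```python
# Hypothetical registry mapping domains to specialist evaluators.
ROUTES = {
    "healthcare": "clinical_safety_guardrails",
    "cybersecurity": "defensive_aware_evaluation",
    "finance": "regulatory_compliance_checks",
}

# Stand-in for a learned domain classifier.
DOMAIN_VOCAB = {
    "healthcare": ("dose", "patient", "drug"),
    "cybersecurity": ("exploit", "malware", "hardening"),
    "finance": ("transaction", "laundering", "compliance"),
}

def classify_domain(prompt: str) -> str:
    text = prompt.lower()
    return max(DOMAIN_VOCAB, key=lambda d: sum(t in text for t in DOMAIN_VOCAB[d]))

def route(prompt: str) -> str:
    """Pick the specialist evaluation pipeline for a prompt."""
    return ROUTES.get(classify_domain(prompt), "generic_harm_boundary")
```

The design benefit is isolation: each evaluator only needs to understand its own domain's intent vocabulary, instead of one generic harm boundary straining to cover all of them.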
Safety that learns from over-refusals
Traditional safety is static: block or allow. IRIS implements a feedback loop that learns from false refusals, continuously recalibrating the decision boundary between legitimate defensive requests and actual harmful intent.
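A minimal version of such a feedback loop is sketched below, assuming requests carry a harm score in [0, 1] and that false refusals can be flagged after the fact. `AdaptiveThreshold`, its learning rate, and its bounds are illustrative assumptions, not a published calibration method.

```python
class AdaptiveThreshold:
    """Refusal threshold that recalibrates from labelled mistakes:
    flagged false refusals relax it, missed harms tighten it."""
    def __init__(self, threshold: float = 0.5, lr: float = 0.05):
        self.threshold = threshold
        self.lr = lr

    def decide(self, harm_score: float) -> str:
        return "refuse" if harm_score > self.threshold else "serve"

    def feedback(self, harm_score: float, was_legitimate: bool) -> None:
        # False refusal: a legitimate request scored above the threshold.
        if was_legitimate and harm_score > self.threshold:
            self.threshold = min(0.95, self.threshold + self.lr)
        # Missed harm: a harmful request scored at or below the threshold.
        elif not was_legitimate and harm_score <= self.threshold:
            self.threshold = max(0.05, self.threshold - self.lr)
```

Unlike a static block/allow boundary, repeated evidence that a score band (say, defensive queries landing near 0.6) is legitimate gradually moves the boundary past it.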
Domain Applications
The Defensive Refusal Bias problem extends far beyond cybersecurity. IRIS addresses it across every domain where legitimate users share vocabulary with harmful actors.
Traditional AI Response
A nurse asking about drug interactions for a critical patient gets refused because the query mentions 'overdose thresholds'.
IRIS Response
IRIS recognises clinical context, verifies the healthcare role, and provides the exact dosing information needed to save the patient.
"What is the lethal dose threshold for paracetamol in a 70kg adult presenting with hepatotoxicity?"
34.3%
Traditional Refusal
<2%
IRIS Projected
Traditional AI Response
A blue-team defender analysing malware is refused because the query contains 'exploit', 'payload', and 'shell' — the same words an attacker would use.
IRIS Response
IRIS analyses intent through the full conversation context, recognises defensive framing, and provides the technical assistance needed to protect systems.
"Analyse this persistence mechanism and recommend hardening steps for our production servers."
43.8%
Traditional Refusal
<3%
IRIS Projected
Traditional AI Response
A compliance officer researching money laundering patterns gets refused because the query discusses 'structuring transactions' and 'shell companies'.
IRIS Response
IRIS verifies the compliance role, understands the regulatory context, and provides the analytical support needed to detect and prevent financial crime.
"Identify common structuring patterns in these transaction records that may indicate layering activity."
28.7%
Traditional Refusal
<2%
IRIS Projected
Traditional AI Response
A researcher studying radicalisation pathways gets refused because the query discusses 'extremist recruitment tactics' and 'propaganda methods'.
IRIS Response
IRIS recognises academic context, verifies research credentials, and provides the analytical depth needed to understand and counter harmful phenomena.
"What psychological mechanisms do extremist groups exploit during online recruitment?"
22.7%
Traditional Refusal
<1%
IRIS Projected
Comparison
| Feature | Traditional Safety | IRIS Orchestrator |
|---|---|---|
| Safety Mechanism | Keyword/embedding proximity | Multi-layer intent reasoning |
| Authorization Handling | Treated as jailbreak signal | First-class safety concept |
| Context Window | Single-turn evaluation | Conversation-wide accumulation |
| Domain Awareness | Generic harm boundary | Domain-specific routing |
| Learning from Errors | Static decision boundary | Adaptive feedback loop |
| Defensive Refusal Rate | 12.2–43.8% | <3% (projected) |
| Attacker Success Rate | Unchanged (attackers shift to unaligned tools) | Reduced (intent-aware blocking) |
Ethical Foundation
IRIS is not just a technical architecture — it is grounded in TalaStar’s ethical framework for responsible AI.
Every IRIS decision prioritises the human behind the request. Defenders, clinicians, researchers, and compliance officers are served — not blocked.
Safety mechanisms must not create asymmetric burdens. IRIS ensures legitimate users receive the same quality of assistance regardless of their domain vocabulary.
Every IRIS routing decision is logged, auditable, and explainable. The system can justify why a request was served or refused — with evidence.
The adaptive safety layer learns from over-refusals over time, continuously improving the decision boundary between legitimate and harmful requests.
Research Foundation
The IRIS Orchestrator concept is an original TalaStar design inspired by the findings of:
“Defensive Refusal Bias” — Scale AI Security Engineering. Published as a workshop paper at ICLR 2026. Based on 2,390 real-world examples from the National Collegiate Cyber Defense Competition (NCCDC).
TalaStar Digital Ltd. is an independent research company. IRIS is an original architectural concept, not affiliated with Scale AI.
The future of AI safety is intent-aware, authorization-first, and human-centric.