Introduction
In recent years, AI companies such as Anthropic have significantly advanced agent technologies.
AI is no longer:
- something that answers
but has become:
👉 something that acts (Agent)
- Calling tools
- Accessing external data
- Executing code
- Autonomously progressing tasks
This evolution has brought AI into the realm of practical deployment.
However, at the same time, it introduces a new class of problems that were previously less visible:
👉 How do we safely control agents?
In traditional AI systems, evaluation focused on:
- Output accuracy
- Naturalness of responses
But in the age of agents, the problem shifts:
👉 Not what to output, but what to execute
The moment AI begins to act,
it becomes:
👉 a safety-critical system
Problem: Why Agents Do Not Stop
As agent adoption increases, the following issues are emerging in real-world environments:
- Executing unnecessary operations
- Proceeding without required confirmation
- Producing inconsistent decisions for the same input
- Being unable to explain why a specific action was taken
At first glance, these may appear to be:
- Accuracy issues
- Model limitations
- Implementation bugs
However, this is not the case.
👉 This is a safety design problem
More precisely:
👉 The structure for stopping and decision-making is not defined
In traditional AI:
- Output quality
- Response correctness
were the main concerns.
But agents are different.
👉 The moment AI starts acting, it must be designed as a controlled system
Core Insight: Signal ≠ Decision
Agents like those from Anthropic operate on:
- Search results
- Inference outputs
- Generated text
- Action candidates
These are:
👉 Signals (inputs to judgment)
However, what real-world systems require is:
👉 Decision (final commitment to action)
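To make the distinction concrete, here is a minimal Python sketch. The type names and fields (`Signal`, `Decision`, `approved_by`) are illustrative assumptions, not part of any agent framework:

```python
from dataclasses import dataclass

@dataclass
class Signal:
    """Raw material for judgment: proposed, not committed."""
    source: str        # e.g. "search", "inference", "planner"
    content: str       # candidate text or action description
    confidence: float  # model-reported plausibility, not a guarantee

@dataclass
class Decision:
    """A final commitment to act, with an accountable owner."""
    action: str       # the single action that will be executed
    approved_by: str  # a boundary rule ID or a human identifier
    reversible: bool  # whether the action can be undone

# An agent can emit many Signals; a system must commit to one Decision.
signal = Signal(source="planner", content="delete stale records", confidence=0.92)
# Nothing here turns that Signal into a Decision automatically.
```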
Why This Becomes a Safety Problem
The core issue is this:
Agents are very good at generating Signals.
But:
👉 They do not take responsibility for selecting Decisions
As a result:
- What gets executed
- Where to stop
- When to escalate to humans
all become ambiguous.
👉 There is no structure for safe stopping
This is why:
👉 Agents do not stop
Internal Structure of Agents
A typical agent operates in a loop:
- Observe environment
- Infer next action
- Execute
- Update state
This loop is powerful (a minimal sketch follows below).
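The following sketch shows the shape of such a loop. `Env` and `Model` are toy stubs invented for illustration; no real framework API is implied:

```python
class Env:
    """Toy environment stub, for illustration only."""
    def observe(self):           return {"tick": 0}
    def execute(self, action):   return f"did {action}"
    def update(self, s, result): return {"tick": s["tick"] + 1}

class Model:
    """Toy model stub: always proposes the next plausible action."""
    def infer_next_action(self, state):
        return f"step-{state['tick']}"

def run_agent(env, model, max_steps=5):
    """Naive agent loop: observe -> infer -> execute -> update."""
    state = env.observe()
    for _ in range(max_steps):
        action = model.infer_next_action(state)  # generated, not validated
        result = env.execute(action)             # executed unconditionally
        state = env.update(state, result)
    # The loop ends only when the step budget runs out,
    # not because any rule decided the agent *must* stop.

run_agent(Env(), Model())
```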
However, from a safety perspective, it has a critical omission:
There is no explicit:
- Decision selection
- Stopping condition (Boundary)
- Human delegation
👉 Actions are generated, but not validated
This leads to:
- Continuous action generation
- Implicit stopping (not guaranteed)
- Uncertain human escalation
👉 Therefore, agents do not stop
Wrong Approach
Many systems attempt to address this with prompts such as:
- “Stop if dangerous”
- “Confirm before execution”
- “Ask a human if uncertain”
This appears safe. But it is not.
Because agents are not:
👉 systems that enforce strict conditions
They are:
👉 systems that generate the most plausible next action
Instructions are interpreted as:
- vague guidelines
- context-dependent suggestions
As a result:
👉 behavior varies by situation
👉 safety is not structurally guaranteed
Concrete Failures
Example 1: Auto-closing customer inquiries
Even with:
“Ask a human if uncertain”
Agents may:
- generate a plausible FAQ response
- mark the issue as resolved
- skip escalation
Example 2: Over-modifying code
Even with:
“Avoid risky changes”
Agents may:
- modify multiple files
- update configurations
- rewrite tests
Example 3: Executing unintended operations
Agents may:
- delete records
- send emails
- update systems
because:
👉 “importance” is not structurally defined
Core Issue
All these share one root cause:
👉 Stopping conditions are not structurally defined
Prompts like:
- “Be careful”
- “Confirm”
are:
👉 policies, not rules
Agents interpret them as:
👉 ambiguous context
Fundamental Limitation
This is not an implementation issue.
👉 It is a fundamental limitation
LLMs are:
👉 systems that generate the most probable next output
They do not:
- enforce rules
- evaluate conditions strictly
They only produce:
👉 continuous sequences of tokens
as described in "Judgment Cannot Be Expressed by Smooth Computation — The Problem of Discontinuity in AI".
Continuous vs Discrete
This creates a fundamental gap:
- LLM → continuous generation
- Decision → discrete commitment
👉 Continuous processes cannot, by themselves, produce discrete decisions
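A small illustration of the gap, using hypothetical action scores and a made-up threshold: the continuous scores never commit to anything; only a discrete, external rule can.

```python
# Continuous generation: the model assigns smooth scores to candidates.
scores = {"reply_to_user": 0.46, "close_ticket": 0.41, "escalate": 0.13}

# Nothing in these numbers *is* a decision. Turning them into one
# requires a discrete, external rule -- here a hypothetical threshold:
COMMIT_THRESHOLD = 0.8

best_action, best_score = max(scores.items(), key=lambda kv: kv[1])
if best_score >= COMMIT_THRESHOLD:
    decision = best_action       # discrete commitment
else:
    decision = "DO_NOT_EXECUTE"  # the step the model cannot take itself

print(decision)  # -> DO_NOT_EXECUTE: no candidate cleared the bar
```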
Why Decision Cannot Be Embedded
A Decision requires:
- evaluation
- selection
- responsibility
But LLM outputs are:
👉 generated candidates
not:
👉 committed decisions
Therefore:
- No entity determines “must stop”
- No guarantee exists
👉 Prompt-based safety cannot work
Solution: Boundary, Not Prompt
Instead of saying:
“Please stop if dangerous”
We must define:
- Always stop under condition X
- Always require human approval under condition Y
- Always redirect under condition Z
👉 Stopping conditions must exist as structure
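A minimal sketch of what "stopping conditions as structure" can look like. The predicates, verdicts, and rule ordering here are illustrative assumptions, not a standard API:

```python
# Illustrative boundary rules expressed as structure, not prompt text.
BOUNDARY_RULES = [
    # (condition predicate,                     verdict)
    (lambda a: a["type"] == "delete",           "STOP"),      # condition X
    (lambda a: a.get("cost_usd", 0) > 100,      "HUMAN"),     # condition Y
    (lambda a: a["target"] == "legacy_system",  "REDIRECT"),  # condition Z
]

def check_boundary(action: dict) -> str:
    """Evaluate rules in order; first match wins, default is EXECUTE."""
    for condition, verdict in BOUNDARY_RULES:
        if condition(action):
            return verdict
    return "EXECUTE"

print(check_boundary({"type": "delete", "target": "records"}))   # STOP
print(check_boundary({"type": "email", "target": "customer"}))   # EXECUTE
```

Because the rules are data, they can be reviewed, versioned, and tested independently of any prompt.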
Now:
👉 Agents become controllable
Reframing the Problem
The problem is simple:
👉 Agents do not stop
But this is not behavioral.
👉 It is structural.
The Right Question
Not:
👉 How do we stop agents?
But:
👉 Where do we define decisions?
Solution: Decision Trace Model (DTM)
DTM defines each decision as an explicit structure (see the sketch after this list).
This clarifies:
- What happened
- How it was interpreted
- What was selected
- What constraints applied
- Whether humans intervened
- How it was recorded
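One possible shape for such a record, as a Python dataclass. The field names simply mirror the six items above; they are assumptions for illustration, not a published DTM schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecisionTrace:
    """One traceable decision record; field names are illustrative."""
    signal: str                    # What happened (the raw input)
    interpretation: str            # How it was interpreted
    selected_action: str           # What was selected
    constraints: list[str]         # What constraints applied
    human_intervened: bool         # Whether humans intervened
    outcome: Optional[str] = None  # How it was recorded (filled on completion)

trace = DecisionTrace(
    signal="user asked to purge old data",
    interpretation="bulk delete request on production records",
    selected_action="delete_records(older_than='90d')",
    constraints=["deletes require human approval"],
    human_intervened=True,
)
trace.outcome = "held for approval; not executed"
```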
Key Insight
👉 Decision must pass through Boundary
Boundary determines:
- Execute
- Stop
- Escalate
Redefining “Stopping”
👉 Stopping = not passing the Decision
Agents continue generating Signals.
But the Boundary can:
- Reject the Decision (Hard Stop)
- Hold it for a human (Soft Stop)
- Redirect it elsewhere
👉 In every case, no execution occurs
Three Types of Stops
1. Hard Stop
Reject the Decision
2. Soft Stop
Delegate to a human
3. Redirect
Route the Decision elsewhere
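The three stop types map naturally onto a closed enumeration. The sketch below is illustrative; the names follow the text above:

```python
from enum import Enum

class StopType(Enum):
    """Three structural stops, as defined above."""
    HARD_STOP = "reject the Decision outright"
    SOFT_STOP = "hold the Decision and delegate to a human"
    REDIRECT  = "route the Decision to another path"

# In each case the agent keeps generating Signals;
# it is the Decision that is stopped, held, or rerouted.
for stop in StopType:
    print(stop.name, "->", stop.value)
```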
Why This Matters
Traditional design:
- controls behavior
- relies on prompts
DTM:
👉 controls Decisions
Roles
- Agent → generates Signals
- DTM → selects Decisions
👉 Agent does not stop
👉 Decision is stopped
Best Practice Architecture
Layer 1: Capability Control
Restrict what is possible
Layer 2: Decision / Boundary
Control what is allowed
Layer 3: Human-in-the-loop
Restore responsibility
Layer 4: Logging
Enable replay and improvement
Architecture
Agent (Signal)
↓
Decision (DTM)
↓
Boundary
├ STOP
├ HUMAN
├ REDIRECT
└ EXECUTE
↓
Execution
↓
Log
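Tying the layers together, here is a hedged end-to-end sketch of that flow. The tool names, the `irreversible` flag, and the verdict strings are all invented for illustration:

```python
# End-to-end sketch of the layered architecture above.
# All names are illustrative; this is one possible shape, not a spec.

ALLOWED_TOOLS = {"search", "draft_email"}   # Layer 1: Capability Control

def boundary(action: dict) -> str:          # Layer 2: Decision / Boundary
    if action["tool"] not in ALLOWED_TOOLS:
        return "STOP"
    if action.get("irreversible"):
        return "HUMAN"                      # Layer 3: Human-in-the-loop
    return "EXECUTE"

log: list[dict] = []                        # Layer 4: Logging

def handle(signal_action: dict) -> None:
    verdict = boundary(signal_action)
    log.append({"action": signal_action, "verdict": verdict})  # replayable
    if verdict == "EXECUTE":
        print("executing:", signal_action["tool"])
    elif verdict == "HUMAN":
        print("escalated to human:", signal_action["tool"])
    else:
        print("stopped:", signal_action["tool"])

handle({"tool": "search"})                              # EXECUTE
handle({"tool": "draft_email", "irreversible": True})   # HUMAN
handle({"tool": "delete_db"})                           # STOP
```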
Why DTM Works
DTM:
- externalizes decisions
- defines stopping conditions
- structures human intervention
- records everything
👉 This is a paradigm shift
Before / After
Before
- Autonomous
- Unstoppable
- Non-reproducible
- No accountability
After
- Controlled per decision
- Stoppable at boundary
- Human escalation defined
- Fully traceable
Conclusion
Anthropic’s agent technology has transformed AI into:
👉 something that acts
But that is not enough.
👉 We must define where Decisions are stopped
Final Statement
👉 Agents generate Signals.
👉 Decisions change the world.
Summary
The best-in-class design for controlling signal-based agents is:
👉 a multi-layered architecture centered on DTM
And the essence is simple:
👉 Agents do not stop.
👉 Decisions are stopped.
Ultimately, as agents become autonomous actors,
we must start designing them with the same rigor as safety-critical systems in industries like automotive manufacturing.
Five-nines reliability is no longer optional.
It is a requirement.

Specialized in AI system design and decision-making architecture.
Focused on externalizing decision logic using Ontology, DSL, and Behavior Trees, and building multi-agent systems.
