From LLMs to Multi-Agent Systems: How Semiconductor Architecture Is Moving from Compute to Decision

■ Introduction

In recent years, the evolution of large language models (LLMs) has significantly changed the semiconductor industry.

The question was simple:

How can we compute faster?

The answer was a GPU-centric world.
The goal was to accelerate matrix operations to the extreme and maximize FLOPS.

However, AI is now entering the next phase.

AI is no longer just a single model.
It is beginning to evolve into a system where multiple agents collaborate.

This shift changes the design philosophy of semiconductors themselves.


■ 1. Semiconductors in the LLM Era: Compute-Centric

The computational characteristics of LLMs are clear.

  • Large-scale matrix operations
  • Parallel execution of uniform processing
  • Batch processing
  • Minimal branching

The architecture optimized for these characteristics was the GPU, with NVIDIA at its center.

The structure was simple:

Input → Model → Output

What mattered most was:

How to maximize FLOPS.


■ 2. The Rise of the Multi-Agent Era

Agentic AI, including multi-agent systems, has a completely different computational structure.

  • Multi-step processing
  • Tool integration
  • Interaction with the environment
  • State retention
  • Communication between agents

The structure changes to:

Perception → Action → Feedback → Loop

This is no longer just inference.

It is a continuously running system.
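
Expressed in code, the shape of this loop is simple, even if its consequences are not. Below is a minimal Python sketch; the environment and policy objects are hypothetical placeholders, not the API of any particular framework.

```python
# A minimal sketch of the Perception → Action → Feedback → Loop cycle.
# The environment and policy interfaces are illustrative placeholders.

def run_agent(environment, policy, max_steps=100):
    state = {"history": []}                    # the agent retains state across steps
    observation = environment.observe()        # Perception
    for _ in range(max_steps):
        action = policy(observation, state)    # decide (often an LLM call)
        feedback = environment.apply(action)   # Action, then Feedback
        state["history"].append((observation, action, feedback))
        observation = feedback                 # Loop: feedback becomes the next input
    return state
```

Note that even this toy version needs an explicit max_steps cap. Real agent loops have no natural stopping point of their own, a problem we will return to later.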


■ 3. The Essential Change in Workloads

This shift changes the bottleneck.

■ LLM Era

  • Compute-dominant
  • GPU-optimized

■ Multi-Agent Era

  • Latency
  • Memory
  • Control flow
  • Communication

In other words:

The bottleneck moves from “compute” to “system.”


■ 4. Changes in Semiconductor Architecture

This shift greatly changes the structure of semiconductor architecture.

■ ① From GPU-Centric to Heterogeneous Computing

Before:

GPU-centric

Future:

CPU + GPU + Memory + Network

■ ② Re-evaluation of CPUs, Especially Arm

In multi-agent systems, the following become important:

  • Branching
  • State management
  • Asynchronous processing

This is why Arm is being re-evaluated.

Reasons include:

  • Low power consumption
  • Strength in control processing
  • Scalability

■ ③ Shift Toward Memory-Centric Design

In Agentic AI, the following are critical:

  • Context
  • History
  • State

Memory access becomes more dominant than computation.

Therefore, the following will become increasingly important:

  • HBM (High Bandwidth Memory)
  • Near-memory computing

■ ④ Network Becomes a Bottleneck

In multi-agent systems, the following occur frequently:

  • Communication between agents
  • External tool calls

Network performance will increasingly determine overall system performance.

■ ⑤ Asynchronous and Event-Driven Processing

Before:

  • Synchronous processing
  • Batch processing

Future:

  • Event-driven processing
  • Asynchronous processing

Hardware will also need to be optimized for this model.
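
To make the contrast concrete, here is a minimal event-driven sketch using Python's standard asyncio library. The event names and the single-worker setup are illustrative assumptions, not a reference design.

```python
# A minimal event-driven sketch: work happens only when an event arrives.

import asyncio

async def agent_worker(name: str, inbox: asyncio.Queue) -> None:
    # The worker sleeps until an event arrives, instead of polling in a batch loop.
    while True:
        event = await inbox.get()
        if event is None:              # sentinel value: shut down cleanly
            break
        print(f"{name} handling event: {event}")
        inbox.task_done()

async def main() -> None:
    inbox: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(agent_worker("agent-1", inbox))
    for event in ("tool_result", "user_message", "timer_tick"):
        await inbox.put(event)         # events arrive one by one, not as a batch
    await inbox.join()                 # wait until every queued event is handled
    await inbox.put(None)
    await worker

asyncio.run(main())
```

The structural point: nothing in this program runs on a fixed schedule. Work is triggered by events, and that is the model hardware will increasingly need to serve.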


■ 5. What Happens to NVIDIA?

The important point is this:

GPUs will not become unnecessary.

NVIDIA will continue to play a central role in:

  • LLM inference
  • Training

However, its role will change.

Before:

The main actor responsible for everything

Future:

One component within a broader system


■ 6. Apple as a Sign of the Future

One company that already embodies this direction is Apple, through its SoC architecture.

It integrates:

  • CPU for control
  • GPU for computation
  • NPU for inference
  • Unified memory

This can be seen as a prototype of AI as a system.


■ 7. The Essential Shift

In one sentence, the shift can be described as follows:

■ Before: The LLM Era

AI = A compute problem

■ After: The Multi-Agent Era

AI = A system problem


■ 8. The Challenge of Agentic AI: It Runs, but It Is Difficult to Control

As we have seen, Agentic AI begins to operate as a system.

However, this creates a fundamental problem.

That problem is:

A system that keeps running is difficult to control.

■ Why Does Control Become Difficult?

Agentic AI has the following characteristics:

  • It has time, because it runs continuously.
  • It has state, such as context and history.
  • It interacts with the outside world through I/O.

As a result, the system takes on the following properties.

■ ① State Continues to Change

  • Context is continuously updated.
  • Past decisions influence the present.

Small deviations accumulate and change the behavior of the system.

■ ② Branching Becomes Invisible

  • Decisions are buried inside the model.
  • It becomes difficult to explain why a certain action was taken.

Decision-making becomes a black box.

■ ③ There Is No Clear Stopping Point

  • Loops
  • Retries
  • Exploration

These can lead to infinite execution and increased cost.

■ ④ The System Depends on the External Environment

  • APIs
  • Data
  • Context

Even the same process can produce different results.

■ ⑤ The System Cannot Be Reproduced

  • State is implicit.
  • History is incomplete.

Verification and improvement become difficult.

■ In One Sentence

Agentic AI becomes a system by acquiring time and state.

But at the same time, it introduces uncontrollability caused by history-dependence.


■ 9. Approaches to This Challenge

How to handle this uncontrollability is now a major turning point.

Several approaches are possible.

■ ① Optimization Through Reinforcement Learning

  • Learn behavior through trial and error
  • Maximize long-term reward

However:

  • Learning cost is high.
  • Constraints are difficult to make explicit.
  • Safety guarantees are weak.

■ ② Rule-Based Control

  • Make conditional branches explicit
  • Control behavior procedurally

However:

  • It does not scale well.
  • It lacks flexibility.

■ ③ Workflow / Orchestration

  • Control the system as a flow
  • Use DAGs or pipelines (see the sketch after this list)

However:

  • It is weak against dynamic situations.
  • Its handling of state is limited.
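
As a concrete illustration of the DAG style, here is a minimal sketch using Python's standard graphlib module. The step names and dependencies are invented for illustration.

```python
# A minimal DAG-style workflow sketch using only the standard library.

from graphlib import TopologicalSorter

# Each step lists the steps it depends on.
dag = {
    "fetch":   set(),
    "analyze": {"fetch"},
    "report":  {"analyze"},
    "notify":  {"analyze"},
}

steps = {name: (lambda n=name: print(f"running {n}")) for name in dag}

# static_order() fixes the execution order before anything runs,
# which is exactly why this approach is weak against dynamic situations.
for name in TopologicalSorter(dag).static_order():
    steps[name]()
```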

■ ④ Monitoring / Guardrails

  • Anomaly detection
  • Filtering

However:

  • It is reactive.
  • It is not fundamental control.

■ 10. DTM: Decision Trace Model as an Approach

There is another direction.

That direction is DTM.

■ What DTM Does

DTM is an approach that:

introduces a decision-making structure into a running system.

■ More Specifically

■ Decision Contract

  • Under what conditions
  • What should be selected

This makes branching explicit.

■ Boundary

  • What is acceptable
  • Where the system should stop

This prevents runaway behavior.

■ Human Gate

  • Return uncertain areas to humans

This prevents full automation from becoming uncontrolled automation.

■ Trace

  • When
  • What
  • Why

This makes reproduction, verification, and learning possible.
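
To make these four primitives concrete, the sketch below expresses them as plain Python objects. DTM here is a conceptual model, so every class and field name in this sketch is an assumption invented for illustration, not a published API.

```python
# A minimal sketch of DTM's four primitives. All names are illustrative.

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable

@dataclass
class DecisionContract:
    condition: Callable[[dict], bool]     # under what conditions
    action: str                           # what should be selected

@dataclass
class Boundary:
    is_acceptable: Callable[[str], bool]  # what is acceptable / where to stop

@dataclass
class TraceEntry:
    when: str                             # when
    what: str                             # what
    why: str                              # why

trace_log: list[TraceEntry] = []

def decide(state: dict, contracts: list[DecisionContract], boundary: Boundary) -> str:
    for contract in contracts:            # Decision Contract: explicit branching
        if contract.condition(state):
            action, why = contract.action, "contract matched"
            break
    else:                                 # Human Gate: uncertain cases go to people
        action, why = "escalate_to_human", "no contract matched"
    if not boundary.is_acceptable(action):
        action, why = "stop", "boundary violated"   # Boundary: prevent runaway
    trace_log.append(TraceEntry(          # Trace: when / what / why
        when=datetime.now(timezone.utc).isoformat(), what=action, why=why))
    return action

# Usage: a low-risk state matches the contract and is allowed to proceed.
contracts = [DecisionContract(condition=lambda s: s["risk"] < 0.2, action="proceed")]
boundary = Boundary(is_acceptable=lambda a: a != "delete_production_data")
print(decide({"risk": 0.1}, contracts, boundary))   # -> proceed
```

Even in this toy form, the division of labor is visible: contracts make branching explicit, the boundary is checked before anything executes, and every decision leaves a trace.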


■ 11. Semiconductor Architecture in the DTM Era

A computational foundation for executing, controlling, and recording decisions

As we have seen, DTM has the following structure:

  • Define decisions through Decision Contracts
  • Apply constraints through Boundaries
  • Involve humans through Human Gates
  • Record history through Trace

This raises an important question:

What kind of semiconductor architecture is needed to support this structure?


■ 11.1 The Limits of Conventional Architecture

A conventional AI semiconductor architecture looks like this:

  • CPU
  • GPU
  • Memory
  • Storage
  • Network

This is suitable for:

  • Accelerating inference
  • Processing data

However, from a DTM perspective, it is insufficient.

The reason is clear:

There is no processing unit for “decision-making.”


■ 11.2 Structural Decomposition Through DTM

When DTM is mapped onto hardware, the process can be decomposed as follows:

Signal → Decision → Boundary → Execution → Trace

This can then be mapped directly onto semiconductor architecture.
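
Before looking at the individual engines, the decomposition itself can be sketched as five plain functions. Only the five-stage shape comes from DTM; the stage contents below are illustrative assumptions.

```python
# A minimal sketch of the Signal → Decision → Boundary → Execution → Trace flow.

def signal(raw_input: str) -> dict:       # Signal: interpret the situation
    return {"intent": raw_input.strip().lower()}

def decision(sig: dict) -> str:           # Decision: choose an action
    return "reply" if sig["intent"] else "wait"

def boundary(action: str) -> str:         # Boundary: allow, or stop
    return action if action in {"reply", "wait"} else "stop"

def execute(action: str) -> str:          # Execution: perform the action
    return f"executed:{action}"

def trace(result: str, log: list) -> str: # Trace: record what happened
    log.append(result)
    return result

log: list[str] = []
outcome = trace(execute(boundary(decision(signal("Hello")))), log)
print(outcome, log)    # -> executed:reply ['executed:reply']
```

Each of these functions corresponds to one of the engines described next.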


■ 11.3 A New Structure: Decision-Centric Architecture

[DTM-aware System Architecture]

1. Signal Engine
2. Decision Engine
3. Boundary Engine
4. Execution Engine
5. Trace Engine

Let us look at each component.


■ ① Signal Engine

Role:

  • LLM / ML inference
  • Interpretation of the situation

Implementation:

  • GPU / NPU
  • Existing AI accelerators

This is an extension of the LLM era.


■ ② Decision Engine

Role:

  • Evaluation of Decision Contracts
  • Conditional branching
  • Priority control

Required characteristics:

  • Fast branching
  • Low-latency rule evaluation
  • Ability to reference state

Possible implementations:

  • Enhanced CPUs, especially Arm-based architectures
  • Dedicated DSL execution units
  • Hardware-accelerated rule engines

This becomes a new core area.
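
What such a Decision Engine evaluates can be pictured as a flat, precompiled rule table: no dynamic dispatch, just a priority-ordered scan. The sketch below is a software stand-in for that idea; the rules and state fields are illustrative assumptions.

```python
# A minimal sketch of low-latency, priority-ordered rule evaluation.
# Rules are precompiled into a flat (predicate, action, priority) table,
# the kind of regular structure that maps well onto dedicated hardware.

RULES = [
    (lambda s: s["temperature"] > 90, "throttle", 0),    # highest priority
    (lambda s: s["queue_depth"] > 100, "shed_load", 1),
    (lambda s: True, "proceed", 9),                      # default rule
]

def evaluate(state: dict) -> str:
    # Priority control: scan in priority order, take the first match.
    for predicate, action, _priority in sorted(RULES, key=lambda r: r[2]):
        if predicate(state):
            return action
    return "noop"

print(evaluate({"temperature": 95, "queue_depth": 10}))  # -> throttle
```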


■ ③ Boundary Engine

Role:

  • Checking safety constraints
  • Determining whether execution is allowed
  • Stopping or escalating in abnormal situations

Required characteristics:

  • Real-time judgment
  • Fail-safe behavior
  • Priority evaluation

Possible implementations:

  • Hardware-level constraint checking
  • Safety controllers similar to automotive SoCs

This is the layer that stops what must not be done.
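
In software terms, the behavior this layer must guarantee looks like the sketch below; the constraint names and limits are illustrative assumptions. The key property is fail-safe behavior: when the check itself cannot be completed, the answer is no.

```python
# A minimal fail-safe boundary check sketch.

LIMITS = {"max_actions_per_minute": 60, "max_spend_usd": 10.0}

def check_boundary(action: dict) -> bool:
    try:
        if action["rate_per_minute"] > LIMITS["max_actions_per_minute"]:
            return False             # stop: rate constraint violated
        if action["cost_usd"] > LIMITS["max_spend_usd"]:
            return False             # stop: budget constraint violated
        return True
    except (KeyError, TypeError):
        return False                 # fail-safe: if in doubt, do not execute

allowed = check_boundary({"rate_per_minute": 5, "cost_usd": 0.02})
print("execute" if allowed else "escalate")   # -> execute
```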


■ ④ Execution Engine

Role:

  • Connection with external systems
  • Issuing control signals
  • Executing tasks

Required characteristics:

  • High-speed I/O
  • Asynchronous processing
  • Event-driven execution

Possible implementations:

  • CPU + DPU
  • SmartNIC

This is the layer that moves the system.


■ ⑤ Trace Engine

Role:

  • Recording decisions
  • Saving state
  • Generating data for reproduction and learning

Required characteristics:

  • Low-latency writing
  • Time-series consistency
  • High-frequency logging

Possible implementations:

  • High-speed log buffers using SRAM
  • Streaming writes
  • Ledger-specific storage

This is the newest element introduced by DTM.
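
The required characteristics translate into a very specific software shape: an append-only, sequence-numbered log. The sketch below is a Python stand-in; a hardware Trace Engine would provide the same guarantees with dedicated buffers. The record layout is an illustrative assumption.

```python
# A minimal append-only trace buffer sketch.

import itertools
import json
import sys
import time

_seq = itertools.count()     # monotonic sequence number: time-series consistency
_buffer: list[str] = []      # stand-in for a high-speed (e.g., SRAM) log buffer

def trace(what: str, why: str) -> None:
    # Append-only: records are added, never modified.
    record = {"seq": next(_seq), "t_ns": time.monotonic_ns(),
              "what": what, "why": why}
    _buffer.append(json.dumps(record))

def flush(sink) -> None:
    # Streaming write: drain the fast buffer to slower storage in one pass.
    for line in _buffer:
        sink.write(line + "\n")
    _buffer.clear()

trace("proceed", "risk below threshold")
flush(sys.stdout)
```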


■ 11.4 Why This Architecture Is Needed

In conventional architecture:

  • Inference is possible.
  • Execution is possible.

However:

  • It is unclear where a decision was made.
  • Constraints are added afterward.
  • History remains fragmented.

In a DTM-based architecture:

  • Decisions are made explicit.
  • Constraints are applied before execution.
  • History is recorded consistently.

This shifts the system:

from a system that merely runs
to a system that can be controlled.


■ 11.5 A New Semiconductor Concept

This structure suggests a new category beyond conventional CPUs and GPUs.

■ Decision Processing Unit

Its role would be:

  • Executing decisions
  • Applying constraints
  • Recording history

Before:

  • CPU: control
  • GPU: computation

Future:

  • Decision Unit: decision-making

The role of semiconductors expands.


■ 11.6 Overall Architecture

[DTM × Multi-Agent Semiconductor Stack]

  • Signal: GPU / NPU
  • Decision: CPU / Decision Unit
  • Boundary: Safety Controller
  • Execution: CPU / DPU
  • Trace: Ledger Memory / Storage


■ 11.7 Conclusion

In the LLM era, semiconductors accelerated computation.

In the Agentic AI era, semiconductors began to move systems.

And in the DTM era, semiconductors will evolve into a foundation that:

executes, controls, and records decisions.
