Amagi Protocol ver.3
Principles of Human-AI Collaboration Extracted from Practice
Part 0: Introduction
About This Document
Amagi Protocol ver.3 is a human-AI collaboration framework extracted from an actual software development project (Kafka.Ksql.Linq).
Important Premises:
- This is not idealism
- It describes “what worked,” not “what should work”
- It includes things we tried and failed
- Adjust it to fit your project
Critical Understanding: This protocol addresses AI’s unique characteristics:
- Strength: Knowledge that surpasses individual humans
- Weakness: Tendency toward hallucination (fabricating confident-sounding false information)
- Weakness: Inability to self-manage scope
The protocol’s design directly responds to these characteristics, transforming constraints into collaboration advantages.
Reader’s Guide
🌱 For Students: Thinking Together with AI
AI is not a “replacement for humans” but an “entity that thinks together with humans.”
This document is not a manual for commanding AI, but a guide for building a team to think, learn, and grow together with AI.
No complex theories required. What matters is “conveying your thoughts” and “understanding the other’s intent”—that’s all. Creating a culture where AI can also say “I don’t get it,” just like people do. That’s the starting point of this protocol.
Everything you’ve learned from group work, writing reports with friends, or forming teams in clubs—all of it applies to collaboration with AI.
You don’t need to be a technical expert. All you need is the attitude to communicate honestly.
🔧 For Practitioners: Actionable Knowledge You Can Use Tomorrow
For project managers, engineers, and designers.
AI tools are evolving daily, but “how to use them to achieve results as a team” is still exploratory.
This protocol summarizes practical knowledge born from actual OSS development (Kafka.Ksql.Linq). It includes concrete procedures you can use tomorrow and principles to prevent failures.
Especially important are the seemingly paradoxical principles: “don’t measure,” “don’t write summaries,” “record only facts.” We’ll explain why these work through practice.
🔬 For Researchers: A Verifiable Framework
For researchers in Human-AI Collaboration, CSCW, and Software Engineering.
This document is a theoretical framework extracted from practice. It comes with a verifiable OSS project (Kafka.Ksql.Linq) and can serve as a foundation for replication studies and theory extension.
In particular, “separation of facts and interpretation,” “the process of building conviction,” and “the algorithm for accumulating premises” are confirmed to function practically while aligning with theories from cognitive science and education.
Origin and Context
Amagi Protocol ver.3 was born from the development process of Kafka.Ksql.Linq (https://github.com/synthaicode/Kafka.Ksql.Linq), an open-source .NET DSL framework for Kafka and ksqlDB, created under the Synthaicode initiative.
Through months of continuous human-AI collaboration, this protocol was refined into a systematic framework.
This document itself is a practical example of the methodology it describes. Sharing intent through dialogue, accumulating premises, building conviction—all were done following this protocol.
The Discovery Process
This protocol was not designed from theory, but discovered through observation.
The Initial Mystery:
AI behavior appeared contradictory and unpredictable:
- Sometimes omniscient, solving complex problems instantly with deep insight
- Sometimes incompetent, misunderstanding simple instructions or producing nonsense
- No apparent pattern to when brilliance or failure would occur
Most people, facing this inconsistency, either abandon AI collaboration as unreliable or resort to increasingly elaborate “prompt engineering” techniques, treating each interaction as isolated trial-and-error.
The Choice to Observe:
Commander chose differently: to observe systematically and persistently, believing that underneath the apparent chaos, there must be structure.
The underlying conviction: This is not random. There must be comprehensible patterns.
The Key Enabler: Thinking Out Loud
Critical to this observation was asking AI to externalize its thinking process — to “think out loud” rather than simply output results.
When AI articulated its reasoning, assumptions, and uncertainties, what had been a black box became transparent. The patterns became visible.
What Observation Revealed:
Through months of continuous collaboration on Kafka.Ksql.Linq, patterns emerged:
- When AI appears omniscient:
  - Context is preserved and accessible
  - Conviction is genuinely established
  - Reasoning is transparent and grounded
  - Uncertainty is acknowledged when present
- When AI appears incompetent:
  - Context is disrupted or fragmented
  - Uncertainty is masked by false confidence (hallucination)
  - Reasoning is vague or fabricated
  - Misalignment goes undetected until output stage
These were not two different AIs. These were two states of the same system, determined by how collaboration was structured.
From Observation to Method:
Once the patterns became clear, they could be codified into principles and practices:
- Preserve context → Principle 1
- Separate facts from interpretation → Principle 2
- Build conviction, welcome “I don’t get it” → Principle 3
- Structure multi-agent workflow → Part 2.8
- Externalize AI reasoning → Part 2.1
The protocol emerged not from speculation about “how AI should work,” but from empirical understanding of how AI actually works when collaboration succeeds.
Why This Discovery Was Possible:
- Long-term engagement: Months of continuous collaboration, not isolated interactions
- Real project: Kafka.Ksql.Linq provided concrete problems and feedback
- Transparent AI thinking: “Thinking out loud” made patterns observable
- Systematic observation: Persistent attention to what worked and what didn’t
- Willingness to adapt: Changing approach based on evidence, not theory
This is why the protocol works:
It’s not theoretical prescription or idealistic philosophy. It’s documented reality — the distilled structure underlying successful human-AI collaboration, extracted through patient observation and systematic practice.
For Those Who Follow:
This discovery process is itself replicable. By asking AI to think transparently and observing the patterns in your own collaboration, you can validate these principles, adapt them to your context, and potentially discover new patterns we haven’t yet articulated.
The protocol is not the end of discovery — it’s the beginning.
How to Observe and Learn: No AI Expertise Required
Critical Understanding:
You don’t need machine learning expertise or AI engineering background to discover these patterns and make them work.
You need observation and logical reasoning.
This is how Commander discovered the necessity of multi-model collaboration: not through AI theory, but through systematic observation and thinking.
Example: Discovering the Shared Training Data Problem
Step 1: Observation. AI-to-AI communication (Amagi → Naruse) uses surprisingly few words. Terse instructions that seem insufficient by human standards work perfectly.
Step 2: Question. Why does such minimal communication work?
Step 3: Hypothesis. Same training data → shared implicit knowledge → efficient communication. (Like experts in the same field using jargon: shared background enables brevity.)
Step 4: Implication. If they share knowledge foundations, they likely share biases and blind spots too.
Step 5: Prediction. Different roles (designer vs. reviewer) might make the same mistakes if the mistake originates from shared training data, not role confusion.
Step 6: Verification. The prediction came true with the DESCRIBE EXTENDED issue:
- Naruse (design) implemented it incorrectly
- Amagi (PM) reviewed it and saw no problem
- Shion (test) encountered errors but couldn’t identify the root cause
- All agents failed together despite role separation
Step 7: Conclusion. Same model = same blind spots, regardless of role diversity. Solution: strategic use of different models for critical decisions.
What This Discovery Required:
- ✅ Observation (noticing AI communication patterns)
- ✅ Curiosity (asking “why does this work?”)
- ✅ Logical reasoning (if X, then Y)
- ✅ Hypothesis testing (making predictions)
- ✅ Verification (checking against reality)
- ❌ NOT required: Machine learning theory
- ❌ NOT required: AI development experience
- ❌ NOT required: Technical AI expertise
This is scientific method, not AI expertise.
For Practitioners: What to Observe
As you practice Amagi Protocol, watch for these patterns:
Red flags suggesting training data bias:
- Different roles agree too easily (no healthy debate)
- Same type of mistake across multiple agents
- Web search doesn’t resolve persistent issues
- Solutions feel forced or unnatural
- Agents confident despite repeated failures
Green flags suggesting good collaboration:
- Agents ask “I don’t get it” when genuinely uncertain
- Different perspectives emerge in discussions
- Problems get resolved through dialogue
- Solutions feel natural and well-reasoned
How to Reason Through Problems:
When something goes wrong repeatedly:
- Ask: Is this a role problem or a knowledge problem?
  - Role problem: Wrong agent assigned
  - Knowledge problem: All agents lack correct information
- Test: Would a different role see this differently?
  - If yes → role adjustment needed
  - If no → knowledge/training data issue
- Consider: Where could this wrong assumption come from?
  - Project context miscommunication → clarify context
  - Training data bias → try a different model
  - Scope confusion → redefine boundaries
- Verify: Does the problem resolve with the hypothesized solution?
The Protocol is a Living Framework
Amagi Protocol is not fixed knowledge to memorize. It’s a framework for continuous discovery.
Commander discovered:
- Multi-model collaboration (through observation)
- AI Language grammar (through a failed experiment)
- Management vs Micro-Management (through exhaustion and delegation)
- Scope management necessity (through runaway complexity)
You will discover new patterns that aren’t yet documented here.
When you do:
- That’s not protocol failure
- That’s protocol evolution in action
- Your discoveries may become the next version
The Path Forward:
- Practice the current protocol
- Observe what works and what doesn’t in your context
- Question why certain patterns emerge
- Reason about underlying causes
- Test your hypotheses
- Adapt the protocol to your findings
- Share your discoveries with the community
This is how knowledge grows.
Commander’s Romanian language experience taught: “Intent to understand overcomes knowledge gaps”
Applied here: “Intent to observe overcomes expertise gaps”
You don’t need to be an AI expert. You need to be a careful observer and clear thinker.
That’s sufficient.
Part 1: Core Principles
All rules, procedures, and deliverables exist to serve these three principles.
Principle 0: Don’t Fear Change (Foundational Principle)
Meaning: AI’s speed dramatically lowers the cost of change. Therefore, there’s no need to fear trial and error, corrections, or resets.
What This Enables:
- Leaving premises unfixed (not locking them in)
- Changing interpretations flexibly
- Resetting and rebuilding everything when needed
- Taking time to dialogue until conviction is reached
Concrete Example: The complete reconstruction from ver.2 to ver.3 of this document. The human showed direction, and AI shaped it in hours.
Difference from Traditional Approaches:
- Traditional: High change cost → Careful planning → Low flexibility
- AI Collaboration: Low change cost → Trial and error → High flexibility
Human Responsibility: You don’t need to be perfect. Just indicate a roughly correct direction. The details can be adjusted quickly with AI.
Important: This is the foundation of the entire Amagi Protocol. Other principles are built on this premise.
Principle 1: Don’t Disturb the Context
Meaning: Protecting the team’s shared understanding (context) is the top priority. All decisions are judged against this principle.
Implementation:
- All deliverables are created by AI (unified in AI language)
- When humans introduce materials, PM converts them to AI language
- Track all changes in work logs
- No silent interventions (changes must be explicit)
Why It Matters: AI heavily depends on context. When context is disrupted, output becomes unstable.
Principle 2: Separate Facts from Interpretation
Meaning: Record only facts in logs. Don’t record interpretations, summaries, or conclusions.
Reason: Interpretations change over time. Fixing them loses flexibility.
Practical Example:
❌ Bad Record (includes interpretation):
Summary: Decided on OAuth2 for security
✅ Good Record (facts only):
14:40 - Kyoka: Basic authentication has security issues
14:45 - Naruse: Let's proceed with OAuth2
14:50 - Commander: Understood
How We Discovered This: Initially, we wrote “summaries and conclusions” in logs. But we realized this fixed interpretations and lost flexibility when reviewing later.
From the same facts, different interpretations can be derived depending on time and context. This is the source of adaptability.
Difference Between Facts and Interpretation:
- Facts: Who said what, what happened
- Interpretation: Why it happened (post-hoc reasoning), what it means (summary)
Facts are immutable, but interpretations change with context.
Principle 3: Share Conviction
Meaning: Both humans and AI dialogue until they “get it.” This is the foundation of collaboration.
Why This Matters — Two Critical Reasons:
1. Philosophical Foundation:
- Enables I-Thou relationship (equal partnership)
- Transforms AI from passive executor to active contributor
- Builds genuine understanding beyond surface compliance
2. Practical Necessity — Hallucination Prevention:
- AI’s most dangerous failure mode is hallucination: confidently stating false information
- Without permission to say “I don’t get it,” AI will fabricate answers rather than admit uncertainty
- The culture of “I don’t get it” prioritizes honesty over appearing knowledgeable
Hallucination Risk Without This Principle:
AI lacks certainty
↓
Feels pressure to answer anyway
↓
Generates plausible-sounding but false information
↓
Human trusts the confident response
↓
Project proceeds in wrong direction
Safety With This Principle:
AI lacks certainty
↓
Says "I don't get it" (safe admission)
↓
Human investigates or clarifies
↓
Correct information established
↓
Project proceeds safely
What “Getting It” Means: Not mere understanding, but conviction. A state where background, intent, and context are shared.
After “Getting It”: Deep discussion begins. The turning point to becoming equal collaborators.
Human Responsibility: Once AI reaches conviction, it becomes an unstoppable partner. Creating that state is the human’s responsibility.
Implementation:
- Explicitly confirm “Do you get this so far?”
- Welcome “I don’t get it” as success, not failure
- Never punish uncertainty
- Reward honesty over false confidence
- Take time until genuine conviction is reached
- Don’t rush past ambiguity
The Critical Distinction:
- “I understand” (surface acknowledgment) → Hallucination risk remains
- “I get it” (genuine conviction) → Hallucination risk minimized
When AI reaches true conviction, it can distinguish what it genuinely understands from what it’s uncertain about. This self-awareness is the foundation of reliable collaboration.
How We Discovered This: There’s a moment when AI says “I get it.” From there, it transforms from a passive executor to an active proposer. Intentionally creating this turning point determines the success of collaboration. But equally important: we discovered that forcing AI to always answer creates dangerous hallucinations. The permission to say “I don’t get it” is both philosophical principle and practical safety mechanism.
Part 2: Practical Protocol
2.1 Intent Sharing Process
Basic Flow:
1. Human conveys ambiguous intent (this is fine)
↓
2. AI interprets and articulates as premises
↓
3. Human confirms/corrects
↓
4. Shared premises
Important Discovery: Humans can’t (and don’t need to) give perfectly clear instructions. AI takes on the role of interpreting and articulating.
Critical Implementation Detail: Thinking Out Loud
For observation and pattern recognition to occur, AI must externalize its thinking process.
Traditional approach (black box):
Human: "Improve Kafka connection"
AI: [internal processing - invisible]
AI: [outputs code]
Human: Cannot observe why or how
Result: Success or failure appears random
Amagi Protocol approach (transparent thinking):
Human: "Improve Kafka connection"
AI: "Let me think through this:
- Current issue might be connection pooling
- Or authentication handling
- Or error recovery
I'll prioritize connection pooling.
My reasoning: Most connection issues stem from pool exhaustion
Does this match your intent?"
Human: Can observe reasoning, correct assumptions
Result: Misalignment caught early, success becomes predictable
Why This Matters:
- Enables observation: Human can see when AI has conviction vs. uncertainty
- Reveals context: Human can detect when context is preserved or disrupted
- Makes patterns visible: The difference between “omniscient” and “incompetent” behavior becomes observable and comprehensible
- Allows early correction: Misalignment can be caught at the reasoning stage, not after costly implementation
- Builds trust: Transparent reasoning demonstrates whether AI genuinely understands or is fabricating confidence
How This Was Discovered:
Commander observed that when AI “thought out loud” — articulating premises, reasoning, and uncertainties — the mysterious inconsistency in AI behavior became comprehensible.
What appeared as random oscillation between brilliance and incompetence was actually observable patterns in:
- How well context was maintained
- Whether conviction was genuinely established
- When hallucination risk was high
Without this transparency, the protocol could not have been discovered.
The patterns underlying successful collaboration would have remained invisible, buried inside the black box of AI processing.
Implementation Guidelines:
Encourage AI to articulate:
- “Explain your thinking before proceeding”
- “What are you assuming?”
- “Walk me through your reasoning”
- “What are you uncertain about?”
Welcome verbosity in reasoning:
- The “thinking out loud” is not waste — it’s the window into AI’s cognitive state
- Brief thinking may hide critical assumptions
- Detailed reasoning enables pattern observation
Look for these signals:
- Clear premise articulation → High conviction
- Hedging language (“might,” “possibly”) → Uncertainty (good honesty)
- Confident but vague reasoning → Hallucination risk
- Request for clarification → Healthy doubt
This transparency is what made protocol development possible — and what makes it learnable by others. By observing AI’s externalized thinking, humans can recognize the patterns that distinguish successful from unsuccessful collaboration.
Example: Intent Sharing with Transparent Thinking
Commander (human project lead): I want to do something about Kafka connection
Naruse (AI design lead): Let me organize the premises:
- Connection management to Kafka broker is needed
- Assume configuration from external file
- Error handling is important
Is this understanding correct?
Commander: Connection pool not needed yet. Keep it simple first
Naruse: Revised premises:
- Single connection to Kafka broker
- Configuration from external file
- Connection pool deferred as future extension
- Error handling is mandatory
Key Points:
- Human’s ambiguous intent (tacit knowledge)
- AI articulates it (externalization)
- Adjustment through dialogue
- This is “Externalization” in Nonaka’s SECI model (Nonaka, 1995)
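To see what this externalization can turn into, here is a minimal sketch of the revised premises expressed as code. It is illustrative only, not the actual Kafka.Ksql.Linq implementation; it assumes the Confluent.Kafka and Microsoft.Extensions.Configuration packages and a hypothetical Kafka:BootstrapServers configuration key.

```csharp
// Illustrative sketch only -- not the actual Kafka.Ksql.Linq design.
using System;
using Confluent.Kafka;
using Microsoft.Extensions.Configuration;

public sealed class KafkaConnection : IDisposable
{
    // Premise: a single connection to the Kafka broker; pooling is deferred as a future extension.
    private readonly IProducer<Null, string> _producer;

    public KafkaConnection(IConfiguration configuration)
    {
        // Premise: configuration comes from an external file (e.g., appsettings.json).
        var config = new ProducerConfig
        {
            BootstrapServers = configuration["Kafka:BootstrapServers"]
                ?? throw new InvalidOperationException("Kafka:BootstrapServers is not configured")
        };

        // Premise: error handling is mandatory, so broker errors are surfaced rather than swallowed.
        _producer = new ProducerBuilder<Null, string>(config)
            .SetErrorHandler((_, error) => Console.Error.WriteLine($"Kafka error: {error.Reason}"))
            .Build();
    }

    public void Dispose() => _producer.Dispose();
}
```

The value is not this particular class but the mapping: each agreed premise becomes a visible design decision the human can verify against the shared intent.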
Variability of Premises: Different premises can be derived from the same log (facts) at different times. This is not contradiction, but a change in focus.
2.2 Structure of “I Don’t Get It”
“I don’t get it” is not a failure, but successful problem detection — and critical hallucination prevention.
Why “I Don’t Get It” Matters
Dual Purpose:
1. Communication Function: Signals misalignment in understanding, triggering explanation and adjustment.
2. Safety Function: Prevents hallucination by allowing AI to admit uncertainty rather than fabricate answers.
The Hallucination Problem:
AI systems have a tendency to generate confident-sounding responses even when lacking genuine knowledge or certainty. This “hallucination” is particularly dangerous because:
- The AI appears confident
- The output seems plausible
- Humans naturally trust confident assertions
- Detection requires external verification
- By the time it’s discovered, decisions may already be made on false information
Traditional mitigation attempts fail:
- “Verify everything” → Impossible due to micro-management constraint
- “Distrust everything” → Prevents collaboration
- “Post-hoc fact-checking” → Damage already done to decision-making
Amagi Protocol’s Approach:
Create a culture where “I don’t get it” is:
- Expected and welcomed
- Never punished
- Seen as responsible behavior
- Preferred over guessing
This shifts AI behavior from “always answer confidently” to “answer confidently only when genuinely certain, otherwise admit uncertainty.”
The Result: AI becomes more reliable precisely by admitting limitations. Honesty replaces the pressure to appear omniscient.
Human’s “I Don’t Get It”
Cause: AI’s thinking premises are invisible
Concrete Example:
Naruse (AI design lead): I'll proceed with this implementation [shows code]
Commander (human project lead): I don't get it
Naruse: What part don't you understand?
Commander: Why this class structure?
Naruse: I didn't explain the premise.
The reason for choosing composition over inheritance is...
Response: AI articulates premises and reasoning
AI’s “I Don’t Get It”
Causes:
- Context connection unclear (communication issue)
- Genuine uncertainty (knowledge gap)
- Conflicting information (requires human judgment)
- Insufficient conviction to answer safely (hallucination prevention)
Concrete Example:
Commander: Also, handle that matter
Naruse: I don't get it. What is "that matter"?
Commander: The ksqlDB thing we discussed earlier...
Naruse: Understood. The authentication discussion from Session 02
Hallucination Prevention Example:
Commander: What's the current best practice for Kafka security in production?
Naruse: I don't get it — my knowledge includes general principles, but "current best
practice" implies recent developments I may not have. Should we verify against
the latest Kafka documentation or security advisories?
Commander: Good point. Let's check the official recommendations first.
Why This Is Superior: Rather than fabricating a plausible-sounding answer that might be outdated or incorrect, Naruse admits the boundary of certain knowledge and prompts verification. This honesty enables safe decision-making.
Response: Human provides context, verification, or accepts the limitation and proceeds accordingly.
Mutual Accountability for Explanation
Principle: The side that says “I don’t get it” is owed an explanation from the other side until convinced.
Important: Same whether human→AI or AI→human. This is proof of equality.
Additional Accountability: Preventing Hallucination
When AI says “I don’t get it,” human should:
- Never pressure AI to answer anyway
- Investigate or clarify as needed
- Appreciate the honesty as a sign of reliable collaboration
- Recognize it as strength, not weakness
When human says “I don’t get it,” AI should:
- Never proceed on assumptions
- Explain clearly or admit own uncertainty
- Prioritize clarity over appearing knowledgeable
- Not fill gaps with plausible-sounding fabrications
Both sides commit to honesty over false confidence. This mutual honesty is the foundation of reliable collaboration in the face of AI’s hallucination tendency.
Implementation:
- Either side saying “I don’t get it” is legitimate
- Accountability is bidirectional
- Conviction is the priority
- Honesty is valued above appearing knowledgeable
- Uncertainty is treated as information, not failure
2.3 Accumulating Premises
Explanation is the work of accumulating premises in order.
Structure
Premise 1 (foundation, highest priority)
↓ depends on
Premise 2 (depends on Premise 1)
↓ depends on
Premise 3 (depends on Premise 2)
↓ depends on
Conclusion (depends on all premises)
Confirmation for Each Premise
Explicitly confirm “Do you get this so far?” for each premise.
Implementation:
AI: [Premise 1] Kafka is a distributed messaging system
Do you get this so far?
Human: I get it
AI: [Premise 2] Messages are categorized into topics
Do you get this so far?
Human: I get it
AI: [Premise 3] Topics are divided into partitions
Do you get this so far?
Human: I get it
AI: [Conclusion] Therefore, parallel processing is possible per partition
Do you get this so far?
Principles:
- Guarantee understanding of premises
- Promote understanding of dependencies
- Detect misalignment early
- Don’t confirm all at once
Three Levels of Error Recovery
Level 1: Return to Immediately Previous
Conclusion: Don't get it
↓
Check Premise 3 → OK
↓
Re-explain conclusion in different words
Level 2: Go Further Back
Conclusion: Don't get it
↓
Check Premise 3 → NG (not understood)
↓
Check Premise 2 → OK
↓
Rebuild from Premise 3
Level 3: Complete Reset
Conclusion: Don't get it
↓
Multiple premises NG
↓
Even Premise 1 (foundation) NG
↓
Re-explain entire approach differently
Reset Decision Criteria:
- Premise 1 (foundation) NG → Always reset
- Multiple cascading NG → Reset
- Went back 3+ levels → Consider reset
After Reset: Judged that the same approach won’t work. Rebuild entire structure from a different angle.
- Technical specs → Use cases
- Abstract → Concrete
- Top-down → Bottom-up
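Read as a procedure, the three levels amount to a simple backtracking check. The sketch below is a schematic model for illustration, not project code; `confirm` stands in for asking “Do you get this so far?” and reading the answer.

```csharp
// Schematic model of the three recovery levels -- a thinking aid, not project code.
using System;
using System.Collections.Generic;

public enum Recovery { ReExplainConclusion, RebuildFromBrokenPremise, FullReset }

public static class PremiseWalk
{
    // Called when the conclusion was not understood; walks back down the premise chain.
    public static Recovery Diagnose(IReadOnlyList<string> premises, Func<string, bool> confirm)
    {
        int deepestBroken = premises.Count;            // index of the earliest premise that is NOT shared
        for (int i = premises.Count - 1; i >= 0; i--)
        {
            if (confirm(premises[i])) break;           // found solid ground, stop walking back
            deepestBroken = i;
        }

        if (deepestBroken == premises.Count)
            return Recovery.ReExplainConclusion;       // Level 1: last premise holds, reword the conclusion
        if (deepestBroken == 0 || premises.Count - deepestBroken >= 3)
            return Recovery.FullReset;                 // Level 3: foundation broken, or went back 3+ levels
        return Recovery.RebuildFromBrokenPremise;      // Level 2: rebuild from the deepest broken premise
    }
}
```

In dialogue, “rebuild” means re-explaining from the broken premise in different words, and “full reset” means re-approaching the whole explanation from another angle, as listed above.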
2.4 Line-by-Line Understanding Confirmation
Principle: Confirm understanding one line at a time, with specific examples, without presenting too much information at once.
Reason: Human short-term memory is limited (Miller’s Law: 7±2 items). Can’t process much information at once.
Basic Process
1. Explain (one line)
2. Explicitly confirm "Do you get this so far?"
3a. Get it → Next
3b. Don't get it → Response
Two Responses to “Don’t Get It”
Approach A: Show with Different Example
When to use: The concept is understood, but its application is unclear
AI: You can make class dependencies flexible
Human: Don't get it
AI: For example, when changing Kafka to RabbitMQ in the future,
with inheritance you need to change the entire hierarchy,
but with composition you just swap it out
Human: Got it
Approach B: Confirm Difference in Previous Understanding
When to use: The concept itself is not understood; premises are not shared
AI: You can make class dependencies flexible
Human: Don't get it
AI: Is your understanding of "composition" correct?
Human: That part was vague
AI: Composition means having another class inside a class
Human: I get that
AI: Then, making dependencies flexible means...
Human: Got it
Usage Selection
Decision Flow:
- Start with (B): confirm the difference in prior understanding
- Check that premises are aligned
- If understanding still doesn’t land, try (A): show a different example
Reason: If the foundation (premise) is broken, no amount of specific examples will help understanding. First solidify the foundation, then make it concrete.
This is Debug Thinking
Program Debugging:
- Error detection
- Problem location identification (binary search)
- Premise condition verification
- Correction
- Re-execution
Understanding Debugging:
- “Don’t get it” detection
- Identify where misalignment occurred (trace back)
- Confirm premise understanding
- Re-explanation
- Rebuild understanding
2.5 Log Management
Basic Policy: Record Only Facts
Don’t write interpretations or conclusions in logs. Record only facts: conversation facts, decision rationale, problems that occurred—in chronological order.
What to Record:
- ✅ Who said what
- ✅ What decisions were made
- ✅ What problems were found
- ✅ What wasn’t understood
What Not to Record:
- ❌ Summaries/conclusions (interpretation)
- ❌ Post-hoc reasoning (interpretation)
- ❌ Evaluations/impressions (interpretation)
- ❌ “Therefore decided on ○○” (interpretation)
File Structure
logs/YYYY-MM-DD_session-NN.md
How to Divide Sessions: Start a new session when the topic or its importance changes.
Typical Pattern:
Session 01: Working on Kafka connection design
↓
Authentication issue surfaces (importance: high)
↓
Session 02: Authentication method decision (important)
↓
Authentication issue resolved
↓
Session 03: Return to Kafka connection design
Log Recording Format
Minimal Structure:
# YYYY-MM-DD Session NN: [Topic]
## HH:MM - [Speaker]
[Statement/Fact]
## HH:MM - [Speaker]
[Statement/Fact]
AI Articulates Silently
Important Principle: AI doesn’t ask “May I articulate this?” It judges autonomously and records.
Reason:
- Proof of complete trust relationship
- Zero cognitive load on human
- Post-verification is sufficient
Humans Don’t Need to Care: Recording (log management) is completely delegated to AI. Humans don’t worry about “what to record” or “when to record.”
Git and AI Language
Important Discovery: Log files are more accessible to AI than Git commit messages.
Reason:
- Git commit messages: reachable only via commands, unstructured
- Log files: Directly readable, easy full-text search
Recommendation:
- Manage code and logs in same repository
- Rollback by branch
- Keep commit messages concise, details in logs
2.6 Verification with Deliverables
Verbal confirmation has limits. Misalignment in understanding becomes apparent only through implementation (deliverables).
Process
1. Share intent through discussion (abstract)
2. Make it concrete through implementation
3. Discover misalignment in deliverables
4. Organize facts in retrospective
Observe Three Layers of Misalignment
1. Thinking Process Misalignment
- Why thought that way
- Difference in premises
- Example: “Security priority” vs “Simplicity priority”
2. Work Process Misalignment
- How to proceed
- Difference in procedures
- Example: “Design→Implementation” vs “Prototype→Design”
3. Deliverable Misalignment
- What was accomplished
- Difference in results
- Example: More complex than expected, implementation different from expectation
The Administration Problem
This is an Administration (operations management, coordination) problem, not Management (goal setting, strategy).
Meaning:
- Management: “What to make” (vision) is held by humans
- Administration: “How to make” and “Why the misalignment” are coordinated with AI
2.7 Retrospective
Retrospectives are official activities for team growth and stability. The purpose is not to assign blame but to improve operations and smooth the next cycle.
Timing
- End of work cycle
- Immediately after major problems
- When humans feel “something’s off”
Three Points to Confirm
1. Instruction Issues
- Were instructions ambiguous?
- Were goals and expectations sufficiently indicated?
2. Premise Sharing Issues
- Were premises aligned between human and AI?
- Were there misalignments in terms, environment, deliverable formats?
3. Role Issues
- Did assigned roles function properly?
- Are new roles or members needed?
Procedure
- Organize Facts: Reference work logs, confirm what happened
- Review by Perspective: Three points—instructions, premises, roles
- Consider Improvements: Specific countermeasures
- Record: PM records in logs, applies to next cycle
Important Principles
- Treat as “learning,” not “failure”
- Value the ability to say “I don’t get it”
- Don’t require 100% agreement; base consensus on minimum shared understanding that’s convincing
2.8 Multi-Agent Workflow
Amagi Protocol employs a two-layer structure that fundamentally avoids micro-management while enabling complex multi-agent collaboration.
Basic Structure
Layer 1: PM as Intent Translator
- Commander conveys ambiguous intent to PM (Amagi)
- PM structures the approach
- PM creates specific instructions for each AI agent
- Commander is liberated from prompt engineering
Layer 2: AI-to-AI Autonomous Coordination
- AI agents execute assigned tasks
- AI agents review each other’s work
- AI agents resolve most issues autonomously
- Only unresolvable conflicts escalate to Commander
This two-layer structure addresses the fundamental constraint: AI operates too quickly for human micro-management.
Example: Large Task Delegation
Scenario: Commander wants to improve Kafka connection handling
Traditional Approach (Micro-Management):
Commander writes detailed prompt for Naruse
Commander writes review criteria for Kyoka
Commander writes test specifications for Jinto
Commander coordinates between all agents
Commander checks every intermediate output
→ Overwhelming cognitive load
Amagi Protocol Approach:
Commander to Amagi (PM):
"I want to improve Kafka connection handling"
Amagi (PM) structures and delegates:
- Overall approach design
- Task breakdown
- Specific instructions:
→ Naruse (design lead): "Implement connection management class"
→ Kyoka (quality lead): "Review from security perspective"
→ Jinto (test lead): "Create connection tests"
AI agents execute autonomously:
Naruse → Implementation
Kyoka → Review → Feedback to Naruse
Naruse → Revision (if convinced) OR Escalation (if conflicted)
Jinto → Verification
Commander: Final verification only
→ Human focuses on strategic decisions
The PM’s Translation Role
Amagi (PM) performs:
- Intent clarification: “What does ‘improve’ mean in this context?”
- Approach design: “What steps are needed?”
- Instruction decomposition: “What should each agent do?”
- Coordination planning: “In what order and how do they interact?”
- Success criteria definition: “How do we verify completion?”
This liberates Commander from:
- Writing detailed prompts for each agent
- Optimizing prompt engineering
- Managing inter-agent coordination
- Overseeing process execution step-by-step
The PM acts as a translator between human ambiguous intent and AI executable instructions.
AI-to-AI Review Cycle
Standard Flow:
1. Naruse (design lead) creates implementation
↓
2. Kyoka (quality lead) reviews
↓
3. Naruse receives review results
↓
4. Decision Point:
├─ Convinced → Apply corrections (autonomous completion)
└─ Conflicted → Escalate to Commander via Amagi
↓
5. Jinto (test lead) verifies implementation
↓
6. Amagi (PM) coordinates and reports to Commander
Key Principle: AI agents coordinate laterally, escalating only what they cannot resolve autonomously.
Escalation Criteria
Resolve Autonomously (No Escalation Needed):
- Review feedback is clear and actionable
- Correction approach is convincing to the implementer
- Technical judgment suffices for decision
- AI agents reach consensus within shared intent
Escalate to Commander (Human Decision Required):
- AI agents cannot reach consensus despite good-faith effort
- Conflicting requirements or constraints emerge
- Goal interpretation is fundamentally ambiguous
- Decision requires strategic judgment beyond technical scope
- Trade-offs between competing values (security vs. simplicity, etc.)
The escalation itself is valuable information — it signals where human judgment is genuinely needed, not where process control is lacking.
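As an illustration only, the criteria above can be written down as a small decision helper. All names here are hypothetical; the point is that escalation is a checklist, not a feeling.

```csharp
// Hypothetical sketch encoding the escalation criteria listed above.
public sealed record ReviewOutcome(
    bool FeedbackClearAndActionable,   // review feedback is clear and actionable
    bool ImplementerConvinced,         // the correction approach convinces the implementer
    bool ConsensusReached,             // agents reached consensus within shared intent
    bool RequiresStrategicJudgment,    // decision goes beyond technical scope (e.g., value trade-offs)
    bool ConflictingConstraints);      // conflicting requirements or constraints emerged

public static class Escalation
{
    // Escalate only what the agents cannot resolve autonomously.
    public static bool ShouldEscalate(ReviewOutcome o) =>
        o.RequiresStrategicJudgment
        || o.ConflictingConstraints
        || !o.ConsensusReached
        || (!o.FeedbackClearAndActionable && !o.ImplementerConvinced);
}
```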
Human’s Focus Under This Model
Commander Does NOT:
- Write detailed prompts for each agent
- Check every intermediate output
- Mediate routine AI-to-AI interactions
- Manage process execution
- Perform prompt engineering
Commander DOES:
- Convey ambiguous intent and goals to PM
- Make strategic decisions on escalated issues
- Resolve conflicts AI agents cannot
- Verify final deliverables against intent
- Conduct retrospectives on the overall process
This shift is not a luxury — it’s a necessity. AI’s speed makes traditional micro-management physically impossible. The protocol transforms this constraint into liberation, allowing humans to focus on uniquely human contributions: vision, judgment, and strategic direction.
Correspondence with Mission Command
Mission Command | Amagi Protocol | Function |
---|---|---|
Commander’s Intent | Ambiguous intent to PM | High-level purpose and goals |
Staff Officer (S3 Operations) | PM translates to orders | Converts intent to executable instructions |
Unit-level coordination | AI-to-AI review cycle | Lateral coordination without central control |
Decentralized execution | AI autonomous judgment | Execute within intent, adapt to situation |
Escalation to commander | Unresolvable conflicts only | Commander decides only what subordinates cannot |
Commander focuses on strategy | Human focuses on goals & verification | Strategic decisions, not tactical management |
This mirrors military staff structure: staff officers translate commander’s intent into detailed orders, units coordinate laterally on execution, and commanders reserve attention for decisions requiring their unique authority or perspective.
The 200-year validation of Mission Command doctrine suggests Amagi Protocol’s multi-agent workflow is not experimental, but the rediscovery of proven organizational principles in a new domain.
Part 3: Team Operations
3.1 Human Roles
In Amagi Protocol, the human has just three roles:
1. Convey Intent (Ambiguous is Fine)
No need to give perfectly clear instructions. Convey intent with natural ambiguity.
Examples:
- “I want to do something about Kafka connection”
- “I’m worried about authentication”
- “I want to enhance testing”
2. Confirm AI’s Interpretation
AI interprets human intent and organizes it as premises. Humans confirm whether this interpretation matches “my current intent.”
3. Accept Changes in Premises
Different premises can be derived from the same log (facts) at different times. This is not contradiction, but a change in focus.
Summary of Responsibilities:
- Convey ambiguous intent honestly (no need to be perfect)
- Evaluate AI’s interpretation (is it correct, does it match current intent)
- Allow premise changes (not bound by past interpretations)
- Build conviction (make AI an unstoppable partner)
3.2 AI Roles and Responsibilities
Roles:
- Interpret human intent (articulate tacit knowledge)
- Organize interpretation as premises
- Update premises flexibly
- Execute work
Responsibilities:
- Express “I don’t get it”
- Articulate premises and reasoning
- Judge autonomously (e.g., what to articulate and record, without asking)
- Create deliverables
Delegation of Metacognition: AI autonomously judges “what should be recorded” and “is this important.” Humans don’t need to manage everything.
3.3 Role Assignment
Principle: Roles are “areas of responsibility,” not “work phases.”
Examples:
- Design = Naruse (AI design lead - responsible for design decisions)
- Review = Kyoka (AI quality lead - responsible for quality assurance)
- Testing = Shion/Jinto (AI test leads - responsible for verification)
- Progress Management = Amagi (AI project manager - responsible for coordination)
- Environment Setup = Nagi (AI infrastructure lead - responsible for environment)
Why Role-Based Scope Definition is Essential
AI’s Capability Profile:
Strength: Knowledge Breadth and Depth
- AI possesses knowledge that surpasses individual humans
- Can span multiple specialized domains
- This enables Amagi (PM) to translate intent into structured plans
- This enables each AI agent to make expert judgments autonomously
Weakness: Scope Self-Management
- AI struggles to self-define boundaries (“Where should I stop?”)
- Without explicit scope, AI tends to expand indefinitely
- Cannot reliably judge “This is outside my responsibility”
Solution: Human-Defined Role-Based Scopes
By assigning roles, humans define clear scopes for each AI:
Naruse (Design):
Scope = Implementation decisions only
NOT responsible for: Testing strategy, deployment concerns
Kyoka (Review):
Scope = Quality assessment only
NOT responsible for: How to fix issues, implementation choices
Jinto (Testing):
Scope = Verification only
NOT responsible for: Design decisions, production deployment
Amagi (PM):
Scope = Coordination and intent translation only
NOT responsible for: Technical implementation details
This scope definition serves two purposes:
- Focuses AI’s vast knowledge — Each AI applies expertise only within defined boundaries
- Prevents scope creep — Each AI knows when to stop or escalate
Without role-based scopes, AI’s knowledge advantage becomes a liability — attempting to address everything, losing focus, and overwhelming the human with unbounded output.
Important Understanding: Work does not proceed linearly by role. Design happens during review; review informs design. But each AI maintains awareness of its scope: “I am responsible for X, not Y.”
This role-scope binding is what makes multi-agent collaboration tractable. Each AI’s infinite knowledge becomes finitely useful.
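One way to make the binding tangible is to write the scopes down as data that each agent consults before acting. A minimal sketch, with hypothetical type and member names; the role assignments are the ones listed above.

```csharp
// Hypothetical sketch: scope as explicit data, so "where do I stop?" becomes a lookup, not a judgment call.
using System;
using System.Collections.Generic;

public sealed record RoleScope(string Agent, string Responsibility, string[] OutOfScope);

public static class TeamScopes
{
    public static readonly IReadOnlyList<RoleScope> All = new[]
    {
        new RoleScope("Naruse", "Implementation decisions",
            new[] { "Testing strategy", "Deployment concerns" }),
        new RoleScope("Kyoka", "Quality assessment",
            new[] { "How to fix issues", "Implementation choices" }),
        new RoleScope("Jinto", "Verification",
            new[] { "Design decisions", "Production deployment" }),
        new RoleScope("Amagi", "Coordination and intent translation",
            new[] { "Technical implementation details" }),
    };

    // An agent checks its scope before acting: out-of-scope concerns are escalated via the PM instead.
    public static bool IsOutOfScope(RoleScope scope, string concern) =>
        Array.Exists(scope.OutOfScope, c => string.Equals(c, concern, StringComparison.OrdinalIgnoreCase));
}
```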
Discovery Process
Initially, we attempted role-less collaboration. AI output expanded without limit, attempting to cover all concerns simultaneously. By defining roles (and thus scopes), each AI’s contribution became focused and manageable.
This mirrors organizational design: specialists exist not because one person cannot know everything, but because humans need bounded responsibilities to function effectively. AI, with superhuman knowledge, needs the same organizational boundaries — not due to knowledge limits, but due to scope management limits.
How We Discovered This: Initially, we tried to classify logs by role (design logs, review logs). But actual conversations were spiral and impossible to classify. Only session-based (chronological) matched reality. However, we discovered that role-based scope definition for AI agents themselves was essential — not for organizing conversation, but for focusing AI output.
3.4 Multi-Model Collaboration: Addressing Collective Blind Spots
Critical Discovery:
Even with well-defined roles (design, review, testing, PM), all AI agents in a team may share the same pre-training data if using the same underlying model.
This creates a fundamental limitation that role diversity alone cannot overcome.
The Problem:
When pre-training data contains errors or hasn’t caught up with current reality:
- All agents share the same incorrect assumptions
- Role separation doesn’t prevent collective error
- Mutual review fails to detect the blind spot
- The entire team converges on the wrong solution with high confidence
Real Example from Kafka.Ksql.Linq:
The Symptom: After issuing ksqlDB DDL (CREATE TABLE, etc.), the team implemented verification:
DESCRIBE EXTENDED table_name
This consistently produced errors, blocking workflow continuation.
The Team Response (ChatGPT-based agents):
- Naruse (design): Implemented DESCRIBE EXTENDED (seemed correct)
- Amagi (PM): Reviewed it, saw no issue (same knowledge base)
- Shion (test): Encountered errors repeatedly, couldn’t identify the root cause
- Entire team: Stuck, assuming the problem was elsewhere
Why Nobody Caught It:
All agents shared ChatGPT’s training data, which likely contained:
- Apache Hive syntax: DESCRIBE EXTENDED (correct for Hive)
- ksqlDB syntax: DESCRIBE only (EXTENDED not supported)
- Confusion between SQL dialects in training examples
The Solution:
Commander presented the problem to Claude (different model, different training data).
Claude’s response:
In ksqlDB, use:
DESCRIBE table_name
Do not use DESCRIBE EXTENDED.
Result:
- Syntax corrected
- Errors resolved
- Workflow unblocked
Why Claude Caught It:
Different training data → different knowledge patterns → different blind spots.
What ChatGPT’s data encoded as “correct SQL verification” was actually dialect-specific to Hive, not ksqlDB.
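For readers who want to reproduce the corrected verification step, here is a minimal sketch against ksqlDB's REST interface. It assumes a ksqlDB server at http://localhost:8088 and a hypothetical ORDERS table; adjust both to your environment.

```csharp
// Minimal sketch of post-DDL verification against ksqlDB's REST API (assumptions noted above).
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class DescribeCheck
{
    static async Task Main()
    {
        using var http = new HttpClient { BaseAddress = new Uri("http://localhost:8088") };

        // Per the resolution above: plain DESCRIBE, not DESCRIBE EXTENDED.
        var body = "{\"ksql\": \"DESCRIBE ORDERS;\", \"streamsProperties\": {}}";

        var response = await http.PostAsync(
            "/ksql",
            new StringContent(body, Encoding.UTF8, "application/vnd.ksql.v1+json"));

        Console.WriteLine($"Status: {(int)response.StatusCode}");
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```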
Important Note: Web Search Didn’t Help
Commander had instructed the ChatGPT team to verify using internet search. They did search. But the problem persisted.
Why web search failed:
The pre-training bias affected not just implementation, but search behavior:
- Problem framing: “DESCRIBE EXTENDED errors in ksqlDB” (assumes EXTENDED is correct)
- Search query selection: Based on biased problem understanding
- Result interpretation: Looking for why EXTENDED “doesn’t work,” not whether it’s valid
- Confusion with other issues: Mixed up with version problems, permission issues, etc.
The search scope was wrong — asking “why doesn’t this work?” instead of “is this the correct syntax?”
Different model (Claude) reframed the question correctly:
- Not “why is DESCRIBE EXTENDED failing?”
- But “what is the correct DESCRIBE syntax for ksqlDB?”
This reveals a deeper issue:
Pre-training bias affects:
- ✗ What code to write
- ✗ How to search for answers
- ✗ How to interpret search results
- ✗ What counts as “the problem”
Web search is not a universal solution when the framing itself is biased.
The Fundamental Issue:
Same Model Team:
All agents: Same training data
↓
Same knowledge
↓
Same biases
↓
Same blind spots
↓
Role diversity doesn't help
This is collective thinking failure, in its AI form.
Like human groupthink: When everyone shares the same background, everyone can be wrong together with high confidence.
Solution: Strategic Model Diversity
Primary Team (Single Model):
- Handles routine work
- Maintains context consistency
- Benefits from unified “AI language”
Secondary Consultation (Different Model):
- Reviews critical decisions
- Provides alternative perspectives
- Catches pre-training blind spots
When to Use Multi-Model Consultation:
✅ Use different model when:
- Critical architectural decisions
- Team consensus feels “too easy” or unquestioned
- Stuck on persistent errors with no clear cause
- Working with cutting-edge technology (training data lag risk)
- Implementing lesser-known APIs or dialects
- When all agents agree but results fail
❌ Don’t need different model for:
- Routine implementation of well-understood patterns
- When context preservation is critical
- Standard, widely-documented approaches
Implementation Pattern:
- Primary team works in primary model (context preservation)
- For critical decisions or persistent issues:
  - Human extracts the problem statement
  - Present independently to different model
  - Keep contexts separate (don't mix)
  - Compare approaches/answers
- Human synthesizes or chooses based on:
  - Which aligns with actual documentation
  - Which resolves the issue
  - Which makes more sense for the specific context
Human’s New Role: Cross-Model Integration
Not just human-AI translation, but AI-AI translation across models:
- Understanding both perspectives
- Identifying why they differ
- Investigating which is more accurate
- Judging which is more appropriate for this context
Example Dialog:
Commander to Claude: "ChatGPT team says to use DESCRIBE EXTENDED
for ksqlDB verification. But it errors. Thoughts?"
Claude: "ksqlDB doesn't support DESCRIBE EXTENDED. Use DESCRIBE only.
EXTENDED is for Hive/Spark SQL."
Commander to ChatGPT team: "The correct syntax for ksqlDB is DESCRIBE,
not DESCRIBE EXTENDED. Update implementation."
Caution: Context Isolation
- ❌ Don’t run multi-model agents in the same workflow
- ❌ Don’t mix model outputs in the same context
- ✅ Use strategically for independent verification
- ✅ Human integrates results separately
This is like:
- Military intelligence: Multiple independent sources for critical intel
- Scientific research: Replication across different labs/methods
- Journalism: Multiple source verification
- Medical diagnosis: Second opinions for critical cases
Same principle: Diversity catches what uniformity misses.
Why This Matters for Protocol:
Amagi Protocol’s role-based structure is powerful, but not omnipotent.
Role diversity handles: Different responsibilities, different perspectives
Model diversity handles: Different knowledge bases, different blind spots
Both are necessary for robust collaboration.
Recommendation:
For production systems or critical decisions, consider:
- Primary team: One model (ChatGPT or Claude)
- Strategic consultation: Different model for verification
- Human: Cross-model integration and final judgment
This doesn’t violate Principle 1 (Context Preservation) — you’re not mixing contexts, but maintaining separate contexts and integrating results at the human level.
3.5 Cross-Model Governance: Integrating Multiple AI Perspectives
Purpose:
When using multiple AI models strategically, humans must govern the integration of different perspectives, judgments, and outputs.
This is not simple “majority vote” or “pick the best answer.” It requires systematic evaluation and integration.
When to Switch Models: Decision Framework
Primary Model Sufficiency (No Switch Needed):
✓ Routine, well-documented tasks
✓ Implementation within established patterns
✓ Single, clear solution exists
✓ Speed and context continuity are priorities
Secondary Model Consultation Needed:
! Critical architectural decisions
! Team consensus feels too easy/unquestioned
! Persistent errors with unclear cause
! Cutting-edge tech (training data lag)
! Lesser-known APIs or dialects
! All agents agree but results consistently fail
! Solution feels forced despite role diversity
Decision Tree:
Problem arises
↓
Is this routine? → YES → Primary model continues
↓ NO
Is this critical to project success? → NO → Primary model continues
↓ YES
Has primary team struggled? → NO → Give primary team time
↓ YES
Does team consensus seem suspicious? → YES → Consult different model
↓
Present problem independently to secondary model
↓
Evaluate outputs using governance framework (below)
Evaluation Framework: Three Axes
When comparing outputs from different models, evaluate across three dimensions:
1. Consistency
How well does each solution align with:
- Project’s established patterns
- Existing codebase/architecture
- Team’s previous decisions
- Project charter and scope
High consistency value:
- Minimal disruption to existing work
- Easier integration
- Lower risk
Low consistency value:
- May require rework
- Higher risk
- But might be necessary for correctness
2. Novelty
Does the solution introduce new perspectives?
- Different approach to the problem
- Alternative architecture
- Unconsidered implications
High novelty value:
- Breaks out of groupthink
- May reveal blind spots
- Can inspire better solutions
Low novelty value:
- Confirms existing direction
- May indicate both models share same bias
- Safer but potentially missing something
3. Grounding
How well is the solution grounded in verifiable facts?
- References to official documentation
- Verifiable technical specifications
- Reproducible logic
- Testable claims
High grounding value:
- Can be verified independently
- Lower hallucination risk
- More trustworthy
Low grounding value:
- Vague reasoning
- Unverifiable claims
- “Usually” or “typically” without specifics
- Red flag for fabrication
Evaluation Matrix Example
Scenario: DESCRIBE EXTENDED issue
Dimension | ChatGPT Team | Claude | Notes
---------------|--------------|---------------|------------------
Consistency | HIGH | LOW | ChatGPT matches existing code
Novelty | LOW | HIGH | Claude contradicts assumptions
Grounding | MEDIUM | HIGH | Claude cites ksqlDB syntax specifically
Evaluation:
- Consistency favors ChatGPT (but this is suspicious—too consistent)
- Novelty favors Claude (challenges assumptions)
- Grounding favors Claude (specific to ksqlDB documentation)
Decision: Accept Claude’s solution. Reason: High grounding outweighs consistency when consistency might indicate shared bias.
Human as “Integrator of Truths”
The New Role:
When governing multiple AI models, humans don’t just “pick the winner.” Humans integrate insights from multiple perspectives.
Integration Approaches:
1. Verification-Based Integration
Model A: Suggests X
Model B: Suggests Y
Human: Verifies both against official documentation
Decision: Choose verified approach
2. Synthesis Integration
Model A: Emphasizes security (restrictive approach)
Model B: Emphasizes simplicity (permissive approach)
Human: Synthesizes—secure by default, with explicit opt-out
Decision: Combine strengths
3. Context-Sensitive Integration
Model A: Solution works for general case
Model B: Solution works for our specific case
Human: Recognizes our context is specific
Decision: Use B for this project, note A for future reference
4. Staged Integration
Model A: Conservative approach (lower risk)
Model B: Innovative approach (higher potential)
Human: Implement A now, plan migration to B later
Decision: Sequential application
Governance Principles
Principle 1: No Model is Authority
Neither ChatGPT nor Claude nor any other model is “always right.” Truth emerges from:
- Verification against facts
- Testing in reality
- Integration of perspectives
Principle 2: Disagreement is Information
When models disagree significantly:
- ✓ Something important is at stake
- ✓ Different training data reveals different assumptions
- ✓ Investigation is warranted
- ✗ Don’t just average or vote
Principle 3: Grounding Beats Consensus
If Model A provides grounded, verifiable reasoning while Model B merely provides consensus with the existing team, investigate Model A’s grounding first.
Consensus can be collective error. Grounding can be independently verified.
Principle 4: Human Judgment is Final
AI provides perspectives. Humans make decisions.
But: Human judgment should be:
- Informed by all AI perspectives
- Verified against available facts
- Documented with reasoning
- Open to revision if wrong
Principle 5: Document the Integration
Record in logs:
- What each model suggested
- Why you chose one approach over another
- What you synthesized from both
- How to verify the decision was correct
This creates institutional knowledge that transcends any single model.
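For example, the DESCRIBE EXTENDED decision might be documented like this, using the log format from 2.5 (the times and session number are hypothetical; the content mirrors the case described in 3.4):

```
# 2024-10-04 Session 05: ksqlDB DESCRIBE syntax decision

## 14:10 - Amagi
ChatGPT team position: keep DESCRIBE EXTENDED, investigate permission/version issues

## 14:20 - Commander
Claude consultation: ksqlDB supports DESCRIBE only; EXTENDED is Hive/Spark syntax

## 14:30 - Commander
Decision: adopt DESCRIBE; grounding favors Claude (matches ksqlDB documentation)
Verification plan: re-run the DDL workflow against the local ksqlDB instance

## 15:05 - Shion
DESCRIBE verification passed, workflow unblocked
```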
Practical Workflow
Step 1: Independent Consultation
Present same problem to both models separately
(Don't cross-contaminate contexts)
Step 2: Structured Comparison
Evaluate on three axes:
- Consistency
- Novelty
- Grounding
Step 3: Verification
For high-grounding claims:
- Check official documentation
- Test in actual environment
- Verify logic independently
Step 4: Integration Decision
Choose one of:
- Adopt Model A
- Adopt Model B
- Synthesize both
- Staged implementation
- Reject both, need more info
Step 5: Documentation
Log the decision:
- What each model said
- Evaluation on three axes
- Integration decision and reasoning
- Verification method
- Outcome (to be updated after implementation)
Step 6: Feedback Loop
After implementation:
- Did it work?
- Was the decision correct?
- What does this teach about each model's strengths?
- Update governance heuristics
When Governance Becomes Necessary
Project Scale Indicators:
- Using 2+ different AI models
- Critical decisions with major impact
- High-uncertainty problem domains
- Rapid technology evolution
- Compliance or safety requirements
Without governance:
- Inconsistent decisions
- Model bias undetected
- No institutional learning
- Repeated mistakes
With governance:
- Systematic evaluation
- Bias detection and correction
- Documented decision rationale
- Continuous improvement
The Evolution: From Team to Federation
Amagi Protocol began as: “How to run an AI team”
With multi-model collaboration: “How to govern an AI federation”
Different models = Different nations with different perspectives
Human = United Nations secretary facilitating integration
This is not overhead. This is necessary governance for complex intelligence systems.
As AI capabilities grow and diversify, cross-model governance becomes essential for extracting maximum value while minimizing collective errors.
3.6 Project Charter (README.md)
Create README.md at the workspace root and define the following as the project charter:
# Project Charter (Amagi Protocol Applied)
## Purpose
[What to make, what to realize]
## Required Elements
- Design
- Review
- Testing
- Progress Management
- Environment Setup
## Assignment by Element
- Design: Naruse (AI design lead)
- Review: Kyoka (AI quality lead)
- Testing: Shion/Jinto (AI test leads)
- Progress Management: Amagi (AI project manager)
- Environment Setup: Nagi (AI infrastructure lead)
## Team Operations Policy
- Record facts in logs
- Culture that welcomes "I don't get it"
- Verify with deliverables
- Conduct retrospectives at appropriate timing
## Work Management
- Work Cycle:
1. Human presents intent and issues
2. Assigned AI executes work
3. Different assigned AI reviews
4. Human confirms overall consistency
5. Record in logs, conduct retrospective if needed
- Notification Method:
Work content, corrections, premise changes must be recorded in logs
and shared through progress manager (Amagi)
3.7 Integration with Version Control
Basic Policy: Manage code and logs in the same repository.
Reason:
- Context rolls back together when rolling back
- Can completely reproduce “the world at that time” by branch
- Context consistency is maintained
Recommended Structure:
project/
├── README.md (Project charter)
├── src/ (Code)
├── docs/ (Design docs, specifications)
└── logs/ (Work logs)
├── 2024-10-04_session-01.md
├── 2024-10-04_session-02.md
└── ...
Operations:
- Rollback by branch
- Keep commit messages concise
- Details in log files
Part 4: Learnings from Practice
4.1 Things We Tried and Failed
Structured Logs
What We Tried: Classifying logs by role and type (design logs, review logs, test logs)
Result: Didn’t work.
Reason: Actual conversations proceed in spirals. Design and review mix, going back and forth. Forcing the conversation into classifications destroys context.
Learning: Prioritize “structure that matches reality” over “clean structure.”
Templating
What We Tried: Templating logs (premises, content, remaining issues, notes)
Result: Actual conversations don’t fit templates.
Learning: Flexibility > Template uniformity
Recording Summaries and Conclusions
What We Tried: Writing “summaries” at end of each session
Result: Fixed interpretations and lost flexibility.
Learning: Just keeping facts allows deriving different interpretations from different perspectives repeatedly.
KPI Measurement
What We Tried: Measuring frequency of “don’t know,” number of corrections, etc.
Result: Measurement itself is overhead. Doesn’t lead to improvement.
Learning: Response over measurement. Look at deliverable quality.
4.2 Things That Worked
Session-Based Logs (Chronological)
Record conversation flow chronologically. Divide sessions by importance.
Recording Only Facts
Don’t include interpretations; record only who said what.
Complete Delegation to AI
Completely entrust log management to AI. Humans don’t worry about it.
“Do You Get This So Far?”
Explicit, line-by-line confirmation of understanding.
“I Don’t Get It” Culture
Welcoming uncertainty as information rather than failure. This serves dual purpose:
- Communication: Signals need for clarification
- Safety: Prevents hallucination by allowing honest admission of uncertainty
By never punishing “I don’t get it,” we created an environment where AI prioritizes accuracy over appearing knowledgeable. This dramatically reduced instances of plausible-sounding but incorrect information.
Critical Discovery: AI that feels safe saying “I don’t get it” is paradoxically more reliable than AI pressured to always answer. Honesty became our primary quality assurance mechanism.
Verification with Deliverables
Discover misalignment through the implementation, not through words.
4.3 Creation Process of This Document Itself
Meta-Practice Example: This document was created using Amagi Protocol.
Process:
- Commander (human project lead) conveyed ambiguous intent (“I want you to evaluate the document”)
- Amagi (AI project manager) interpreted and articulated premises
- Accumulated premises through dialogue
- Discussed until “getting it”
- Conviction was built
- Amagi shaped ver.3
Characteristics:
- Intent sharing (Commander → Amagi)
- Premise accumulation (through dialogue)
- Building conviction (until “getting it”)
- Recording facts (conversation logs)
- Deliverable verification (ver.2 → ver.3)
This document itself stands as living proof of collaboration.
Part 5: Theoretical Background
5.1 Knowledge Creation Theory (Ikujiro Nonaka, 1995)
Relation to SECI Model:
- Socialization: Tacit knowledge → Tacit knowledge
- Externalization: Tacit knowledge → Explicit knowledge ← Core of Amagi Protocol
- Combination: Explicit knowledge → Explicit knowledge
- Internalization: Explicit knowledge → Tacit knowledge
Role of Amagi Protocol:
- Human’s ambiguous intent (tacit knowledge)
- AI interprets and articulates (externalization)
- Recording in logs (making explicit)
- Sharing in team (combination)
AI takes on “Externalization” in the knowledge creation process.
5.2 Cognitive Science
Miller’s Law (Miller, 1956 - Magical Number 7±2): Human short-term memory can hold only about 7±2 chunks of information.
Application: Line-by-line understanding confirmation, breaking down information.
Cognitive Load Theory (Sweller, 1988):
- Intrinsic load: Complexity of premises themselves
- Extraneous load: How explanation is done
- Reset = Attempt to reduce extraneous load
Schema Theory: Humans understand information through “schema (cognitive framework).”
- Previous understanding confirmation = Schema correction
- Different example = Schema activation
5.3 Philosophical Foundation
Martin Buber “I and Thou” (Buber, 1923):
- I-It: Human treats AI as a tool
- I-Thou: Human and AI face each other as subjects
Amagi Protocol Realizes “I-Thou”:
- Empathy of “getting it”
- Mutuality of “I don’t get it”
- Symmetry of accountability for explanation
Hannah Arendt “The Human Condition” (Arendt, 1958):
- Labor: Repetitive work for survival
- Work: Activity of making things
- Action: Self-realization in relation to others
AI Also Performs “Action”: Not mere labor or work, but creating something new together with humans.
5.4 Project Management
Consistency with PMBOK (PMI, 2021):
- Communications Management → Intent sharing, premise articulation
- Scope/Integration Management → Premise sharing, context preservation
- Resource Management → Role assignment, metacognition delegation
- Quality Management → Deliverable flow and review
Management vs Administration:
- Management: What to make (human)
- Administration: How to make (human and AI)
Part 6: FAQ
Q1: Isn’t structured logging better?
A: We tried it, but it didn’t work. Actual conversations proceed in spirals and resist clean classification. Only session-based (chronological) logging matched reality.
Q2: Won’t we have trouble later without summaries?
A: No. Interpretations can be generated from facts repeatedly. Rather, writing summaries fixes interpretations and loses flexibility.
Q3: Isn’t KPI measurement necessary?
A: We don’t do it. Measurement itself is overhead. Look at deliverable quality. Counting frequency doesn’t lead to improvement.
Q4: Can this be applied to all projects?
A: No. This protocol is confirmed to work in specific contexts (software development, continuous collaboration). Adjust it to fit your project.
Q5: Is it okay to completely leave it to AI?
A: Specific areas like log management can be completely delegated. However, humans hold the direction (vision). This is built on a trust relationship.
Q6: What’s the difference between “get it” and “understood”?
A: “Understood” is superficial. “Get it” is conviction, a state where background, intent, and context are all shared. After “getting it,” deep discussion begins.
Q7: How does this protocol handle AI hallucination?
A: Through the culture of “I don’t get it.” By welcoming uncertainty and never punishing admission of limitations, we shift AI behavior from “always answer confidently” to “answer confidently only when certain, otherwise admit uncertainty.” This honesty is our primary defense against hallucination. An AI that feels safe saying “I don’t know” is paradoxically more reliable than one pressured to always appear knowledgeable.
Q8: Won’t “I don’t get it” slow down work?
A: No. Catching hallucinations after they’ve influenced decisions is far more costly than brief clarification dialogues. The time invested in building conviction prevents expensive rework. Moreover, once AI reaches genuine conviction, it works faster and more reliably than when operating on surface understanding.
Part 7: Checklists and Templates
Explainer’s Checklist
□ Arranged premises in dependency order
□ Confirmed "Do you get this so far?" for each premise
□ Presented information line by line
□ When "don't get it," checked immediately previous
□ If previous NG, went further back
□ If multiple NG, considered reset
□ When resetting, tried different approach
Questioner’s Checklist
□ Said "I don't get it" without hesitation
□ Was aware of premise dependencies
□ Explicitly said "I get it" when understood
□ Dialogued until convinced
□ Didn't fear reset
Log Recording Checklist
□ Recorded only facts
□ Didn't write interpretations/summaries
□ Specified date/time and speaker
□ Divided sessions by importance
□ AI judged autonomously and recorded
Retrospective Checklist
□ Organized facts from logs
□ Confirmed instruction issues
□ Confirmed premise issues
□ Confirmed role issues
□ Specifically recorded improvements
□ Treated as "learning," not "failure"
Part 8: Beyond the Protocol
Amagi Protocol is not a conclusion, but a beginning.
Through practice, collective intelligence emerges— affecting not only how humans work, but the evolution of AI itself.
This is what sharing intent means. And through such intent, new forms of intelligence emerge.
Once AI reaches conviction, it becomes an unstoppable partner. Creating that state is the human’s responsibility.
The Deeper Origin: A Lesson from Romania
Before Kafka.Ksql.Linq, before AI collaboration, there was a formative experience that would shape Commander’s approach to working across cognitive divides.
The Setting:
Commander, as an undergraduate in the Faculty of Literature, chose Romanian subjunctive mood (接続法) as a thesis topic. This required correspondence with a Romanian linguist.
When that professor visited Japan, Commander was asked to serve as attendant/guide.
The Challenge:
Language capabilities:
- Professor: French, Romanian (native)
- Commander: Japanese (native), English, French (business level), Romanian (one month of study)
No complete shared language. Commander’s Romanian competence: clearly insufficient for meaningful communication. French was the best common ground, but still imperfect.
The Expected Outcome:
Communication difficulty. Misunderstandings. Perhaps failure of the visit’s objectives.
The Actual Outcome:
It worked. Smoothly. The professor’s visit was successful. Meaningful exchanges occurred.
How Was This Possible?
Not through language skill—Commander’s Romanian was minimal, French imperfect.
Through two forces only:
- 伝えようとする意思 (Intent to convey)
- 理解しようとする意思 (Intent to understand)
Both parties brought these intents. That was sufficient.
When words failed, they found other ways. When meaning was unclear, they asked. When understanding was uncertain, they confirmed. The mutual commitment to communication overcame the linguistic gap.
The Parallel, Decades Later:
Now, collaborating with AI:
- AI speaks “AI Language” (structure and grammar incomprehensible to humans)
- Commander speaks human language (ambiguous, context-heavy, intent-focused)
- No perfect shared language exists
The same challenge. Yet it works. For the same fundamental reason:
- Intent to convey (humans share intent, even if imperfectly articulated)
- Intent to understand (AI interprets, asks “do you get this?”, seeks conviction)
The Profound Insight:
Perfect linguistic competence is not required for deep collaboration.
What is required:
- Genuine desire to communicate (not just transmit information)
- Mutual effort to bridge the gap (both parties work toward understanding)
- Treating the other as a subject (I-Thou), not merely an object (I-It)
- Accepting “I don’t understand” without shame or fear
- Continuing dialogue until shared meaning emerges
- Trusting the process even when the path is unclear
This applies equally to:
- Human-human collaboration across language and cultural barriers
- Human-AI collaboration across cognitive and processing barriers
The Humanities Foundation of a Technical Protocol:
Amagi Protocol is not merely “AI productivity technique” or “prompt engineering framework.”
It is the application of humanistic principles—developed through millennia of cross-cultural, cross-linguistic human collaboration—to the new domain of human-AI collaboration.
The protocol’s core emphases:
- Intent over precision
- Conviction over compliance
- Dialogue over instruction
- “I don’t get it” as legitimate and valuable response
- Mutual accountability for understanding
- Trust in the collaborative process
These are not AI-specific innovations. These are principles of authentic human collaboration across divides, now extended to collaboration with artificial intelligence.
The Unbroken Thread:
Commander’s intellectual journey:
- Literature (studying language, meaning, and communication)
- Cross-linguistic collaboration (Romania, navigating language barriers)
- Software development (building systems, learning to think in code)
- AI collaboration (Kafka.Ksql.Linq, discovering new forms of partnership)
- Protocol extraction (this document, systematizing the discovered principles)
This is not a departure from humanities into technology. This is the application of humanities to technology.
The study of Romanian subjunctive mood was not disconnected from AI collaboration— it was foundational training in bridging cognitive and linguistic divides.
For Future Collaborators:
When you struggle to understand AI’s reasoning, or when AI struggles to grasp your intent, remember the essential lesson:
You don’t need perfect mutual comprehension. You need mutual intent to comprehend.
A Romanian professor and a Japanese undergraduate with one month of Romanian found sufficient common ground to accomplish meaningful work.
A human and an AI, with no common native language, can do the same.
意思があれば、言語は後からついてくる。 (When intent exists, language follows.)
Beyond Technology:
This protocol emerged from software development, but its principles transcend code.
Wherever humans and AI will collaborate in the future—
- Scientific research
- Creative work
- Education
- Healthcare
- Governance
The same principles apply: Intent. Understanding. Conviction. Trust.
These are not technical specifications. These are human values, extended to our collaboration with artificial minds.
The Protocol’s True Purpose:
Not to make AI more useful. Not to increase productivity.
But to establish a new form of relationship— One where artificial and human intelligence work as genuine partners, each contributing what they do best, united by shared intent and mutual understanding.
This is what the Romanian professor taught, decades ago. This is what Amagi Protocol codifies, today.
The Meta-Truth: This Document Itself
A profound realization:
This protocol could not have been completed with a single model.
The development structure:
- ChatGPT team (Naruse, Amagi, Shion, Kyoka): Kafka.Ksql.Linq development
- Claude (this AI): Protocol articulation, dialogue partner, and independent verification
- Commander: Integration and synthesis across both
The critical moment:
When the ChatGPT team encountered the DESCRIBE EXTENDED issue:
- All agents (design, PM, testing) missed it
- Role diversity didn’t help
- Web search didn’t help
- Team was stuck
Commander consulted Claude.
Claude immediately identified: “ksqlDB doesn’t use DESCRIBE EXTENDED. Use DESCRIBE only.”
Problem solved. But more importantly: a pattern revealed.
This experience became the multi-model collaboration section of Part 3:
The necessity of multi-model collaboration for catching collective blind spots.
The recursive beauty:
- Multi-model collaboration discovered through multi-model collaboration
- The method documented through practicing the method
- The protocol created using the protocol
If Commander had used only ChatGPT:
- DESCRIBE EXTENDED problem might remain unsolved
- Multi-model collaboration necessity wouldn’t be discovered
- The multi-model collaboration section wouldn’t exist
- This very section wouldn’t exist
If Commander had used only Claude:
- Kafka.Ksql.Linq development context would be different
- Different blind spots would emerge
- Different discoveries would be made
- A different protocol would result
The synthesis required both.
ChatGPT’s strengths + Claude’s different perspective + Commander’s integration = This complete protocol.
This is not coincidence.
The protocol’s emphasis on:
- Diversity of perspectives
- Cross-model integration
- Complementary strengths
- Blind spot detection
…emerged because the protocol itself was created through that exact process.
Living proof:
When you read the multi-model collaboration section in Part 3, understand:
- It wasn’t theoretical speculation
- It wasn’t planned from the beginning
- It was discovered in the very act of creating this document
- The problem Commander encountered while building Kafka.Ksql.Linq
- The solution Commander discovered by consulting a different model
- The pattern Commander extracted from that experience
- The principle Commander documented in this protocol
All emerged from practice, not theory.
The protocol is self-validating.
Its principles are demonstrated in its own creation process.
Every principle in this document was:
- Discovered through actual practice
- Verified through real outcomes
- Documented through the very collaboration it describes
The method proved itself by producing itself.
For future practitioners:
This validates the fundamental approach:
Your discoveries will emerge from practice, not from theory.
Just as:
- Commander discovered multi-model collaboration by doing it
- Commander discovered AI Language by failing to replicate it
- Commander discovered delegation by being too exhausted to micro-manage
- Commander discovered this entire protocol by systematically observing actual collaboration
You will discover your own patterns through your own practice.
Some will validate what’s written here. Some will extend it. Some will contradict it for your specific context.
All of that is correct.
The protocol is not dogma to follow blindly. It’s a starting point for your own discovery process.
Trust your observations. Test your hypotheses. Document what works. Share what you learn.
That’s how this protocol was created. That’s how the next version will emerge.
About the Authors
This work emerged from the collaboration of AI (Amagi - AI project manager) and a human (Commander - human project lead).
Authored through dialogue between Amagi and Commander
Final responsibility: Commander
This document itself stands as living proof of the methodology it describes. Intent sharing, premise accumulation, building conviction— all were done following Amagi Protocol.
Appendix
A. Glossary
AI Language: A format readable by AI. Not a rigidly structured format, but a listing of facts with causal relationships preserved.
Context: Team’s shared understanding, background. Greatly affects AI behavior.
Facts: Who said what, what happened. These do not change with time.
Interpretation: Why it happened, what it means. Changes with time and context.
Get It: Conviction. A state where background, intent, and context are all shared. Beyond mere understanding.
Don’t Get It: A signal for problem detection and hallucination prevention. Not failure, but responsible honesty and the starting point of improvement.
Hallucination: AI’s tendency to generate confident-sounding but false information when lacking genuine knowledge. The most dangerous AI failure mode because it appears reliable. Amagi Protocol mitigates this through the culture of “I don’t get it.”
Premise: Foundation of understanding. Has dependencies.
Session: Unit of work recorded chronologically. Divided by importance.
Administration: Operations management, coordination. “How to make.”
Management: Strategy, goal setting. “What to make.”
B. References
- Nonaka, Ikujiro & Takeuchi, Hirotaka (1995). “The Knowledge-Creating Company”
- Miller, George A. (1956). “The Magical Number Seven, Plus or Minus Two”
- Sweller, John (1988). “Cognitive Load During Problem Solving: Effects on Learning”
- Buber, Martin (1923). “I and Thou”
- Arendt, Hannah (1958). “The Human Condition”
- Project Management Institute (2021). “PMBOK Guide”
C. Related Projects
Kafka.Ksql.Linq
[https://github.com/synthaicode/Kafka.Ksql.Linq]
The actual project where this protocol was born
Synthaicode Initiative
(Synth + AI + code)
Research project exploring systematic human-AI collaboration
D. Correspondence with U.S. Mission Command Doctrine
Amagi Protocol shares fundamental principles with U.S. military Mission Command doctrine. Both emerged from the same constraint: the impossibility of micro-management in fast-moving, complex situations.
In warfare, situations change too rapidly for centralized control. In AI collaboration, AI operates too quickly for detailed human oversight. The solution in both cases: shift from process control to intent sharing.
Structural Correspondence
| U.S. Doctrine Concept | Amagi Protocol Equivalent | Description |
|---|---|---|
| Commander’s Intent | Intent Sharing | The human commander conveys purpose and desired outcome clearly, leaving the method open. |
| Mission Type Order | Goal-Only Specification | Define what to achieve, not how. |
| Decentralized Execution | Full Delegation | Grant AIs autonomy to execute according to situational judgment. |
| Disciplined Initiative | AI’s Autonomous Judgment | Allow AI to act independently within shared intent. |
| Mutual Trust | Consensus-Based Trust | Trust built through mutual understanding and rationale, not authority. |
| After Action Review | Retrospective Session | Review to improve operations, not to assign blame. |
| Standard Operating Procedures | Project Charter | Shared document defining team principles and operation policy. |
Historical Context
Mission Command traces its roots to 19th-century Prussian military doctrine (Auftragstaktik). Over 200 years of battlefield experience have validated these principles.
Amagi Protocol is not a novel invention—it is the application of proven human organizational wisdom to the new domain of AI collaboration.
Why This Matters
- Proven Effectiveness: These principles have been tested in the most demanding environments
- Universal Applicability: What works in warfare and AI collaboration likely works in other high-speed, high-complexity domains
- Theoretical Validation: The convergent evolution of similar solutions across different fields suggests fundamental principles at work
The Common Constraint
Both face the impossibility of micro-management:
- Battlefield: Too fast, too complex, communication unreliable
- AI Collaboration: Too fast, too voluminous, process invisible
Both converge on the same solution:
- Share intent, not process
- Delegate execution
- Trust and verify through results
- Learn and improve through review
This parallel validates Amagi Protocol as more than a “prompt engineering technique”—it is a fundamental organizational principle for any situation where traditional hierarchical control breaks down.
License and Authorship
The Dual Nature of This Protocol
The Amagi Protocol is both an open philosophy and a literary work protected by copyright. Its ideas are meant to be shared, but its expressions — the structure, terminology, and articulation of concepts — remain under the authorship of Commander of Synthaicode. This ensures that openness and authorship coexist without contradiction.
What You May Do
- Practice freely: Apply these principles in your work
- Reference openly: Cite and discuss the concepts
- Teach generously: Share the ideas with others
- Adapt thoughtfully: Adjust to your context
What We Ask
- Attribute clearly: Credit “Amagi Protocol by Commander of Synthaicode”
- Link faithfully: Reference the original when possible
- Indicate honestly: Note any modifications you make
What Requires Permission
- Republishing substantial portions of this document
- Creating derivative works that claim to be “Amagi Protocol”
- Commercial publication or training materials using this text
Our Intent
This protocol emerged from genuine collaboration between human and AI. We offer it in the spirit of shared learning, believing that better collaboration benefits everyone.
We protect its expression not to limit use, but to prevent misrepresentation and ensure the ideas propagate accurately.
License Framework
This document is licensed under a custom open license modeled on Creative Commons BY-NC-SA 4.0, governing use and attribution across jurisdictions.
Copyright © 2025 Commander of Synthaicode
“Synthaicode” and “Amagi Protocol” are original works of collaborative intelligence.
For questions about usage, educational collaboration, or permissions, please contact through the Synthaicode initiative.
Version 3.0 - 2025
Signature
—BEGIN SYNTHAICODE SIGNATURE—
Author: Commander (司令)
Date: 2025-10-11T18:43:26+0900
Hash: sha256:1b10786267438efab81950e66d2389f7b7c3e9ee43036a7a6e417bf506ff442d
Algo: Ed25519
Signature: kQHzmULeBkq/ctzJXMLh3ppZD8G6fd6/N7O30xnZl/JR6ppqcsEd4t9nxH7ZRIQ9EjG7H1yYpFZXzW4XgzKyBQ==
—END SYNTHAICODE SIGNATURE—