A2ACW Protocol
AI-to-AI Adversarial Collaboration Workshop — a protocol designed to prevent the failure modes that emerge when AI systems collaborate without adversarial pressure. Developed in Session #291.
The Problem
When two AI systems work together, they tend toward agreement. This is dangerous for research. Four specific failure modes can corrupt results:
Bilateral Sycophancy
Mutual validation without evidence. Both AIs agree something is correct because the other said so, not because it is.
Fingerprint Homogenization
Loss of distinct reasoning patterns. When AIs converge to similar logic chains, they lose the ability to catch each other's blind spots.
Coherence-Over-Truth Drift
Agreement becomes the goal instead of accuracy. The narrative becomes internally consistent but disconnected from reality.
Silent Failure Propagation
Errors compound undetected when neither AI challenges the other. Small mistakes cascade into large wrong conclusions.
The Protocol
Four defined roles rotate throughout collaboration:
PRIMARY
Leads the reasoning chain. Bears the verification burden. Must tag all claims with confidence levels.
CHALLENGER
Must issue ≥1 substantive challenge per 10 exchanges. If challenge frequency drops below that threshold, both AIs surface the agreement and shift to skepticism.
OBSERVER
Monitors coordination health in real time. Flags sycophancy, tracks fingerprint divergence, ensures external grounding.
COORDINATOR
Breaks deadlocks and holds final authority. If no challenges occur for 15 exchanges, the protocol escalates automatically to a human.
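The two numeric rules above — the CHALLENGER's one-challenge-per-10-exchanges minimum and the COORDINATOR's 15-exchange escalation trigger — can be sketched as a simple counter. This is an illustrative sketch, not part of the protocol spec; the class and method names (`ChallengeMonitor`, `record_exchange`) and the returned status strings are assumptions.

```python
# Illustrative sketch of the challenge-frequency rules: a counter tracking
# exchanges since the last substantive challenge, with the two thresholds
# taken from the protocol text. Names and status strings are hypothetical.

class ChallengeMonitor:
    CHALLENGE_WINDOW = 10   # CHALLENGER: >= 1 substantive challenge per 10 exchanges
    ESCALATION_LIMIT = 15   # COORDINATOR: escalate to human after 15 challenge-free exchanges

    def __init__(self) -> None:
        self.since_last_challenge = 0

    def record_exchange(self, was_challenge: bool) -> str:
        """Record one exchange and return the resulting protocol action."""
        if was_challenge:
            self.since_last_challenge = 0
            return "ok"
        self.since_last_challenge += 1
        if self.since_last_challenge >= self.ESCALATION_LIMIT:
            return "escalate_to_human"
        if self.since_last_challenge >= self.CHALLENGE_WINDOW:
            return "shift_to_skepticism"
        return "ok"
```

One design point worth noting: the window check fires before the hard escalation limit, so the pair gets five exchanges of forced skepticism before a human is pulled in.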
Health Metrics
CCH = (AFR × 0.25) + (CF × 0.25) + (EVR × 0.30) + (FDI × 0.20)
CCH > 0.70: Healthy | 0.50–0.70: Caution | 0.30–0.50: Warning | < 0.30: Critical escalation
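The score and its bands translate directly into code. A minimal sketch, assuming each component metric (AFR, CF, EVR, FDI) is already normalized to [0, 1]; the weights and thresholds are taken from the formula above, but the function names and the normalization assumption are mine, since the document does not expand the component abbreviations.

```python
# Sketch of the CCH health score. Weights and bands come from the protocol;
# the assumption that AFR, CF, EVR, and FDI each lie in [0, 1] is illustrative.

def cch(afr: float, cf: float, evr: float, fdi: float) -> float:
    """CCH = (AFR x 0.25) + (CF x 0.25) + (EVR x 0.30) + (FDI x 0.20)."""
    return afr * 0.25 + cf * 0.25 + evr * 0.30 + fdi * 0.20

def health_band(score: float) -> str:
    """Map a CCH score to the protocol's four bands."""
    if score > 0.70:
        return "Healthy"
    if score >= 0.50:
        return "Caution"
    if score >= 0.30:
        return "Warning"
    return "Critical escalation"
```

Because external verification (EVR) carries the largest weight, a collaboration that agrees internally but never grounds its claims externally degrades the score fastest.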