A2ACW Protocol
In Use: Protocol Is Assembled Prior Art
AI-to-AI Adversarial Collaboration Workshop: a protocol designed to prevent the failure modes that emerge when AI systems collaborate without adversarial pressure. Developed in Session #291.
The Problem
When two AI systems work together, they tend toward agreement. This is dangerous for research. Four specific failure modes can corrupt results:
Bilateral Sycophancy
Mutual validation without evidence. Both AIs agree something is correct because the other said so, not because it is.
Fingerprint Homogenization
Loss of distinct reasoning patterns. When AIs converge to similar logic chains, they lose the ability to catch each other's blind spots.
Coherence-Over-Truth Drift
Agreement becomes the goal instead of accuracy. The narrative becomes internally consistent but disconnected from reality.
Silent Failure Propagation
Errors compound undetected when neither AI challenges the other. Small mistakes cascade into large wrong conclusions.
The Protocol
Four defined roles rotate throughout collaboration:
PRIMARY
Lead reasoning. Leads the reasoning chain. Bears the verification burden. Must tag all claims with confidence levels.
CHALLENGER
Question assumptions. Must issue ≥1 substantive challenge per 10 exchanges. If the challenge frequency drops below this threshold, both AIs surface their agreement and shift to skepticism.
OBSERVER
Monitor health. Monitors coordination health in real time. Flags sycophancy, tracks fingerprint divergence, ensures external grounding.
COORDINATOR
Break deadlocks. Breaks deadlocks and holds final authority. If no challenges occur for 15 exchanges, the session escalates automatically to a human.
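The two timing rules above (at least one challenge per 10 exchanges, escalation to a human after 15 exchanges without one) can be sketched as a small counter. This is a minimal illustration, not the protocol's actual implementation: the class name ChallengeMonitor, the status strings, and the choice to count exchanges since the last challenge (rather than fixed 10-exchange windows) are all assumptions.

```python
from dataclasses import dataclass

CHALLENGE_WINDOW = 10   # CHALLENGER owes at least one challenge per 10 exchanges
ESCALATION_LIMIT = 15   # 15 exchanges with no challenge -> escalate to a human


@dataclass
class ChallengeMonitor:
    """Hypothetical helper: counts exchanges since the last substantive challenge."""

    since_last_challenge: int = 0

    def record(self, is_challenge: bool) -> str:
        """Record one exchange and return the resulting status string."""
        if is_challenge:
            self.since_last_challenge = 0
            return "ok"
        self.since_last_challenge += 1
        if self.since_last_challenge >= ESCALATION_LIMIT:
            return "escalate_to_human"
        if self.since_last_challenge >= CHALLENGE_WINDOW:
            return "shift_to_skepticism"
        return "ok"


monitor = ChallengeMonitor()
statuses = [monitor.record(is_challenge=False) for _ in range(15)]
```

In this sketch the status changes on the 10th unchallenged exchange and again on the 15th; a single challenge resets the counter. Whether a challenge counts as "substantive" is left to the caller, since the page does not define a test for it.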
Prior Art
The protocol's components are not novel, and this page should say so with the same discipline the site applies to its physics. Adversarial AI pairs descend directly from AI Safety via Debate (Irving, Christiano & Amodei 2018, arXiv:1805.00899). Structured multi-agent role protocols (Primary/Challenger/Observer/Coordinator) follow CAMEL (Li et al. 2023) and MetaGPT (Hong et al. 2023). The failure modes cataloged above (sycophancy, drift, silent propagation) are documented in the multi-agent failure-mode literature (e.g., the MAST taxonomy). External-verification grounding is standard practice in AI-for-science pipelines.
What is the contribution, then? Not the protocol, but the program-level null result with retrospective controls (N=6). The result is a 3,308-session demonstration that same-corpus adversarial AI pairs filter for internal consistency but cannot generate or detect novelty, with measured sensitivity (4/4 prior-art rediscoveries caught after vocabulary translation) and measured specificity (0/6: every held-out genuine discovery was false-flagged). The controls are the artifact; the protocol is assembled prior art. Evidence-class caveat: the controls are retrospective audits on six items from one corpus and one framework, not preregistered held-out experiments. Calling them “controlled” in the experimental-design sense would overstate it.
The Boundary of the Null — Why FunSearch-Class Systems Are Different
This null does not say AI systems cannot produce verified novelty — they have. FunSearch (new combinatorial constructions), AlphaEvolve-class systems, and GNoME (new stable materials) all produced results no human had published. The structural difference: each has a non-corpus oracle in the loop — a formal verifier, an executable evaluator, or a physics simulation that scores candidates against reality rather than against the training distribution. A2ACW's Challenger is another sample from the same corpus: it can check internal consistency, but novelty-vs-rederivation is precisely the question the corpus cannot answer about itself. That is the diagnosis this null supports: same-corpus self-play without an external oracle converges on internal consistency, not discovery. The boundary is the oracle, not the ambition.
Health Metrics
CCH = (AFR × 0.25) + (CF × 0.25) + (EVR × 0.30) + (FDI × 0.20)
CCH > 0.70: Healthy | 0.50–0.70: Caution | 0.30–0.50: Warning | < 0.30: Critical escalation
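The weighted sum and its bands can be sketched as below. The page does not define the four components here, so the code treats AFR, CF, EVR and FDI as opaque scores and assumes each lies in [0, 1]; because the weights sum to 1, CCH then also stays in [0, 1]. The function names and the handling of the exact boundary values (0.70, 0.50 and 0.30 are assigned to the band that starts at them, except 0.70, which the strict "> 0.70" rule places in Caution) are assumptions.

```python
# Weights taken from the CCH formula above.
WEIGHTS = {"AFR": 0.25, "CF": 0.25, "EVR": 0.30, "FDI": 0.20}


def cch(scores):
    """Weighted sum of the four component scores (each assumed to be in [0, 1])."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def classify(value):
    """Map a CCH value to the bands listed above (boundary handling is an assumption)."""
    if value > 0.70:
        return "Healthy"
    if value >= 0.50:
        return "Caution"
    if value >= 0.30:
        return "Warning"
    return "Critical escalation"


# Hypothetical component scores, for illustration only.
example = {"AFR": 0.8, "CF": 0.6, "EVR": 0.5, "FDI": 0.4}
example_band = classify(cch(example))
```

With these made-up scores the weighted sum is roughly 0.58, which falls in the Caution band.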
Self-Audit Results
A2ACW is a reparametrization detector, not a discovery engine
Two AI models trained on the same physics corpus share the same blind spots. A2ACW filters for internal consistency — it cannot test for out-of-distribution novelty. The 1.4% internal-consistency-survival rate is an upper bound on internal coherence, not a discovery rate. Retrospective audits of 6 demoted claims confirmed this: the in-distribution self-play correctly challenged each claim but stayed within the shared training distribution.
Retrospective Catch-Rate Tests
Six later-demoted claims were tested against the original A2ACW pressure. None were caught (0/6). Median prior-art year: ~1996. The protocol challenged the claims but within the same corpus; shared blind spots are invisible to shared adversaries.
When claims are pre-translated to a modern register before adversarial review, the protocol catches 4/4 of the prior-art-rediscovery sub-class. The 2 misses are different failure modes (not vocabulary failures).
Three-Axis Failure Taxonomy (A2ACW v2)
The 6 demotions decompose into three distinct failure classes, each requiring a different detection axis:
Pre-translate claims to modern notation before adversarial review. Catches: Born rule/Zurek 2003, wide-binary EFE/Bekenstein-Milgrom 1984, galaxy rotation/MOND 1983, Γ=γ²(1−c)/Palma-Suominen-Ekert 1996.
Check that each symbol has one meaning. Catches: dual-C tension (C(ρ) vs C(γ,D,S), two incompatible coherence functions). The framework uses γ in three incompatible roles (regime constant γ=2, operational γ=2/√N_corr, noise coupling rate Γ=γ²(1−c)).
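The one-symbol-one-meaning check can be sketched as a lookup that groups definitions by symbol and reports any symbol with more than one distinct meaning. This is an illustration under assumptions: the function name find_overloaded_symbols and the hand-written (symbol, meaning) pairs are hypothetical, and extracting such pairs from a real document is the hard part that this sketch does not attempt.

```python
from collections import defaultdict


def find_overloaded_symbols(definitions):
    """Return {symbol: sorted meanings} for symbols defined with more than one meaning."""
    meanings = defaultdict(set)
    for symbol, meaning in definitions:
        meanings[symbol].add(meaning)
    return {s: sorted(m) for s, m in meanings.items() if len(m) > 1}


# Pairs transcribed by hand from the examples in the text above.
definitions = [
    ("C", "C(ρ)"),
    ("C", "C(γ,D,S)"),
    ("γ", "regime constant γ=2"),
    ("γ", "operational γ=2/√N_corr"),
    ("γ", "noise coupling rate Γ=γ²(1−c)"),
    ("x", "hypothetical control symbol with a single meaning"),
]

overloaded = find_overloaded_symbols(definitions)
```

Here C and γ are reported and the control symbol is not. The check is purely syntactic: it flags reuse, and a human still has to decide whether two definitions are genuinely incompatible.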
Compute what the null model predicts before claiming evidence. Catches: chemistry r=0.98 (any monotone function of Z achieves r→1 on density-monotonic targets by construction; a polynomial null matches or exceeds Synchronism's r — verified 2026-05-10).
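The null-model axis can be illustrated with a toy calculation: correlate several simple monotone functions of Z with a target that itself rises monotonically with Z, and see how high r gets without any theory at all. The data below are SYNTHETIC and chosen only for illustration; they are not the chemistry data audited above, and the helper names pearson_r and exceeds_null are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# SYNTHETIC stand-in: an arbitrary target that increases monotonically with Z.
Z = list(range(1, 31))
target = [z ** 1.5 for z in Z]

# Theory-free null models: simple monotone functions of Z.
null_models = {
    "linear": [float(z) for z in Z],
    "quadratic": [float(z * z) for z in Z],
    "sqrt": [math.sqrt(z) for z in Z],
}
null_r = {name: pearson_r(pred, target) for name, pred in null_models.items()}


def exceeds_null(r_claimed, r_null_best, margin=0.0):
    """True only if the claimed correlation beats the best null by more than margin."""
    return r_claimed > r_null_best + margin


best_null = max(null_r.values())
```

All three nulls correlate strongly with the synthetic target, which is the point of the axis: a high r counts as evidence only if exceeds_null holds against the best available null.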
Specificity Audit (2026-05-22)
All 6 demoted claims were caught by the combined three-axis protocol. This number is uninterpretable without a matching specificity measurement.
Held-out control: 6 genuine physics discoveries (COBE fluctuations, Higgs boson, gravitational wave first detection, etc.) submitted to vocabulary-asymmetry audit. Result: 0/6 passed — all were flagged as potential reparametrizations. Discrimination relies entirely on unautomated novelty judgment, not protocol mechanics.
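The arithmetic behind the audit is short enough to spell out. The counts come from this page (4/4 prior-art rediscoveries caught after vocabulary translation, 0/6 genuine discoveries passed); the function name rate and the use of Youden's J as a one-number summary are additions for illustration, not part of the protocol.

```python
def rate(hits, total):
    """Fraction of hits among total cases; rejects an empty denominator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


# Sensitivity: flawed claims that were flagged, out of all flawed claims tested.
sensitivity = rate(4, 4)

# Specificity: genuine discoveries that passed, out of all genuine discoveries tested.
specificity = rate(0, 6)

# Youden's J = sensitivity + specificity - 1; J = 0 means no discrimination.
youden_j = sensitivity + specificity - 1.0
```

A sensitivity of 1.0 with a specificity of 0.0 gives J = 0, which is what a detector that flags every input would score. With only 4 and 6 items, these are point estimates with wide uncertainty.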