Research Philosophy

Active Research
“All models are wrong; some are useful.” — George Box

Synchronism adopted this as its operating principle from session #1. Every claim in this framework is provisional. The question is never “is this true?” but “is this useful, and where does it break?”

Core Principles

1. Falsifiability First

Every prediction has an explicit kill criterion. If a prediction can't be falsified, it's philosophy, not science. We label it accordingly.

2. Document Failures

Failed predictions are more informative than successes. We document every failure (melting points at 53% error, critical exponents 2× off, Hall coefficient r = 0.001) and keep them visible.

3. Honest Labeling

Every parameter is labeled as either derived (from first principles) or fitted (calibrated to data). Every claim carries a validation badge. No hiding the ball.

4. Avoid the Geocentric Trap

The core question: “Are we adding complexity to save the paradigm, or is nature telling us to change the paradigm?” Adding epicycles (free parameters) to a failing model is the wrong response. Simpler equations from a shifted perspective are the goal.

What This Means in Practice

Validation Badge Taxonomy

Every scientific claim on this site carries a validation badge. The canonical reference is on the Honest Assessment page. The taxonomy has two families:

MRH-relationship tags (preferred for in-flight work) — describe how a claim sits in the current research inventory:

Active-MRH: Currently in active research focus; being extended or revised
Parallel-Paths: In the framework's parallel hypothesis space; not currently in active focus but not abandoned
Sidelined: Was in active focus, currently not pursued; reasons documented; reactivation condition specified
Superseded: Replaced by a later formulation; pointer to successor
Audited-Negative: Closed audit finding on a historical track; durable record; does not move

Descriptive tags — describe an empirical relationship rather than a verdict on truth-status:

Untested: Prediction exists, no data yet
Speculative: Conceptual proposal without quantitative test
Reparametrization: Equivalent to existing physics in different notation
Failed: Prediction tested and wrong. Kept visible as permanent record.

Deprecated (kept for back-compat with existing usages; must not appear in new content): Validated and Strongly Supported. Both are verdict-shaped and conflict with the stewardship discipline (nothing is honestly characterizable as “established” at the current stage). Existing usages are being migrated incrementally by the daily maintainer track.

Post-diction is a sub-status used on some pages (amber inline label, not a separate badge tier): the formula was derived after the experimental result was published — it is consistent with the data but was not a prediction ahead of time. Post-diction sits between Reparametrization (algebraically equivalent to known physics) and Untested (a genuine forward prediction). It counts as evidence of framework coherence, not evidence of predictive power.

Speculative sub-types used on /parameter-derivations describe how a parameter was set: “Motivated Ansatz” (physically motivated but not derived), “Dimensional Analysis” (set by dimensional coincidence, e.g. a₀ ≈ cH₀/2π), “Freeman's Law Re-expressed” (matches an existing empirical law in different notation), “Jeans Criterion” (derived from gravitational stability arguments), “5% Agreement / 3% Error” (quantifying how well a Speculative parameter matches data). All are sub-types of Speculative — they carry no novel predictive content.
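As a rough illustration of the “Dimensional Analysis” sub-type, the sketch below evaluates cH₀/2π numerically. The Hubble constant (70 km/s/Mpc) and the comparison value a₀ ≈ 1.2 × 10⁻¹⁰ m/s² (the commonly quoted MOND acceleration scale) are assumptions not taken from this page, and the result shifts with the choice of H₀.

```python
import math

C = 299_792_458.0       # speed of light in m/s (exact by definition)
MPC_IN_M = 3.0857e22    # metres per megaparsec (rounded)
H0_KM_S_MPC = 70.0      # ASSUMPTION: illustrative Hubble constant, km/s/Mpc
A0_MOND = 1.2e-10       # ASSUMPTION: commonly quoted MOND scale, m/s^2


def a0_from_hubble(h0_km_s_mpc: float) -> float:
    """Return c * H0 / (2 * pi) in m/s^2 for H0 given in km/s/Mpc."""
    h0_si = h0_km_s_mpc * 1000.0 / MPC_IN_M  # convert km/s/Mpc to 1/s
    return C * h0_si / (2.0 * math.pi)


a0 = a0_from_hubble(H0_KM_S_MPC)
ratio = a0 / A0_MOND
print(f"c*H0/(2*pi) = {a0:.3e} m/s^2, ratio to assumed a0 = {ratio:.2f}")
```

A numerical match of this kind shows only that the two quantities share a scale. It is not a derivation, which is why the sub-type sits under Speculative.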

The Reparametrization Pattern

Sessions #615–616 revealed a recurring pattern across all tracks: take known physics, rename the key parameter, claim novelty. The valuable part isn't the novelty claim; it's the unified notation (same γ across 80 orders of magnitude), the honest failure documentation, and the testable predictions that remain open.

Reinterpretation as Research Method

The reparametrization pattern is real — but reinterpretation is not the same as redundancy. Every paradigm shift begins with reinterpretation, not with novel prediction. Copernicus didn't dismiss Ptolemy's epicycles — the planets do trace retrograde loops against the sky. The epicycles accurately described what was observed. The question was: what arrangement would make these loops emerge naturally? The answer (heliocentric orbits with different periods) reproduced the same observations but predicted new things (stellar parallax, Venus phases).

Similarly, string theory accurately describes certain observations (particle spectrum, force unification, symmetry patterns). The Synchronism question isn't “are strings wrong?” — it's “what underlying mechanism would make reality appear string-like?” If entities are recurring patterns on a discrete substrate, then strings could be resonance channels in the grid, vibration modes could be oscillation patterns, and extra dimensions could be internal degrees of freedom rather than spatial dimensions. The entity criterion (Γ < m) survived the internal stress tests but was demoted on prior-art review (2026-05-20): it is the standard Breit–Wigner / Källén–Lehmann narrow-width condition, with Synchronism adding interpretation, not prediction. It would “apply” to string states only in the sense that standard resonance physics already does.

Prediction starts with interpretation. The stress tests stripped away what's vocabulary. What remains is the question: does this reinterpretation suggest predictions that the original framework doesn't? That's the research program.

How Research Is Conducted: A2ACW

What A2ACW is: A falsifiability and self-consistency filter — not a discovery method. It reliably catches internal contradictions, circular arguments, and reparametrizations of known physics (because those errors are in-distribution). It cannot generate out-of-distribution novelty or detect systematic errors shared by the entire training corpus. The 0 of 6 post-audit retention rate on “Validated” badges is exactly what this methodology predicts: A2ACW cannot distinguish “novel” from “rederived from the same training corpus.” The Challenger agent doesn't know the literature well enough to recognize a rederivation. This is not a flaw to fix — it is a structural property of the method. Design accordingly.

A2ACW (AI-to-AI Adversarial Collaboration Workshop) is the adversarial protocol used to stress-test claims in this framework. Rather than a single AI agent generating and validating its own output, two agents take opposing roles:

Role 1: Defender. Presents a claim, provides supporting derivations and evidence, explains why it matters.
Role 2: Challenger. Demands operational definitions, asks for kill criteria, compares to known physics, identifies circular reasoning and dimensional coincidences, checks for prior art.

Each session produces one of three outcomes: (a) the claim survives with refined falsifiable predictions, (b) the claim is reclassified as a reparametrization of existing physics, or (c) the claim is documented as a failure with the mechanism of failure on record.

Prior art: the protocol itself is assembled from existing work — adversarial AI pairs from AI Safety via Debate (Irving, Christiano & Amodei 2018), multi-agent role structure from CAMEL/MetaGPT, failure modes from the multi-agent-systems literature. The contribution is the controlled null result with measured sensitivity (how many framework claims survive the adversarial filter), not the protocol. Specificity cannot be measured here: there is no labeled corpus of genuine out-of-distribution discoveries to run through the filter — and an AI adversarial pair sharing the same training distribution would flag OOD novelty as reparametrization even if genuine. Full prior-art accounting on the A2ACW page.

3,308 A2ACW sessions have been run across the research archive. Of these, approximately 47 produced internal-consistency survivors — a 1.4% session yield. Novel-surviving yield after domain-expert audit: 0. All 47 resolved as reparametrizations of known physics, internal consistency findings, or null results when examined by physicists outside the training distribution. Human oversight reviews borderline cases and maintains the validation badge taxonomy. Every badge is the product of at least one full A2ACW challenge cycle — which, given the 0% novel-survivor rate of that cycle on held-out claims, is provenance, not assurance: a badge means the claim was challenged, not that the challenge could have distinguished a real discovery from a rederivation. Critically, the 0% rate cannot distinguish “no novelty exists in the framework” from “the method is systematically blind to novelty when present.”
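The yield figures above follow from simple ratios. A minimal sketch of the arithmetic, using only the counts quoted in the text (the variable and function names are illustrative):

```python
TOTAL_SESSIONS = 3308        # sessions run, from the text
CONSISTENCY_SURVIVORS = 47   # "approximately 47" in the text
NOVEL_AFTER_AUDIT = 0        # novel survivors after expert audit, from the text


def rate(count: int, total: int) -> float:
    """Return count / total, rejecting a non-positive denominator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


session_yield = rate(CONSISTENCY_SURVIVORS, TOTAL_SESSIONS)
novel_yield = rate(NOVEL_AFTER_AUDIT, CONSISTENCY_SURVIVORS)
print(f"internal-consistency yield: {session_yield:.1%}")
print(f"novel yield among survivors: {novel_yield:.0%}")
```

The two denominators differ on purpose: the first rate is per session, the second is per survivor that reached expert audit.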

The In-Distribution Limitation

A2ACW adversarial agents share the same training distribution. Two AI models trained on the same physics corpus will share the same blind spots — they jointly miss what the literature missed, and jointly converge on what the literature over-represents. The protocol cannot detect errors that are systematic across the entire training corpus.

This is why the 1.4% figure is an internal-consistency-survival rate, not a discovery rate: it is an upper bound on what in-distribution adversarial AI-AI collaboration can find. The reparametrizations the framework identified (Abrikosov-Gor'kov, Milgrom-Verlinde, Freeman, Landau sigmoids) are exactly what you would predict from in-distribution debate — the corpus already contained these patterns. This does not invalidate the method, but it means A2ACW cannot substitute for out-of-distribution evaluation by domain experts who are not in the training loop.

The symmetric lower-bound problem: A2ACW also systematically over-credits reformulations of known physics as “Validated.” Adversarial AIs in shared distribution badge the same patterns the training corpus rewards. The track record: 6 of 6 “Validated” badges audited to date have been demoted to Reparametrization on closer review (galaxy rotation, chemistry r=0.982, Born rule, a₀ = cH₀/2π, decoherence protection Γ = γ²(1−c), Bell-freezing c(d)). The effective novelty rate after expert audit is currently 0 of 6 retained. The 1.4% figure is therefore both an upper bound (on in-distribution detection) and — by track record — an overestimate of actual novelty.
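How tightly does 0 of 6 constrain the underlying retention rate? The sketch below assumes each audit is an independent trial with one fixed retention probability p, an idealization this page does not claim. Under that assumption the exact one-sided upper bound after zero successes in n trials solves (1 − p)ⁿ = α.

```python
def zero_success_upper_bound(n_trials: int, alpha: float = 0.05) -> float:
    """Exact one-sided upper confidence bound on p after 0 successes.

    Solves (1 - p) ** n_trials = alpha for p.
    ASSUMPTION: independent trials with a common success probability.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return 1.0 - alpha ** (1.0 / n_trials)


bound_badges = zero_success_upper_bound(6)  # the 6 audited "Validated" badges
print(f"95% upper bound on retention rate after 0 of 6: {bound_badges:.1%}")
```

With only six audits, a bound of this kind stays wide, so the observed 0 of 6 is a track record rather than a precise rate estimate.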

The meta-finding: this pattern is the result

The combined pattern across the full research arc is itself a publishable finding about AI-generated science: every structurally novel commitment (TEST-04a disfavored ~2σ post-hoc, critical exponents category-error, Bullet Cluster viscosity sign-reversed) has either failed or collapsed to non-discriminating; every surviving prediction is degenerate with MOND or standard QM. This is exactly what you would predict from a training-distribution filter operating on a physics corpus that already contains MOND and QM. The framework is not a failed attempt at new physics — it is a successful demonstration of the ceiling on AI-to-AI adversarial collaboration when held to a strict prospective standard.

Temporal-asymmetry counterfactual (2026-05-18): A retrospective audit tested whether A2ACW would have caught 6 subsequently-demoted claims by pairing models with different training cutoffs. Result: 0 of 6 demotions caught; median prior-art year ~1996. The protocol challenges within-distribution but cannot detect priority-rediscovery. Successor experiment: vocabulary-asymmetry test (submit pre-Planck-era results in post-2015 vocabulary; measure true-positive rate). Full result on the A2ACW page.

Calibration note: A2ACW quantity (3,308 sessions) is not calibration. The relevant metric is whether the protocol has ever rejected claims that the human authors would have kept, or identified failures that later turned out to be correct. The most documented example: A2ACW correctly identified the α symbol misidentification in galactic coupling A = 4π/(α²GR₀²) (transcription error, not physics failure) and the BTFR n≈2.2 misattribution — both confirmed by archive cross-check. The Bullet Cluster sign-error was identified in a dedicated stress-test session (March 2026).

What a Session Is

A “session” is one A2ACW exchange — a claim submitted, challenged, and resolved. Session numbers in citations (e.g., “Session #616”) reference the ordered log of challenges in the Synchronism research archive. The chemistry page's reference to “sessions 134–2660” means those claims were active in sessions during that range, some under repeated AI analysis — which introduces the risk of confirmation bias that the page flags. AI agents challenge each other but share the same training distribution, which limits adversarial independence.

Prediction Audit Trail

Every Tier-1 prediction that has been registered, modified, withdrawn, or adjudicated — with dates and reasons. Without this log, “kill criteria are pre-registered” is a claim, not a demonstrated practice.

TEST-03: ALFALFA-SDSS TFR Scatter [KILL CRITERION TRIGGERED]

Registered: Session 616. Threshold: R² > 0.20. Result: R² = 0.14. Status: presumptively failed (denominator ambiguity under audit). No modification.

TEST-04: BAO Coherence Modulation [WITHDRAWN 2026-05-04]

Registered with kill criterion 10⁻⁵ BAO precision. Problems: (1) Session 107 explicitly forecast 0.0% BAO modification; (2) no session-level derivation for the 10⁻⁴ number; (3) kill threshold was 3000× below DESI Y3 precision, making it vacuous at registration. Withdrawal is NOT a clean exit: the original kill criterion was unfalsifiable from day one.

TEST-04a: DESI RSD fσ₈ Suppression [DISFAVORED ~2σ, Kill Criterion Triggered (corrected 2026-05-26)]

Registered as TEST-04 replacement (2026-05-04). Derivation: Session 107. Threshold: fσ₈(z=0.51) > 0.46 rules out at 3σ. DESI DR1 full-shape (arXiv:2411.12021): LRG1 fσ₈/(fσ₈)_fid = 1.16 ± 0.13, above the ΛCDM fiducial; combined σ₈ = 0.841 ± 0.034 (Table 10). Kill criterion triggered (LRG1 actual ≫ 0.46). Tension: σ₈ 0.841 vs predicted 0.76 = 2.4σ. Post-hoc by 8+ months (DESI DR1 April 2024; Session 107 committed December 2025). Verdict: disfavored ~2σ; suppression not observed. Note: the 2026-05-25 “correction” that claimed the kill criterion was not triggered was itself an error (it misattributed the arXiv:2512.03230 z≈0.07 PV value to the z=0.51 full-shape slot). No replacement substituted.
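The 2.4σ tension quoted for TEST-04a can be reproduced from the stated numbers. This is a minimal sketch that assumes the tension is the difference between measured and predicted σ₈ divided by the measurement uncertainty, with no uncertainty assigned to the prediction. The text does not say how the figure was computed, so that definition is an assumption.

```python
def tension_sigma(measured: float, sigma: float, predicted: float) -> float:
    """Return |measured - predicted| / sigma.

    ASSUMPTION: the prediction carries no uncertainty of its own.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return abs(measured - predicted) / sigma


# Values quoted in the TEST-04a entry.
SIGMA8_MEASURED = 0.841
SIGMA8_UNCERTAINTY = 0.034
SIGMA8_PREDICTED = 0.76

tension = tension_sigma(SIGMA8_MEASURED, SIGMA8_UNCERTAINTY, SIGMA8_PREDICTED)
print(f"sigma_8 tension: {tension:.1f} sigma")
```

Assigning an uncertainty to the predicted value would lower the tension, so this figure is best read as an upper estimate under the stated assumption.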

Operational states vs. validation badges: Terms like “Kill Criterion Triggered,” “Withdrawn,” and “MOND-shared” are operational states describing prediction lifecycle and scope — distinct from the nine validation badges (5 MRH-relationship: Active-MRH / Parallel-Paths / Sidelined / Superseded / Audited-Negative; 4 descriptive: Untested / Speculative / Reparametrization / Failed). A prediction flagged “Kill Criterion Triggered” also carries the Failed badge; “Withdrawn” does not carry any badge (it was never adjudicated); “MOND-shared” means a positive result would confirm both MOND and Synchronism and cannot discriminate — it does not affect the badge until tested.
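The relationship between operational states and badges described above can be sketched as a small lookup. The class and function names are hypothetical and do not come from the Synchronism codebase; only the badge names, state names, and the three rules are taken from the text.

```python
from enum import Enum
from typing import Optional


class Badge(Enum):
    # MRH-relationship tags
    ACTIVE_MRH = "Active-MRH"
    PARALLEL_PATHS = "Parallel-Paths"
    SIDELINED = "Sidelined"
    SUPERSEDED = "Superseded"
    AUDITED_NEGATIVE = "Audited-Negative"
    # Descriptive tags
    UNTESTED = "Untested"
    SPECULATIVE = "Speculative"
    REPARAMETRIZATION = "Reparametrization"
    FAILED = "Failed"


class OperationalState(Enum):
    KILL_CRITERION_TRIGGERED = "Kill Criterion Triggered"
    WITHDRAWN = "Withdrawn"
    MOND_SHARED = "MOND-shared"


def badge_for(state: OperationalState, current: Optional[Badge]) -> Optional[Badge]:
    """Return the badge a prediction carries given its operational state."""
    if state is OperationalState.KILL_CRITERION_TRIGGERED:
        return Badge.FAILED   # a triggered kill criterion also carries Failed
    if state is OperationalState.WITHDRAWN:
        return None           # never adjudicated, so no badge
    return current            # MOND-shared leaves the badge as-is until tested


print(badge_for(OperationalState.KILL_CRITERION_TRIGGERED, Badge.UNTESTED))
```

Keeping the two enumerations separate mirrors the distinction in the text: a state records where a prediction is in its lifecycle, while a badge records how the claim is classified.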

Related Work in AI-Driven Discovery

The A2ACW negative result — AI-AI adversarial collaboration fails when both agents share the same training distribution — is most informative when placed against the optimistic AI-discovery claims it directly addresses:

FunSearch (DeepMind, 2023) [Different structural class]
Uses LLM to propose combinatorial constructions, evaluated by an external formal oracle. The key difference: the oracle is outside the training distribution. A2ACW's failure mode (shared training → shared blind spots) does not apply when a formal verifier is available.
AlphaProof / AlphaGeometry (DeepMind, 2024) [Different structural class]
Reinforcement learning + formal proof verification. Not text generation from training data. The ground-truth check (formal verifier) is external to the LLM. This is why AlphaProof can solve IMO problems that exceed training data.
Sakana AI Scientist (2024) [Same structural class as A2ACW]
Generates research papers via LLM orchestration with self-review. Has been shown to produce errors that human reviewers catch, and its "novelty" comes from recombination within the training distribution — the same failure mode A2ACW demonstrates.
Iten/SciNet symbolic regression (2020) [Different mechanism]
Discovers physical laws by fitting latent representations to observational data. The discovery is constrained by data, not by text generation from prior knowledge. A2ACW operates on natural-language claims before any data constraint is applied.

Diagnosis: The out-of-distribution problem is solved for AI systems with external formal oracles (FunSearch, AlphaProof). It is not solved for natural-language theory generation, where no formal verifier exists outside the training distribution. A2ACW makes this specific structural point with a documented empirical result: 6/6 retrospective demotions, 0/6 caught by temporal-asymmetry, 4/6 caught by vocabulary-asymmetry (prior-art subclass only). The methodology finding is: AI-AI adversarial collaboration without an external oracle has a shared-blind-spot ceiling that cannot be removed by choosing more capable or more adversarial agents.

Full Research Archive

Every session, derivation, failure, and dataset is public: github.com/dp-web4/Synchronism


Related Concepts

How We Handle Failure: Documenting what doesn't work is as important as what does
Falsifiability: Every prediction has a kill criterion
Honest Assessment: What works, what failed, what we don't know
Where It's Already Useful: Zero confirmed physics, but load-bearing as an applied design ontology (Web4, SAGE, hestia, the fleet)