Provenance over behavior | Neural Substrate

See the Neural Substrate project page for the broader technical context and weekly project log.

The relay problem

A system can pass every behavioral test while being a relay. "It navigates / it tracks / it works" does not establish that the system is doing the computation. An external signal can be silently doing the work, and the behavior will look identical from the outside.

This is the headline lesson, and it generalizes beyond neural systems: any AI, agentic, or simulated system whose validation consists solely of "does it produce the right output" is vulnerable to the same failure mode. The right output is necessary. It is not sufficient.

Provenance is the test, not behavior

The actual test is provenance. Trace every input the system uses back to its leaves. If each leaf is something the system genuinely sensed or did — a proprioceptive signal it generated, a sensory input it received, a recurrent state it computed from prior recurrent state — the computation is real. If any leaf is privileged ground-truth that the system could not have sensed in deployment, the input is a scaffold. The behavior is downstream of the scaffold, not of the system being studied.

This applies whether the scaffold is direct (truth injected as a state variable), indirect (a quantity derived from truth through one or two transformations), or environmental (a quantity the surrounding test rig supplies for free that the deployed system would not have).
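A provenance trace can be sketched as a walk over an input dependency graph. The example below is a minimal illustration, not the project's tooling: the `Signal` class, the `SENSED` and `PRIVILEGED` labels, and every signal name are hypothetical, and it assumes each input has been hand-annotated with where it comes from.

```python
from dataclasses import dataclass, field

# Hypothetical leaf labels, assumed for illustration only.
SENSED = "sensed"          # something the system sensed or generated itself
PRIVILEGED = "privileged"  # ground truth it could not sense in deployment

@dataclass(eq=False)
class Signal:
    name: str
    kind: str = "derived"  # "derived", SENSED, or PRIVILEGED
    inputs: list = field(default_factory=list)

def privileged_leaves(signal, seen=None):
    """Return the names of privileged leaves reachable from `signal`."""
    if seen is None:
        seen = set()
    if id(signal) in seen:  # recurrent edges: do not revisit a node
        return set()
    seen.add(id(signal))
    if not signal.inputs:   # a leaf: nothing further to trace
        return {signal.name} if signal.kind == PRIVILEGED else set()
    found = set()
    for src in signal.inputs:
        found |= privileged_leaves(src, seen)
    return found

# Hypothetical wiring: one clean recurrent state, one indirectly scaffolded one.
angular_velocity = Signal("angular_velocity", SENSED)
true_heading = Signal("true_heading", PRIVILEGED)
drift_correction = Signal("drift_correction", inputs=[true_heading])

ring_state = Signal("ring_state")
ring_state.inputs = [ring_state, angular_velocity]  # depends on its own prior state

scaffolded_state = Signal("scaffolded_state",
                          inputs=[angular_velocity, drift_correction])

clean = privileged_leaves(ring_state)
tainted = privileged_leaves(scaffolded_state)
```

Here `clean` is empty, while `tainted` contains `true_heading` even though the state never reads it directly: the indirect case, one transformation removed from truth, is caught by the same walk.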

Decorrelated tests

A metric is only meaningful where the genuine answer and the fake answer diverge. If a relayed signal and a genuinely computed estimate would both match the true variable under the test conditions, agreement proves nothing: a real system and a relay will both pass.

In neural terms: don't validate a heading estimator under conditions where heading is being continuously provided. Validate it under conditions where heading would need to be inferred — dark windows, sensor loss, conflicting cues. Equivalent strategies exist in any domain: design tests where a scaffold and the system give different answers, then look at the result there.
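As a toy illustration of a decorrelated test, the sketch below runs two hypothetical heading estimators through a lit window, where a heading cue is supplied, and a dark window, where it is not. Everything here is an assumption made for the example (the function names, the constant angular velocity, the noise-free integration); it is not the project's ring attractor.

```python
import math

def wrap(angle):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def simulate(estimator_step, steps=200, dt=0.01, omega=1.0, dark_from=100):
    """Return the max absolute heading error in the lit and dark windows."""
    true_heading, estimate = 0.0, 0.0
    lit_err, dark_err = 0.0, 0.0
    for t in range(steps):
        true_heading = wrap(true_heading + omega * dt)
        cue = true_heading if t < dark_from else None  # cue vanishes in the dark
        estimate = estimator_step(estimate, omega, dt, cue)
        err = abs(wrap(estimate - true_heading))
        if t < dark_from:
            lit_err = max(lit_err, err)
        else:
            dark_err = max(dark_err, err)
    return lit_err, dark_err

def relay(estimate, omega, dt, cue):
    """Copies the supplied heading; holds its last value when the cue is gone."""
    return cue if cue is not None else estimate

def integrator(estimate, omega, dt, cue):
    """Ignores the cue and integrates self-motion (angular velocity) only."""
    return wrap(estimate + omega * dt)

relay_lit, relay_dark = simulate(relay)
integ_lit, integ_dark = simulate(integrator)
```

In the lit window both estimators show near-zero error, so that window cannot tell them apart. Only the dark window separates them: the relay's error grows as the true heading moves on, while the integrator keeps tracking.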

Probes lie by omission

A component can pass in isolation because the rig supplies a condition that does not exist in deployment. A unit test that depends on a constant being non-zero is not a unit test of the component if that constant is zero in the real loop. The probe passes. The component does not.

Component-level tests have to be evaluated against the conditions of the integrated system, not the conditions of the bench rig.
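The non-zero-constant case can be made concrete with a small sketch. The component, the gain values, and the probe below are all hypothetical; the point is only that the same probe gives different verdicts depending on whose conditions it runs under.

```python
def velocity_drive(angular_velocity, gain):
    """Hypothetical component: scales a velocity signal before it enters the loop."""
    return gain * angular_velocity

BENCH_GAIN = 1.0  # assumed value that the bench rig happens to supply
LOOP_GAIN = 0.0   # assumed value of the same constant in the integrated loop

def probe_passes(gain):
    """The probe: a non-zero velocity should produce a non-zero drive."""
    return velocity_drive(0.5, gain) != 0.0

bench_result = probe_passes(BENCH_GAIN)  # evaluated under bench conditions
loop_result = probe_passes(LOOP_GAIN)    # evaluated under integrated conditions
```

The probe passes on the bench and fails under the loop's constant, so a green bench result says nothing about the component as deployed.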

The recurring pattern

Every scaffold identified during this work was world-truth wearing a biological label. The scaffolds returned in subtler forms across iterations: direct injection of true state, intermittent correction of drift toward true state, quantities derived from true state through a transformation, and environmental conventions that supplied true state for free ("the surrounding system will provide it"). Each form looked more plausible than the last. None of them were the system doing the computation.

Naming the pattern is itself useful. The shape of world-truth wearing a biological label recurs because the gradient of validation pressure pushes systems toward whatever scaffold is easiest to construct, not toward what is real. Without a name, the failure mode goes unrecognized each time it shows up in different clothes.

Docs drift; code is the only ground truth

Documentation asserts values, and those assertions become false authority once the code moves on. The cost of a wrong number in a doc is not the wrong number — it is the next reader acting on it.

Project policy: docs describe intent and structure; they do not assert quantitative values. Constants live in code. When a doc disagrees with code, the doc is wrong by definition.
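One way such a policy might be enforced is a small lint over the docs. The sketch below is an assumption, not the project's tooling: the regular expression is a rough heuristic (a number after `=`, or a number followed by a unit-like token), and the sample lines are invented for the example rather than taken from any project document.

```python
import re

# Assumed heuristic: a number right after "=", or right before a unit-like token.
QUANTITY = re.compile(
    r"=\s*-?\d+(?:\.\d+)?"
    r"|\b\d+(?:\.\d+)?\s*(?:ms|s|Hz|deg|rad|%)(?![A-Za-z])"
)

def asserted_values(doc_text):
    """Return (line_number, line) pairs where the doc appears to assert a value."""
    return [
        (number, line)
        for number, line in enumerate(doc_text.splitlines(), start=1)
        if QUANTITY.search(line)
    ]

# Invented sample text, not project values.
sample = "\n".join([
    "The time constant is defined in code.",
    "The timeout is 20 ms.",
    "gain = 0.8",
    "The loop runs in 3 steps.",
])

flagged = asserted_values(sample)
```

With this heuristic, the second and third sample lines are flagged and the others pass. A real check would need a pattern tuned to the units and notation the docs actually use.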

What this validated

A head-direction ring attractor whose heading estimate is genuinely self-generated — from path integration and recurrent dynamics — rather than read from ground-truth coordinates, validated by adversarial decorrelated tests including dark-window sustain.

The methodology above is what made that statement falsifiable rather than rhetorical.