AI summaryⓘ
The authors point out that just checking if a neural model's predictions are close to the real outcomes isn't enough to understand if it truly learns the underlying rules that govern the system. They introduce a method that looks at how the model's output changes when the input changes, which reveals deeper details about the model's behavior, like how it handles different frequencies and interactions within the system. This deeper check, called a Jacobian-based spectral audit, can find problems that normal accuracy tests miss, such as unstable predictions or wrong sensitivity to inputs. Their work shows that being accurate and learning the true system dynamics are different things, and their method helps diagnose when models fail to capture these dynamics properly.
neural operatorsin-context learningJacobian matrixspectral analysispartial differential equations (PDE)frequency responsemode couplingtangent operatorprediction errorstability analysis
Abstract
Existing evaluations of neural operators and in-context operator learning rely primarily on prediction error, but accurate output prediction does not guarantee the correct local dynamical structure. A model may match solutions while exhibiting incorrect sensitivities, distorted frequency response, spurious mode coupling, or unstable tangent behavior. We introduce a Jacobian-based spectral audit for in-context operator learning. For a fixed prompt, we differentiate the network output with respect to the query function and view the resulting Jacobian as a learned tangent operator. Projecting it onto Fourier modes, we obtain a local spectral characterization of the inferred operator, including frequency-dependent gains, phase structure, and cross-mode coupling. The audit complements standard prediction metrics by testing whether the model reproduces local mechanisms of the underlying PDE operator rather than only outputs. Across benchmarks, the audit reveals distinct operator-level phenomena, including phase transport, viscosity-dependent damping, nonlinear mode coupling, and reaction--diffusion stability structure. It also detects failures partially hidden by prediction-error metrics, including high-frequency degradation, incorrect phase recovery, and prompt--operator inconsistencies. Corrupted or internally inconsistent prompts lead to degraded tangent-operator structure even when pointwise predictions remain partially accurate. Our results suggest that prediction accuracy and local operator fidelity are distinct properties of learned neural operators. Our framework also provides a diagnostic for stability, sensitivity, and operator consistency.