Sensor-Conditioned Representation Learning via Scene-Relevant Observation Quotients

2026-06-15

Artificial Intelligence
AI summary

The authors argue that representations learned from sensor data should not be judged only by how accurately they reconstruct or predict data. A correct representation should preserve the scene differences that the sensor can actually detect and ignore changes caused by irrelevant factors ('nuisances'). They propose a method called OQ-TSAE that separates scene and nuisance factors and steers the model toward scene differences supported by the sensor data. On a controlled benchmark, this supervision improves representation-correctness diagnostics over reconstruction-oriented, metric-learning, and contrastive-learning baselines. On real radar data, a reconstruction-only variant remains competitive on downstream tasks, robust under degraded observations, and stable across random seeds. Overall, the authors argue that evaluating sensor-based learning should also consider whether the model's internal distinctions match what the sensor can truly observe.

learned representations, sensor-conditioned environments, nuisance factors, latent distinctions, autoencoding, Tucker decomposition, contrastive learning, metric learning, representation correctness, radar sensing
Authors
Yan Jiao, Pin-Han Ho, Limei Peng
Abstract
Learned representations in intelligent sensing systems are often evaluated by reconstruction fidelity or downstream prediction accuracy, but these criteria do not specify which latent distinctions are justified by the sensing process. In sensor-conditioned environments, nuisance factors can change measurements without changing the scene, while distinct scenes may be indistinguishable under limited sensing capability. This paper formulates sensor-conditioned representation correctness as preserving sensing-supported scene distinctions while suppressing nuisance-induced and sensor-unsupported variation. We introduce the scene-relevant observation quotient, a representation target induced by sensing-supported distinguishability after nuisance canonicalization, and develop Observation-Quotient Tucker-Structured Autoencoding (OQ-TSAE), a scene-nuisance factorized framework with diagnostics for false distinction, false merge, nuisance sensitivity, and latent ordering consistency. Experiments on a controlled benchmark show that quotient-consistent supervision improves representation-correctness diagnostics over reconstruction-oriented, metric-learning, and contrastive-learning baselines. Sensitivity, perturbation, and ablation studies show the importance of quotient-aligned supervision, reliable quotient relations, and quotient geometry. Complementary real-radar experiments show that a reconstruction-only OQ-TSAE variant retains competitive downstream utility, robustness under observation degradation, and low seed-to-seed variability. These results suggest that sensor-conditioned representations should be evaluated not only by predictive utility, but also by whether their latent geometry preserves sensing-justified scene distinctions.
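The abstract names false distinction and false merge as diagnostics but does not define them here. As a rough illustration only, the sketch below assumes a thresholded pairwise reading: a false distinction is a pair of observations in the same quotient class whose latent codes lie farther apart than a threshold, and a false merge is a pair from different quotient classes whose codes lie within it. The function name `quotient_pair_diagnostics`, the threshold `tau`, the Euclidean distance, and the toy data are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def quotient_pair_diagnostics(latents, quotient_ids, tau):
    """Return (false_distinction_rate, false_merge_rate) over all pairs.

    Assumed definitions (not the paper's):
      false distinction: same quotient class, latent distance > tau
      false merge:       different quotient class, latent distance <= tau
    """
    same_total = same_split = diff_total = diff_merged = 0
    for i, j in combinations(range(len(latents)), 2):
        separated = dist(latents[i], latents[j]) > tau
        if quotient_ids[i] == quotient_ids[j]:
            same_total += 1
            same_split += separated
        else:
            diff_total += 1
            diff_merged += not separated
    # Guard against empty pair sets so the rates are always defined.
    fd = same_split / same_total if same_total else 0.0
    fm = diff_merged / diff_total if diff_total else 0.0
    return fd, fm


# Toy data: three observations share quotient class 0, one belongs to class 1.
latents = [(0.0, 0.0), (0.1, 0.0), (2.0, 0.0), (0.2, 0.0)]
quotient_ids = [0, 0, 0, 1]
fd, fm = quotient_pair_diagnostics(latents, quotient_ids, tau=0.5)
print(f"false distinction rate: {fd:.3f}, false merge rate: {fm:.3f}")
```

Under this reading, a representation that matches the quotient would drive both rates toward zero; a representation that only reconstructs well could still score poorly on either one, which is the gap the abstract argues evaluation should expose.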