Latent space mapping of interpretable structural coordinates from stochastic single-molecule signals

2026-06-15Machine Learning

Machine Learning
AI summary

The authors found a new way to analyze signals from nanopores, tiny sensors that detect molecules like DNA but usually give noisy, hard-to-interpret data. Instead of looking at the timing of signals, they trained a computer program to translate these signals into a simpler, meaningful code that represents the DNA’s structure. This approach works well even when conditions change, and it’s much faster than older methods. They tested it with real experiments and showed it can identify DNA mixtures and rare types accurately. This method makes it easier and quicker to understand what nanopores detect by focusing on the molecule’s features rather than the noisy signal timing.

NanoporesDNA barcodesStochastic translocationContrastive encoderLatent spacePhysics-informed modelSignal processingMolecular identificationSingle-molecule sensingReal-time acquisition
Authors
Matteo Cartiglia, Sandro Kuppel, Wouter Botermans Wannes Peeters, Natan Biesmans, Liam Vandekerckhove, Eric Beamish, Koen Ongena, Wouter Renckens, Pol Van Dorpe, Sanjin Marion
Abstract
Nanopores are versatile single-molecular sensors, but their utility is fundamentally constrained by stochastic translocation dynamics warping any encoded information. We resolve it by shifting from time-domain analysis to a learned latent-space mapping via a contrastive encoder trained exclusively on simulated signals from a physics-informed model. This encoder maps solid-state nanopore signals of engineered DNA barcodes into an interpretable molecular coordinate system. The learned representation is responsive to structural barcode parameters while remaining invariant to acquisition conditions and translocation conformation, allowing data pooling across devices. Molecule identification requires a single pass through the encoder, reducing computational cost by three orders of magnitude relative to alignment-based methods. We experimentally validate through mixture quantification, rare-variant detection, consensus barcode reconstruction, and real-time signal acquisition. This shift from temporal analysis to mapping structural coordinates into a latent space changes the paradigm behind analyzing stochastic sensor signals by linking classification to interpretable encoded molecular information.