Heterogeneous Tactile Transformer

2026-06-29 | Robotics
AI summary

The authors address the heterogeneity of robot tactile sensors: a model trained on one sensor cannot be used directly on another, so data from different sensors is hard to pool. They propose the Heterogeneous Tactile Transformer (HTT), which learns shared tactile representations by pairing a sensor-specific encoder for each sensor with a shared transformer trunk. HTT is pretrained on the new Heterogeneous Paired Tactile (HPT) dataset of 1.6 million synchronized paired frames from four vision- and array-based tactile sensors. Experiments on tactile perception and real-world manipulation tasks indicate that the learned representations transfer to new tasks and to previously unseen sensors.

tactile sensors, heterogeneous data, transformer model, representation learning, cross-modal alignment, masked reconstruction, robot manipulation, sensor encoders, pretraining, dataset
Authors
Jianxin Bi, Qiang Wang, Jayaram Reddy, Kelvin Lin, Soibkhon Khajikhanov, Ruihan Gao, Harold Soh
Abstract
Tactile sensors are inherently heterogeneous: a model trained on one sensor cannot be directly used on another, which limits learning contact-rich manipulation policies from diverse tactile data at scale. To bridge this gap, we propose the Heterogeneous Tactile Transformer (HTT), a framework that learns shared tactile representations across heterogeneous sensors. HTT consists of sensor-specific encoders and a shared transformer trunk, and is pretrained with per-modality masked reconstruction together with cross-modal alignment between paired sensors. Pretraining uses our novel Heterogeneous Paired Tactile (HPT) dataset, containing 1.6M synchronized paired frames across four vision- and array-based tactile sensors. Across distinct tactile perception and real-world manipulation tasks, HTT is shown to learn transferable representations that adapt to new tasks and previously unseen sensors. Dataset, code, and model checkpoints will be released upon publication at https://jxbi1010.github.io/htt-gh-page/.
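The pretraining objective described in the abstract combines two terms: per-modality masked reconstruction and cross-modal alignment between paired sensors. The sketch below shows one plausible way to combine them; it is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the loss forms, so mean squared error on masked patches and a symmetric InfoNCE contrastive loss are assumed here, and every name (`masked_mse`, `info_nce`, `pretraining_loss`, `temperature`, `align_weight`) is hypothetical.

```python
import math

def masked_mse(target, recon, mask):
    """Mean squared error over masked positions only.

    mask[i] is True where the patch was hidden from the encoder.
    ASSUMPTION: MSE is a stand-in; the abstract does not name the loss.
    """
    errs = [(t - r) ** 2 for t, r, m in zip(target, recon, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(za, zb, temperature=0.1):
    """Symmetric contrastive loss: embedding i of sensor A should match
    embedding i of sensor B, because the frames are synchronized pairs.

    ASSUMPTION: InfoNCE is one common alignment loss, chosen for illustration.
    """
    za = [l2_normalize(v) for v in za]
    zb = [l2_normalize(v) for v in zb]
    # Cosine similarity between every A embedding and every B embedding.
    logits = [[sum(p * q for p, q in zip(a, b)) / temperature for b in zb]
              for a in za]

    def diag_cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            top = max(row)  # subtract the max for numerical stability
            log_z = top + math.log(sum(math.exp(x - top) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(transposed))

def pretraining_loss(recon_terms, za, zb, align_weight=1.0):
    """Average reconstruction loss over modalities plus weighted alignment.

    recon_terms: one (target, recon, mask) triple per sensor modality.
    za, zb: paired embeddings from the shared trunk for two sensors.
    """
    recon = sum(masked_mse(t, r, m) for t, r, m in recon_terms) / len(recon_terms)
    return recon + align_weight * info_nce(za, zb)
```

In this sketch the reconstruction term is computed separately for each sensor, while the alignment term is the only place the two sensors interact. That matches the abstract's description of sensor-specific encoders feeding a shared trunk, though the actual weighting and loss choices may differ.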