UNATE: UNsupervised ATomic Embedding for crystal structures property prediction

2026-05-25 • Machine Learning

AI summary

The authors created a new method called UNATE to help computers better understand crystal materials without needing a lot of labeled examples. Their method teaches the computer to recognize important atomic features by looking at many unlabeled crystal structures. Using these learned features improves the accuracy of predicting crystal properties, especially when only a small amount of labeled data is available. This approach can make discovering new materials faster and cheaper.

crystal properties, unsupervised learning, denoising autoencoder, self-supervised learning, contrastive learning, atomic embeddings, materials discovery, labeled data, node embeddings

Authors

Laura Solà-Garcia, Àlex Solé, Javier Ruiz-Hidalgo

Abstract

Accurately predicting crystal properties is critical for accelerating materials discovery, but it is often limited by scarce labeled data and costly theoretical calculations. To alleviate this, we propose UNATE (Unsupervised Atomic Embedding), a framework that leverages structural information extracted from unlabeled crystal structures. UNATE integrates an unsupervised denoising autoencoder with self-supervised contrastive learning to learn robust atomic representations, which are then used as input features for downstream property prediction. Experimental results show that replacing raw atomic numbers with UNATE-pretrained node embeddings yields a 2.7% improvement over the full-data baseline. Notably, the benefits become more pronounced in scenarios with limited labeled data, reaching improvements of up to 10% when only 25% of the labeled data is used.
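
As a rough illustration of how a denoising objective and a contrastive objective can be trained together, the sketch below computes both losses on toy per-atom features with NumPy. It is not the UNATE implementation: the abstract does not say how the two objectives are combined, so the linear encoder and decoder, the Gaussian noise model, the InfoNCE form of the contrastive loss, and every name and parameter (`add_noise`, `info_nce_loss`, `combined_loss`, `sigma`, `alpha`, `temperature`) are assumptions made for this example.

```python
import numpy as np


def add_noise(x, sigma, rng):
    """Corrupt the inputs with Gaussian noise (assumed noise model)."""
    return x + sigma * rng.normal(size=x.shape)


def encode(x, w_enc):
    """Toy encoder: linear map plus tanh, raw features -> embeddings."""
    return np.tanh(x @ w_enc)


def decode(z, w_dec):
    """Toy linear decoder: embeddings -> reconstructed features."""
    return z @ w_dec


def reconstruction_loss(x_clean, x_rec):
    """Denoising objective: mean squared error against the clean input."""
    return float(np.mean((x_clean - x_rec) ** 2))


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE: row i of z_a should match row i of z_b; other rows are negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def combined_loss(x, w_enc, w_dec, rng, sigma=0.1, alpha=1.0):
    """Weighted sum of the two objectives (the weighting is an assumption)."""
    view_a = add_noise(x, sigma, rng)
    view_b = add_noise(x, sigma, rng)
    z_a, z_b = encode(view_a, w_enc), encode(view_b, w_enc)
    rec = reconstruction_loss(x, decode(z_a, w_dec))
    con = info_nce_loss(z_a, z_b)
    return rec + alpha * con, rec, con


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                  # 8 hypothetical atoms, 16 raw features
w_enc = rng.normal(scale=0.1, size=(16, 4))   # untrained toy weights
w_dec = rng.normal(scale=0.1, size=(4, 16))
total, rec, con = combined_loss(x, w_enc, w_dec, rng)
```

In a pipeline of the kind the abstract describes, the encoder would be trained on unlabeled structures by minimizing such a loss, and its output embeddings would then replace raw atomic numbers as node features in the downstream property predictor.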
