The Chandra-Gaia Catalog of Counterparts: Resolving ambiguous Gaia matches to X-ray sources in the Chandra Source Catalog using Machine Learning

2026-06-17 • Machine Learning


AI summary

The authors developed a method to match X-ray sources observed by Chandra with optical sources seen by Gaia using more than just their positions. They trained a machine learning model that uses properties such as brightness, color, and distance to identify true matches and to reject chance alignments. The method found counterparts for about 113,000 of the 254,000 unique X-ray sources. It was validated on the Chandra Orion Ultradeep Project (COUP), where it reproduced 95% of the NWAY matches. The authors also release a catalog of these matches to support future studies that combine data from both telescopes, and they discuss the limits of their approach and how it can be generalized to other cross-matching problems.
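To make the concern about chance alignments concrete, a common back-of-envelope estimate treats unrelated optical sources as uniformly (Poisson) distributed on the sky with surface density Σ. The expected number of unrelated sources inside a match circle of radius r is then π r² Σ, and the probability of at least one such interloper is 1 − exp(−π r² Σ). The sketch below is this generic estimate, not the authors' NWAY-based or machine-learning treatment. The function names and the density values in the usage note are hypothetical illustrations.

```python
import math


def expected_chance_matches(radius_arcsec: float, density_per_arcmin2: float) -> float:
    """Mean number of unrelated sources inside a circle of the given radius,
    assuming sources are scattered uniformly at the given surface density
    (an illustrative assumption, not the paper's model)."""
    radius_arcmin = radius_arcsec / 60.0  # convert arcsec to arcmin to match the density units
    return math.pi * radius_arcmin ** 2 * density_per_arcmin2


def chance_coincidence_probability(radius_arcsec: float, density_per_arcmin2: float) -> float:
    """Poisson probability that at least one unrelated source falls inside the circle."""
    mu = expected_chance_matches(radius_arcsec, density_per_arcmin2)
    return 1.0 - math.exp(-mu)


if __name__ == "__main__":
    # Hypothetical densities: a sparse field versus a crowded one.
    for radius, density in [(1.0, 10.0), (2.0, 100.0)]:
        p = chance_coincidence_probability(radius, density)
        print(f"r = {radius} arcsec, density = {density}/arcmin^2 -> P(chance) = {p:.4f}")
```

With a 2 arcsec radius and 100 sources per arcmin², the chance-coincidence probability comes out to roughly 0.29. This is why purely separation-based matching becomes unreliable in crowded fields, and why additional source properties help.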

Chandra Source Catalog, Gaia Data Release 3, cross-matching, machine learning, gradient boosting, NWAY, positional errors, source counterparts, X-ray astronomy, Bayesian methods

Authors

V. Samuel Pérez-Díaz, Vinay L. Kashyap, Joshua D. Ingram, David Fouhey, Juan Rafael Martínez-Galarza, Pavlos Protopapas, Jeremy J. Drake, Dong-Woo Kim, Cecilia Garraffo

Abstract

We present a framework to cross-match sources from the Chandra Source Catalog (CSC v2.1) with optical sources from Gaia Data Release 3. Unlike purely spatial approaches, we use source properties such as magnitudes, colors, and distances to identify true counterparts, detect chance coincidences, and resolve ambiguities when multiple plausible candidates exist. We define a training set of high-confidence matches using NWAY, a Bayesian cross-matching framework that accounts for positional errors and source densities. We train a gradient-boosted classifier (LightGBM) on a variety of features from both catalogs. Of the ~254k unique X-ray sources, we find counterparts for ~113k, of which ~7k have multiple plausible counterparts. We find no counterparts for ~20k sources for which separation-based cross-matching does find a match, and attribute half of these to chance coincidences. We validate the pipeline on the Chandra Orion Ultradeep Project (COUP), where the machine-learning matches reproduce 95% of NWAY cross-matches without using any positional information. We release a catalog of the ~113k Chandra-Gaia counterparts, together with ~7k alternative matches and ~20k ambiguous NWAY associations, supporting future population studies of sources detectable by both Chandra and Gaia. We discuss limitations and provide a generalization of the framework that is applicable in other cross-matching scenarios.
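The abstract describes scoring candidate X-ray/Gaia pairs with a classifier and then sorting X-ray sources into those with a single counterpart, those with several plausible counterparts, and those with none. Below is a minimal sketch of that final bookkeeping step. It assumes that per-pair probabilities have already been produced, for example by a LightGBM classifier's `predict_proba`. The function name `resolve_counterparts`, the tuple layout, and both thresholds are hypothetical illustrations, not code or values from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical thresholds; the paper's actual decision rules may differ.
MATCH_THRESHOLD = 0.5  # minimum score for the top candidate to count as a counterpart
ALT_THRESHOLD = 0.5    # other candidates at or above this are kept as plausible alternatives

# (xray_id, gaia_id, classifier probability that the pair is a true match)
Candidate = Tuple[str, str, float]


def resolve_counterparts(candidates: List[Candidate],
                         match_threshold: float = MATCH_THRESHOLD,
                         alt_threshold: float = ALT_THRESHOLD) -> Dict[str, dict]:
    """Group scored pairs by X-ray source and pick a best match plus alternatives."""
    by_source = defaultdict(list)
    for xray_id, gaia_id, prob in candidates:
        by_source[xray_id].append((prob, gaia_id))

    result = {}
    for xray_id, cands in by_source.items():
        # Highest score first; ties broken by gaia_id so the output is deterministic.
        cands.sort(key=lambda c: (-c[0], c[1]))
        best_prob, best_id = cands[0]
        if best_prob < match_threshold:
            # No candidate is convincing: treat the X-ray source as unmatched.
            result[xray_id] = {"best": None, "alternatives": []}
            continue
        alternatives = [gid for p, gid in cands[1:] if p >= alt_threshold]
        result[xray_id] = {"best": best_id, "alternatives": alternatives}
    return result


if __name__ == "__main__":
    scored = [("X1", "G1", 0.92), ("X1", "G2", 0.71), ("X1", "G3", 0.05),
              ("X2", "G4", 0.12), ("X3", "G5", 0.88)]
    for xid, info in sorted(resolve_counterparts(scored).items()):
        print(xid, info)
```

In this toy input, X1 receives a best match plus one alternative (the "multiple plausible counterparts" case), X2 is left unmatched, and X3 has a single counterpart. Keeping the thresholds as parameters makes it easy to trade completeness against contamination.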
