Enhancing Healthcare Search Intent Recognition with Query Representation Learning and Session Context

2026-05-11, Information Retrieval

AI summary

The authors focus on improving how computers understand why people search for health information online. They point out that medical search queries can have multiple reasons behind them, making it hard to classify intent correctly using traditional methods. To fix this, the authors group similar queries together and use a new learning method that better handles multiple possible intents. They also create a score to measure how much global search patterns differ from individual user sessions. Their tests on real search data show that their method helps computers better group queries and identify user intent more accurately.

healthcare search queries, query intent classification, query representation learning, pairwise loss function, query clustering, search session, concordance rate, search logs, intent ambiguity, TripClick dataset
Authors
Harshita Jagdish Sahijwani, Madhav Sigdel, Song Aslan, Priya Gopi Achuthan, Monica D. Skidmore, Eugene Agichtein, Chen Lin
Abstract
Classifying the intent behind healthcare search queries is crucial for improving the delivery of online healthcare information. The intricate nature of medical search queries, coupled with the limited availability of high-quality labeled data, presents substantial challenges for developing efficient classification models. Previous studies have exploited user interaction data, such as user clicks from search logs, and employed pairwise loss functions to model co-click behavior for query representation learning. However, many health queries can have multiple intents, resulting in ambiguous or divergent click behavior. Furthermore, learning only the single most popular intent of a query, as inferred from global statistics over the aggregate behavior of different users, can lead to a mismatch and a drop in performance when classifying query intent within specific search sessions. To address these limitations, our work improves query representation learning by aggregating similar queries via clustering and by introducing a novel loss function designed to capture the multifaceted nature of health search queries, resulting in a more scalable and accurate learning procedure. We also quantify the ambiguity of health queries, and the misalignment between global search intents and those discerned from individual sessions, by introducing the concordance rate (CR) score, and we demonstrate a simple and effective method for incorporating our learned query representation into contextual, session-based search intent classification. Extensive experimental results and analysis on two real-world search log datasets, a Health Search (HS) dataset and the publicly available TripClick dataset, demonstrate that our approach not only improves intrinsic clustering metrics for query representation learning but also enhances accuracy on subsequent search intent classification tasks.
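
The abstract names the concordance rate (CR) score but does not give its formula. As an illustration only, the sketch below assumes one plausible reading: CR is the fraction of session-level query occurrences whose intent label matches that query's globally most frequent intent. The function name `concordance_rate`, the `(query, session_intent)` input format, and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def concordance_rate(observations):
    """Fraction of (query, session_intent) observations that agree with the
    query's global majority intent.

    ASSUMPTION: this definition is an illustrative guess, not the paper's.
    """
    # Count how often each intent label is observed for each query.
    by_query = defaultdict(Counter)
    for query, intent in observations:
        by_query[query][intent] += 1

    total = sum(sum(counts.values()) for counts in by_query.values())
    if total == 0:
        return 0.0

    # The "global" intent of a query is its most frequent label overall;
    # its count is the number of observations that agree with it.
    agree = sum(counts.most_common(1)[0][1] for counts in by_query.values())
    return agree / total


# Hypothetical toy log: "flu shot" is ambiguous across sessions.
toy_log = [
    ("flu shot", "treatment"),
    ("flu shot", "treatment"),
    ("flu shot", "location"),
    ("migraine", "symptoms"),
]
cr = concordance_rate(toy_log)
```

Under this reading, a low CR would indicate that a model trained only on global majority intents disagrees with many individual sessions, which is the misalignment the abstract describes.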