PVminerLLM2: Improving Structured Extraction of Patient Voice via Preference Optimization

2026-06-15, Computation and Language

Computation and Language, Artificial Intelligence
AI summary

The authors present PVminerLLM2, an improved set of language models for extracting structured information from patient-generated text, which is hard to analyze because it is largely unstructured. They train with preference optimization that targets token-level errors, fixing mistakes that supervised fine-tuning alone misses, especially on rare or fine-grained parts of the data. The method also reweights tokens and classes to correct for imbalance. Across multiple model sizes, the models outperform strong baselines, with gains of up to 4.43% (Code), 3.50% (Sub-code), and 1.55% (Span).

patient-generated text, structured extraction, language models, supervised fine-tuning, preference optimization, token-level errors, class imbalance, token weighting, PV-Miner benchmark, natural language processing
Authors
Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Elyas Irankhah, Sreeraj Ramachandran, Ashley Hagaman, Sarah Lowe, Aimee Roundtree
Abstract
Motivation: Patient-generated text contains critical information on patients' lived experiences, social context, and care engagement, but remains largely unstructured, limiting its use in patient-centered outcomes research. Prior work introduced the PV-Miner benchmark and PVMinerLLM models for structured extraction. However, supervised fine-tuning (SFT) alone struggles with rare, fine-grained, and unevenly distributed errors, particularly in token-critical structured outputs.

Results: We present PVminerLLM2, an improved set of LLMs for structured patient voice extraction that applies preference optimization to address token-critical errors beyond the reach of supervised fine-tuning. Our method introduces (i) a preference objective with a token-level gated stabilization term that prevents degradation of absolute token likelihood under preference optimization, and (ii) confusion-aware preference pair construction to better capture low-separation distinctions. We further incorporate token-importance weighting and inverse-frequency reweighting to address token imbalance and class skew. Across multiple model sizes, PVminerLLM2 consistently outperforms strong baselines, achieving gains of up to 4.43% (Code), 3.50% (Sub-code), and 1.55% (Span), and outperforms baseline LLMs trained with existing preference optimization methods.

Availability and Implementation: The supplementary material, code, evaluation scripts, and trained models for PVminerLLM2 are publicly available at: https://github.com/Data-Mining-Lab-Yale/PVminerLLM2
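
The abstract names three ingredients without giving formulas: inverse-frequency reweighting, token-importance weighting, and a preference objective with a token-level gated stabilization term. The sketch below shows one plausible way these could fit together. It is not the authors' objective: it assumes a DPO-style log-sigmoid margin, a hinge-style gate that penalizes only chosen tokens whose log-probability falls below a reference model's, and "balanced" inverse-frequency class weights. All function names and the parameters beta and lam are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = total / (num_classes * class_count),
    so rare classes receive larger weights."""
    counts = Counter(labels)
    total, num_classes = len(labels), len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


def weighted_logp(token_logps, token_weights):
    """Token-importance-weighted sequence log-likelihood."""
    return sum(w * lp for w, lp in zip(token_weights, token_logps))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def gated_preference_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected,
                          w_chosen, w_rejected, beta=0.1, lam=1.0):
    """Illustrative loss, NOT the paper's exact formulation.

    Inputs are per-token log-probabilities under the policy (pol_*) and a
    frozen reference model (ref_*), plus per-token importance weights.
    """
    # DPO-style margin computed on token-weighted log-likelihood ratios.
    chosen_gain = weighted_logp(pol_chosen, w_chosen) - weighted_logp(ref_chosen, w_chosen)
    rejected_gain = weighted_logp(pol_rejected, w_rejected) - weighted_logp(ref_rejected, w_rejected)
    margin = beta * (chosen_gain - rejected_gain)
    preference_term = softplus(-margin)  # equals -log(sigmoid(margin))

    # Assumed gate: the penalty is active only on chosen tokens whose absolute
    # log-probability dropped below the reference, and is zero otherwise.
    stabilization = sum(
        w * max(0.0, ref_lp - pol_lp)
        for w, pol_lp, ref_lp in zip(w_chosen, pol_chosen, ref_chosen)
    )
    return preference_term + lam * stabilization


if __name__ == "__main__":
    # Hypothetical label counts: three frequent codes, one rare code.
    print(inverse_frequency_weights(["common", "common", "common", "rare"]))
    # Policy identical to the reference: zero margin, inactive gate.
    logps = [-0.5, -1.0]
    print(gated_preference_loss(logps, logps, logps, logps, [1.0, 2.0], [1.0, 1.0]))
```

In this sketch the gate leaves the loss unchanged whenever the policy keeps or raises the likelihood of the chosen tokens, so it only counteracts the case the abstract describes, where preference optimization widens the margin while lowering absolute token likelihood.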