Trait-Aware Policy Optimization for Autoregressive Multi-Trait Essay Scoring

2026-05-25 · Computation and Language

AI summary

The authors study how to post-train autoregressive models that score essays on several traits at once, such as grammar and content. They introduce Trait-Aware Policy Optimization (TAPO), a training method whose reward is decomposed along both the sample and trait dimensions, so the model is rewarded for overall scoring consistency, per-trait accuracy, valid output format, and preserving the dependencies between traits. They also use enhanced prompts during supervised fine-tuning so that the model learns what each trait means before the optimization stage. Experiments across multiple backbone models show that the method outperforms supervised fine-tuning alone and scalar-reward optimization baselines.

multi-trait essay scoring, autoregressive models, policy optimization, fine-tuning, reward decomposition, prompt engineering, trait-level accuracy, inter-trait dependency, supervised learning, score consistency
Authors
Zhengyang Wang, Sanwoo Lee, Jiaxin Wang, Chenxi Miao, Weikang Li, Yunfang Wu
Abstract
Multi-trait essay scoring aims to provide fine-grained evaluation of writing quality across multiple dimensions. However, how to effectively post-train autoregressive scoring models remains underexplored. In this paper, we propose Trait-Aware Policy Optimization (TAPO), a post-training framework tailored to autoregressive multi-trait scoring. Our method decomposes rewards along both the sample and trait dimensions, combining global scoring consistency, trait-level accuracy, format validity, and inter-trait dependency preservation. In addition, we augment supervised fine-tuning with enhanced prompts, allowing the model to internalize trait semantics before preference optimization. Experiments across multiple backbone models show that our method consistently improves multi-trait scoring performance over supervised fine-tuning and scalar-reward optimization baselines, demonstrating the effectiveness and transferability of trait-aware post-training for essay scoring.
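To make the four reward components named in the abstract concrete, here is a minimal Python sketch of one way such a decomposed reward could be computed for a single generated scoring string. It is an illustration only, not the paper's formulation: the abstract gives no formulas, so the trait names, the 1 to 6 score range, the `trait: score` output format, the closeness measures, and the equal weights are all assumptions. The helper names `parse_scores`, `closeness`, and `composite_reward` are likewise hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical trait names and score range, chosen only for illustration.
TRAITS = ["content", "organization", "language"]
MIN_SCORE, MAX_SCORE = 1, 6


def parse_scores(text: str) -> Optional[Dict[str, int]]:
    """Parse 'trait: score' pairs; None if a trait is missing or out of range."""
    scores = {}
    for trait in TRAITS:
        match = re.search(rf"\b{trait}\s*:\s*(\d+)\b", text, flags=re.IGNORECASE)
        if match is None:
            return None
        value = int(match.group(1))
        if not MIN_SCORE <= value <= MAX_SCORE:
            return None
        scores[trait] = value
    return scores


def closeness(pred: float, gold: float, span: float) -> float:
    """Map an absolute error to [0, 1], where 1 means an exact match."""
    return 1.0 - min(abs(pred - gold) / span, 1.0)


def composite_reward(text: str, gold: Dict[str, int],
                     weights=(0.25, 0.25, 0.25, 0.25)) -> dict:
    """Return sample-level and per-trait reward terms plus a weighted total."""
    pred = parse_scores(text)
    if pred is None:
        # Assumption: an unparsable output earns no reward at all.
        return {"format": 0.0, "trait": {t: 0.0 for t in TRAITS},
                "global": 0.0, "dependency": 0.0, "total": 0.0}

    span = MAX_SCORE - MIN_SCORE
    # Trait-level accuracy: one reward per trait.
    trait_r = {t: closeness(pred[t], gold[t], span) for t in TRAITS}
    # Global consistency (assumed form): closeness of the mean score.
    mean_pred = sum(pred.values()) / len(TRAITS)
    mean_gold = sum(gold[t] for t in TRAITS) / len(TRAITS)
    global_r = closeness(mean_pred, mean_gold, span)
    # Inter-trait dependency (assumed form): do pairwise score gaps match?
    pairs = [(a, b) for i, a in enumerate(TRAITS) for b in TRAITS[i + 1:]]
    dep_r = sum(closeness(pred[a] - pred[b], gold[a] - gold[b], 2 * span)
                for a, b in pairs) / len(pairs)

    w_fmt, w_trait, w_glob, w_dep = weights
    total = (w_fmt * 1.0 + w_trait * sum(trait_r.values()) / len(TRAITS)
             + w_glob * global_r + w_dep * dep_r)
    return {"format": 1.0, "trait": trait_r, "global": global_r,
            "dependency": dep_r, "total": total}
```

For example, `composite_reward("content: 4, organization: 3, language: 5", {"content": 4, "organization": 3, "language": 5})` yields a total of 1.0, while an output with no parsable scores yields 0.0. Returning the per-trait terms alongside the sample-level terms, rather than a single scalar, is what would let an optimizer treat the two dimensions separately; how TAPO actually uses them is not specified in the abstract.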