Automated IEP Generation from Traditional Chinese Parent-Teacher Interviews via Corpus-Grounded Feature Diffusion

2026-06-08 • Computation and Language

AI summary

The authors developed a method to help draft Individualized Education Programs (IEPs) in Traditional Chinese, where scarce domain data and strict privacy rules make automated writing hard. They used a process called Corpus-Grounded Feature Diffusion to expand a small set of expert-written seeds into training data, then fine-tuned a language model locally. They found that skipping Grammar-Constrained Decoding, a step that normally forces the output to follow a fixed schema, made the model faster while still producing schema-valid output under Traditional Chinese token budgets. On a small hold-out set, their system scores higher on BERTScore F1 than several large zero-shot models while running fully offline, offering a practical tool for special education in Traditional Chinese.

Individualized Education Programs, Traditional Chinese NLP, Corpus-Grounded Feature Diffusion, Low-resource fine-tuning, QLoRA, Grammar-Constrained Decoding, SMART Goal Ladder, BERTScore, Local inference, Privacy-preserving AI

Authors

Kuanlin Chen, Cheng-En Ou

Abstract

Writing Individualized Education Programs (IEPs) is a labor-intensive, knowledge-intensive documentation burden. English-language research has shown that generative AI can significantly reduce drafting time, yet automated IEP generation in Traditional Chinese remains virtually unexplored owing to domain data scarcity, strict privacy regulations, and the absence of local evaluation benchmarks. We propose a low-resource fine-tuning pipeline centered on Corpus-Grounded Feature Diffusion (CGFD): (1) 25 dual-expert high-score seed transcripts are selected via a tau threshold with flag-aware score caps; (2) a FeatureProfile (sentence length, structure, quantification templates) is extracted from the seeds and injected into LLM prompts alongside Verbalized-Sampling-style diversity control to drive diffusion; (3) 15 expert gold seeds are used as diffusion anchors, targeting 585 samples; 567 valid diffusion samples are obtained, yielding a 582-sample training set used to fine-tune Breeze-7B with QLoRA; (4) schema-constrained inference via Grammar-Constrained Decoding (GCD) enforces a hierarchical SMART Goal Ladder schema at inference time. Ablation results on a 55-sample schema stress set reveal an unexpected finding: GCD is counterproductive under Traditional Chinese token budgets. The no-GCD path achieves a 100% schema pass rate at 34% lower median latency, outperforming GCD on both reliability and speed. On the formal hold-out set (n=10), the no-GCD inference path achieves BERTScore F1 = 0.779, exceeding the zero-shot baselines GPT-5.4 (0.726), DeepSeek-V3.2 (0.703), Gemini-3-Flash-Preview (0.703), and Llama-4-Maverick (0.700) while maintaining fully local, air-gapped inference. This system addresses a gap in Traditional Chinese special-education NLP and offers a scalable, privacy-preserving local inference solution under an industrial engineering paradigm.
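To make the "schema pass rate" metric concrete, the following is a minimal sketch of how unconstrained (no-GCD) outputs could be checked after generation. It is not the authors' implementation. It assumes the model emits JSON, and the field names (`annual_goal`, `short_term_objectives`, `behavior`, `criterion`, `deadline`) are hypothetical stand-ins, since the abstract does not specify the SMART Goal Ladder schema.

```python
import json

# Hypothetical field names: the abstract does not define the actual
# SMART Goal Ladder schema, so these keys are illustrative only.
REQUIRED_GOAL_KEYS = {"annual_goal", "short_term_objectives"}
REQUIRED_OBJECTIVE_KEYS = {"behavior", "criterion", "deadline"}


def passes_schema(raw_output: str) -> bool:
    """Return True if raw_output is JSON matching the assumed two-level ladder."""
    try:
        doc = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    # Top level: an object carrying the goal and its list of objectives.
    if not isinstance(doc, dict) or not REQUIRED_GOAL_KEYS.issubset(doc):
        return False
    objectives = doc["short_term_objectives"]
    if not isinstance(objectives, list) or not objectives:
        return False
    # Second level: every objective must carry all required fields.
    return all(
        isinstance(obj, dict) and REQUIRED_OBJECTIVE_KEYS.issubset(obj)
        for obj in objectives
    )


def schema_pass_rate(outputs: list[str]) -> float:
    """Fraction of generated outputs that pass the structural check."""
    if not outputs:
        return 0.0
    return sum(passes_schema(o) for o in outputs) / len(outputs)
```

Under these assumptions, running `schema_pass_rate` over the outputs of a stress set would give the kind of pass-rate figure the ablation reports. A post-hoc check like this costs nothing at decoding time, which is one plausible reason an unconstrained path can be faster than GCD.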
