Building Korean linguistic resource for NLU data generation of banking app CS dialog system

2026-05-11

Computation and Language, Machine Learning
AI summary

The authors built a Korean linguistic resource called FIAD (Financial Annotated Dataset) to help dialog systems understand banking customer service requests. They studied recurring sentence patterns in banking app reviews and encoded these patterns as grammars that generate annotated training data. They then trained several models on the generated data and found that combining the DIET model with Korean BERT variants improved intent recognition and, to a smaller degree, topic recognition. The resource is intended to support task-oriented dialog systems in Korean banking contexts.

Natural Language Understanding, Task-oriented Dialog Systems, Annotated Dataset, Banking Customer Service, Korean Language, Intent Recognition, Entity Extraction, Local Grammar Graphs, DIET Model, BERT Variants
Authors
Jeongwoo Yoon, On-yu Park, Changhoe Hwang, Gwanghoon Yoo, Eric Laporte, Jeesun Nam
Abstract
Natural language understanding (NLU) is integral to task-oriented dialog systems, but it demands a considerable amount of annotated training data to cover diverse utterances. In this study, we report the construction of a linguistic resource named FIAD (Financial Annotated Dataset) and its use to generate Korean annotated training data for NLU in the banking customer service (CS) domain. Through an empirical examination of a corpus of banking app reviews, we identified three linguistic patterns occurring in Korean request utterances: TOPIC (ENTITY, FEATURE), EVENT, and DISCOURSE MARKER. We represented them in LGGs (Local Grammar Graphs) to generate annotated data covering diverse intents and entities. To assess the practicality of the resource, we evaluated the performance of DIET-only (Intent: 0.91 / Topic [entity+feature]: 0.83), DIET+HANBERT (I: 0.94 / T: 0.85), DIET+KoBERT (I: 0.94 / T: 0.86), and DIET+KorBERT (I: 0.95 / T: 0.84) models trained on FIAD-generated data to extract various types of semantic items.
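The generation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' LGG implementation: it assumes each pattern slot (ENTITY, FEATURE, EVENT, DISCOURSE MARKER) is a flat word list and enumerates their combinations, which is roughly what walking every path of a simple linear graph would produce. The lexicon entries, the intent labels, the English stand-in phrases, and the `[text](LABEL)` annotation style are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical toy lexicon, not taken from FIAD. English stand-ins are
# used in place of Korean request utterances.
ENTITY = ["app", "certificate"]
FEATURE = ["login", "renewal"]
# Each EVENT phrase is mapped to an assumed intent label.
EVENT = {"fails": "report_error", "is slow": "report_slowness"}
# An empty string means the discourse marker is omitted.
DISCOURSE_MARKER = ["please check", ""]


def generate_samples():
    """Yield annotated utterances for every slot combination."""
    for entity, feature, (event, intent), marker in product(
        ENTITY, FEATURE, EVENT.items(), DISCOURSE_MARKER
    ):
        # TOPIC = ENTITY + FEATURE, annotated inline.
        text = f"[{entity}](ENTITY) [{feature}](FEATURE) {event}"
        if marker:
            text += f", {marker}"
        yield {"intent": intent, "text": text}


if __name__ == "__main__":
    for sample in generate_samples():
        print(sample["intent"], "|", sample["text"])
```

A real local grammar graph can also express optional branches, recursion into subgraphs, and morphological variation, none of which this flat enumeration covers. The sketch only shows why a small set of patterns can yield many annotated utterances.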