DECO-MWE: building a linguistic resource of Korean multiword expressions for feature-based sentiment analysis

2026-05-11
Computation and Language

AI summary

The authors built DECO-MWE, a Korean language resource that helps computers interpret multiword expressions in product reviews when analyzing sentiment toward specific product features. They focused on cosmetics reviews, where such expressions are especially frequent, and identified four types: standard polarity, domain-dependent polarity, compound named entity, and compound feature expressions. They encoded the expressions as Local Grammar Graphs, a finite-state formalism, and report an F-measure of 0.806 on a test corpus. The work yields both a general-purpose lexicon of polarity expressions and a method that can be reused in other domains.

Multiword Expressions (MWEs), Feature-Based Sentiment Analysis (FBSA), Local Grammar Graph (LGG), Finite-State Transducer, Polarity, Named Entity, Corpus, Sentiment Lexicon, Domain-Dependent Expressions
Authors
Jaeho Han, Changhoe Hwang, Seongyong Choi, Gwanghoon Yoo, Eric Laporte, Jeesun Nam
Abstract
This paper presents DECO-MWE, a linguistic resource of Korean multiword expressions (MWEs) for Feature-Based Sentiment Analysis (FBSA). Handling MWEs has been a critical issue in FBSA because many constructs exhibit lexical idiosyncrasy. To construct linguistic resources of sentiment MWEs efficiently, we use the Local Grammar Graph (LGG) methodology: DECO-MWE is formalized as a Finite-State Transducer that represents lexical-syntactic restrictions on MWEs. We built a corpus of cosmetics review texts, in which MWEs occur particularly frequently. Based on an empirical examination of the corpus, we distinguish four types of MWEs, which DECO-MWE covers: Standard Polarity MWEs (SMWEs), Domain-Dependent Polarity MWEs (DMWEs), Compound Named Entity MWEs (EMWEs), and Compound Feature MWEs (FMWEs). In retrieval, DECO-MWE achieves an F-measure of 0.806 on the test corpus. This study has a twofold outcome: first, a sizeable general-purpose polarity MWE lexicon that may be broadly used in FBSA; second, a finite-state methodology for treating domain-dependent MWEs, such as idiosyncratic polarity expressions, named entity expressions, or feature expressions, which may be reused to describe the linguistic properties of other corpus domains.
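
As a rough illustration of the finite-state idea in the abstract, the following minimal Python sketch shows how a small transducer could tag multiword expressions in a token sequence and how a retrieval F-measure could then be computed over the tagged spans. Everything in it is an assumption made for illustration: the class `TinyTransducer`, the state numbers, the English placeholder expressions, and the output tags are hypothetical and are not taken from DECO-MWE, whose actual graphs describe Korean expressions in the LGG formalism.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, output tag)


class TinyTransducer:
    """Toy finite-state transducer: reads tokens, emits a tag at final states."""

    def __init__(self,
                 transitions: Dict[Tuple[int, str], int],
                 finals: Dict[int, str],
                 start: int = 0) -> None:
        self.transitions = transitions  # (state, token) -> next state
        self.finals = finals            # final state -> output tag
        self.start = start

    def match_at(self, tokens: List[str], i: int) -> Optional[Tuple[int, str]]:
        """Return (end index, tag) of the longest match starting at i, if any."""
        state, j, best = self.start, i, None
        while j < len(tokens) and (state, tokens[j]) in self.transitions:
            state = self.transitions[(state, tokens[j])]
            j += 1
            if state in self.finals:
                best = (j, self.finals[state])
        return best

    def annotate(self, tokens: List[str]) -> List[Span]:
        """Scan left to right, keeping non-overlapping longest matches."""
        spans, i = [], 0
        while i < len(tokens):
            match = self.match_at(tokens, i)
            if match is None:
                i += 1
            else:
                end, tag = match
                spans.append((i, end, tag))
                i = end
        return spans


def f_measure(predicted: Set[Span], gold: Set[Span]) -> float:
    """Balanced F-measure (harmonic mean of precision and recall) over spans."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical graph: two paths share states so that lexical variants
# ("money" / "price") are accepted by the same expression.
toy_graph = TinyTransducer(
    transitions={
        (0, "worth"): 1, (1, "the"): 2, (2, "money"): 3, (2, "price"): 3,
        (0, "dries"): 4, (4, "out"): 5,
    },
    finals={3: "SMWE:positive", 5: "DMWE:negative"},
)

tokens = "it is worth the price but dries out my skin".split()
spans = toy_graph.annotate(tokens)
for start, end, tag in spans:
    print(" ".join(tokens[start:end]), "->", tag)
```

The sketch uses longest-match scanning so that a longer expression wins over any shorter prefix that also reaches a final state. Scoring with `f_measure` requires a manually annotated gold set of spans, which is not included here.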