DEBISS: a Corpus of Individual, Semi-structured and Spoken Debates

2026-03-05 • Computation and Language

Computation and Language • Databases

AI summary

The authors explain that debating is a common part of daily life in many settings, but building corpora that cover the many types and formats of debate is hard. To help with this, they created the DEBISS corpus, a collection of individual, spoken debates with a semi-structured format. The dataset includes annotations for several natural language processing tasks, such as converting speech to text, identifying who is speaking when (speaker diarization), analyzing arguments, and assessing debater quality. Their work aims to provide a useful resource for studying debates in different formats.

debate • corpus • natural language processing • speech-to-text • speaker diarization • argument mining • debater quality assessment • semi-structured data

Authors

Klaywert Danillo Ferreira de Souza, David Eduardo Pereira, Cláudio E. C. Campelo, Larissa Lucena Vasconcelos

Abstract

Debating is an essential part of daily life, whether in study, work, everyday discussions, political debates on TV, or online discussions on social networks, and the range of uses for debates is broad. Because debates vary so widely in application, structure, and format, developing corpora that account for these variations is challenging, and debate corpora remain notably scarce in the state of the art. For this reason, the current research proposes the DEBISS corpus: a collection of spoken, individual debates with semi-structured features. The corpus includes annotations for a broad range of NLP tasks, such as speech-to-text, speaker diarization, argument mining, and debater quality assessment.
