Zero- and Few-Shot Named-Entity Recognition: Case Study and Dataset in the Crime Domain (CrimeNER)

2026-03-02 | Computation and Language

Computation and Language, Artificial Intelligence, Databases
AI summary

The authors focus on helping computers find important crime-related information in documents, like details about crimes, criminals, or police. They created a new collection of over 1,500 crime reports called CrimeNERdb, which is specially labeled to train these computer programs. They also defined different types of crime-related names to look for and tested how well current technology works when given little or no example data. This work aims to improve how well machines understand crime documents without needing lots of labeled examples.

Named-Entity Recognition, Zero-Shot Learning, Few-Shot Learning, Crime Data Annotation, Terrorism Reports, US Department of Justice, Large Language Models, Natural Language Processing, Information Extraction
Authors
Miguel Lopez-Duran, Julian Fierrez, Aythami Morales, Daniel DeAlcala, Gonzalo Mancera, Javier Irigoyen, Ruben Tolosana, Oscar Delgado, Francisco Jurado, Alvaro Ortigosa
Abstract
The extraction of critical information from crime-related documents is a crucial task for law enforcement agencies. Named-Entity Recognition (NER) can perform this task by extracting information about the crime, the criminal, or the law enforcement agencies involved. However, there is a considerable lack of adequately annotated data on general real-world crime scenarios. To address this issue, we present CrimeNER, a case study of crime-related Zero- and Few-Shot NER, and a general crime-related Named-Entity Recognition database (CrimeNERdb) consisting of more than 1.5k documents annotated for the NER task, extracted from public reports on terrorist attacks and the U.S. Department of Justice's press notes. We define 5 coarse-grained crime entity types and a total of 22 fine-grained entity types. We assess the quality of the case study and the annotated data with experiments in Zero- and Few-Shot settings using state-of-the-art NER models as well as generalist, commonly used Large Language Models.
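
To make the Few-Shot setting concrete, the following is a minimal sketch of how a K-shot support set could be drawn from span-annotated documents. It is not the authors' procedure: the greedy sampling strategy, the `AnnotatedDoc` structure, the function name `sample_k_shot`, the example sentences, and the labels `CRIMINAL`, `CRIME`, and `AGENCY` are all hypothetical illustrations and do not reflect the actual CrimeNERdb schema or entity types.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class AnnotatedDoc:
    """Hypothetical document format: raw text plus (start, end, label) character spans."""
    text: str
    spans: List[Tuple[int, int, str]]


def sample_k_shot(docs: List[AnnotatedDoc], labels: Set[str], k: int) -> List[AnnotatedDoc]:
    """Greedily pick documents until every label has at least k mentions (or docs run out)."""
    counts: Counter = Counter()
    support: List[AnnotatedDoc] = []
    for doc in docs:
        doc_labels = [label for _, _, label in doc.spans if label in labels]
        # Keep the document only if it adds a mention for a label still below k.
        if any(counts[label] < k for label in doc_labels):
            support.append(doc)
            counts.update(doc_labels)
        if all(counts[label] >= k for label in labels):
            break
    return support


# Illustrative toy data (invented sentences and labels, not taken from CrimeNERdb).
LABELS = {"CRIMINAL", "CRIME", "AGENCY"}
DOCS = [
    AnnotatedDoc("John Doe was charged with wire fraud.",
                 [(0, 8, "CRIMINAL"), (26, 36, "CRIME")]),
    AnnotatedDoc("The FBI investigated the case.",
                 [(4, 7, "AGENCY")]),
    AnnotatedDoc("Jane Roe pleaded guilty to arson.",
                 [(0, 8, "CRIMINAL"), (27, 32, "CRIME")]),
]

one_shot = sample_k_shot(DOCS, LABELS, k=1)
two_shot = sample_k_shot(DOCS, LABELS, k=2)
```

In this sketch the Zero-Shot setting corresponds to an empty support set: the model sees only the entity type names or descriptions. Greedy selection can overshoot k for frequent labels, and it may fall short for rare ones when the pool is small, as happens for the `AGENCY` label with k=2 above.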