HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions
2026-06-22 • Information Retrieval
Information Retrieval, Computation and Language
AI summary
The authors present HAKARI-Bench, a lightweight benchmark for comparing retrieval methods quickly and under identical conditions. It reconstructs existing benchmark suites into small datasets spanning many languages and tasks, so that retrieval approaches such as BM25 and dense embeddings can be compared fairly. Its model rankings closely match those of larger benchmarks while running faster, which makes it useful for selecting and improving models during development. Code and data are publicly released under an open license.
retrieval-augmented generation, semantic search, embedding, BM25, dense retrieval, sparse retrieval, benchmark, dimensionality reduction, reranking, Spearman correlation
Authors
Yuichi Tateno
Abstract
With the rapid spread of retrieval-augmented generation and semantic search, choosing the right embedding and retrieval configuration is increasingly hard. Large retrieval benchmarks are comprehensive but too heavy to rerun during development, and there is little infrastructure for comparing production settings (dimensionality reduction, quantization, reranking) across many models under identical conditions. We present HAKARI-Bench, a lightweight benchmark that reconstructs existing retrieval suites into small datasets (Nano-sets): 35 benchmarks and 551 tasks across 43 languages in a unified format, enabling same-condition, model-agnostic comparison of five retrieval families (BM25, dense, sparse, late interaction, rerankers) and their efficiency variants. Across 55 models, its overall ranking reproduces the rankings of the official MTEB retrieval v2, MMTEB v2 retrieval, and English BEIR (full) benchmarks with a Spearman correlation above 0.97. HAKARI-Bench does not replace full evaluation; it enables rapid model selection, regression detection, and inspection of the quality-efficiency Pareto frontier. Code, data, and leaderboard are released under the MIT license.
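The agreement claim in the abstract rests on Spearman rank correlation between model rankings on the small benchmark and on the full one. As a minimal sketch of how such a number can be computed (not the authors' evaluation code), the snippet below ranks two score lists and takes the Pearson correlation of the ranks. The function names and the five example scores are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical scores for five models (illustration only, not paper data).
small_benchmark_scores = [0.41, 0.52, 0.48, 0.60, 0.55]
full_benchmark_scores = [0.39, 0.50, 0.51, 0.63, 0.57]

rho = spearman(small_benchmark_scores, full_benchmark_scores)
print(f"Spearman rho = {rho:.3f}")
```

In this toy input two adjacent models swap places between the two lists, which lowers the correlation below 1. A value near 1 means the small benchmark orders models almost exactly as the full one does.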