Consensus-based Agentic Large Language Model Framework for Harmonized Tariff Schedule Code Classification

2026-06-15 | Artificial Intelligence

AI summary

The authors developed a system that uses large language models to help classify products under Canadian 10-digit customs codes in maritime shipping. It combines several steps: retrieving evidence from official tariff documents, checking whether multiple agents agree, and escalating to human reviewers when confidence is low. Tests on real logistics data showed that assigning exact 10-digit codes remains difficult, with accuracy falling as the prediction moves from the broad chapter level to the fine-grained tariff and statistical-suffix digits. The results suggest that combining machine assistance with human review is preferable to relying entirely on AI. The approach aims to make customs classification more interpretable and reliable in port operations.

Harmonized Tariff Schedule (HTS), large language model (LLM), customs classification, maritime logistics, information retrieval, hierarchical codes, evidence-grounded reasoning, human-in-the-loop, confidence estimation, smart port
Authors
Truong Thanh Hung Nguyen, Khanh Van Quynh Nguyen, Hoang-Loc Cao, Tri Duong, Phuc Ho, Van Pham, Loc Nguyen, Hung Cao
Abstract
Accurate Harmonized Tariff Schedule (HTS) code classification is essential for customs clearance, duty assessment, trade statistics, and regulatory compliance in maritime logistics. However, exact HTS classification remains challenging because product descriptions are often short, incomplete, or ambiguous, while correct classification depends on hierarchical tariff structures, legal notes, and jurisdiction-specific rules. This paper proposes an agentic large language model (LLM) framework for Canadian 10-digit HTS code classification in smart-port and maritime logistics environments. The framework integrates multi-agent information retrieval, semantic retrieval over official tariff documents, evidence-grounded reasoning, consensus-based validation, element-wise voting across hierarchical code components, confidence estimation, and human-in-the-loop escalation. We evaluate the framework on a private dataset of 3,300 domain-expert-labeled product records collected from logistics and delivery contexts. Experimental results show that exact 10-digit classification remains difficult even for advanced LLMs, with performance decreasing from coarse chapter-level prediction to fine-grained tariff and statistical suffix assignment. These findings demonstrate the need for evidence-grounded, uncertainty-aware, and human-centered classification workflows rather than fully autonomous single-step prediction. The proposed framework supports more interpretable, accountable, and compliance-oriented HTS classification for maritime logistics and smart-port operations. Our code is available at https://github.com/Analytics-Everywhere-Lab/hts.
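To make the idea of element-wise voting with confidence-based escalation concrete, the following is a minimal sketch and not the authors' implementation (the linked repository may differ substantially). It assumes a 10-digit code splits into five 2-digit components (chapter, heading, subheading, tariff item, statistical suffix), that each agent returns one full code, and that confidence at a level is the share of all agents agreeing with the consensus prefix up to that level. The function names, the 0.6 threshold, and the example codes are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumed segmentation: five 2-digit components of a 10-digit code.
LEVELS = ("chapter", "heading", "subheading", "tariff_item", "statistical_suffix")


def split_code(code: str) -> list[str]:
    """Strip separators and split a 10-digit code into 2-digit components."""
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)} in {code!r}")
    return [digits[i:i + 2] for i in range(0, 10, 2)]


def elementwise_vote(predictions: list[str], threshold: float = 0.6) -> dict:
    """Vote component by component, from chapter down to statistical suffix.

    At each level, only agents that matched the winning prefix so far may
    vote, so the consensus is always a code that some agent proposed.
    Ties go to the value seen first (Counter preserves insertion order).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    remaining = [split_code(p) for p in predictions]
    total = len(remaining)
    consensus: list[str] = []
    confidence: dict[str, float] = {}

    for i, level in enumerate(LEVELS):
        counts = Counter(parts[i] for parts in remaining)
        winner, votes = counts.most_common(1)[0]
        consensus.append(winner)
        # Share of ALL agents that agree with the consensus prefix so far.
        confidence[level] = votes / total
        remaining = [parts for parts in remaining if parts[i] == winner]

    # Support can only shrink with depth, so the last level is the minimum.
    overall = confidence[LEVELS[-1]]
    c = consensus
    return {
        "code": f"{c[0]}{c[1]}.{c[2]}.{c[3]}.{c[4]}",
        "confidence": confidence,
        "overall": overall,
        # Hypothetical rule: send low-agreement cases to a human reviewer.
        "escalate": overall < threshold,
    }


if __name__ == "__main__":
    # Hypothetical agent outputs, not real classification rulings.
    agents = ["8708.29.00.10", "8708.29.00.90", "8708.29.00.10", "8708.30.00.00"]
    print(elementwise_vote(agents))
```

With the four hypothetical predictions above, all agents agree on the chapter and heading, three of four agree through the tariff item, and only two of four agree on the full code, so the sketch returns 8708.29.00.10 with overall support 0.5 and flags the record for human review. This mirrors the pattern the abstract reports, where agreement is easier at coarse levels than at the statistical suffix.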