Llamion Technical Report

2026-05-25 | Computation and Language

AI summary

The authors introduce Llamion, a family of large language models obtained by converting the existing Orion-14B model to the Llama architecture. The conversion uses a recipe named KEPT, which copies unchanged parameters directly, initializes the replaced normalization layers without training, and then distills the original model's behavior into the converted one without extensive retraining. Llamion matches Orion's performance on several benchmarks after only a short training period, and it retains abilities such as coding and long-context understanding even though those were absent from the retraining data. The authors release three versions of Llamion that load with the standard Hugging Face Transformers library.

Llama architecture, language models, parameter mapping, knowledge distillation, LayerNorm, RMSNorm, weight decay, cross-architecture transfer, KoMMLU, Hugging Face
Authors
Kisu Yang, Yoonna Jang, Hyeonseok Moon, Hwanseok Jang, Taewoo Lee, Hyungjin Lee, Jeseung Lee, Juhyoung Park, Heuiseok Lim
Abstract
We release Llamion, a family of 14B-parameter open-weight language models obtained by transforming Orion-14B into the standardized Llama-family architecture. The transformation is performed by Efficient Knowledge Preservation for Transformation (KEPT), a recipe that combines (i) Normal Parameter Mapping (NPM) for unchanged modules, (ii) Optimized Parameter Mapping (OPM), a training-free LayerNorm-to-RMSNorm initialization we prove optimal under the near-zero-mean activation regime induced by weight decay, and (iii) Cross-architecture Knowledge Distillation (XKD), an equal-size frozen-teacher distillation that aligns the converted model's outputs with the source model's on any reasonable input distribution. Llamion recovers Orion's behaviour on H6, MT-Bench, and KoMMLU with only ~123M tokens on a single A100 in four days; Llamion-Base reaches 66.87% on KoMMLU, exceeding the next-best entry of the Open Ko LLM Leaderboard by >7.0 absolute points at submission time. Capabilities entirely absent from the transfer corpus (Python programming and 200K-token context handling) survive the architectural transition intact. We release three checkpoints (Base, Chat, LongChat) that load with trust_remote_code=False in the Hugging Face Transformers library.
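The abstract does not spell out the OPM initialization, so the following is only an illustrative sketch of why the near-zero-mean regime matters, not the authors' procedure. It assumes the simplest possible mapping: RMSNorm reuses the LayerNorm gain and the LayerNorm bias is zero. The function names, dimensions, and random inputs are hypothetical. When the input has zero mean, the variance equals the mean square, so the two normalizations coincide; once the mean drifts away from zero, they diverge.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Standard LayerNorm over the last axis: center, scale by std, apply gain and bias.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-5):
    # Standard RMSNorm over the last axis: no centering, no bias.
    ms = np.mean(x * x, axis=-1, keepdims=True)
    return gamma * x / np.sqrt(ms + eps)

rng = np.random.default_rng(0)
d = 512                                   # hypothetical hidden size
gamma = rng.normal(1.0, 0.1, size=d)      # shared gain (assumed mapping: copy as-is)
beta = np.zeros(d)                        # assumption: LayerNorm bias is zero

x = rng.normal(size=(4, d))
x_centered = x - x.mean(axis=-1, keepdims=True)   # zero-mean activations
x_shifted = x_centered + 0.5                      # same activations with a mean offset

# Largest element-wise gap between the two normalizations in each regime.
err_centered = np.abs(layer_norm(x_centered, gamma, beta) - rms_norm(x_centered, gamma)).max()
err_shifted = np.abs(layer_norm(x_shifted, gamma, beta) - rms_norm(x_shifted, gamma)).max()

print(f"max |LN - RMSNorm|, zero-mean input: {err_centered:.2e}")
print(f"max |LN - RMSNorm|, shifted input:   {err_shifted:.2e}")
```

In this toy setting the gap is at floating-point level for the centered input and clearly nonzero for the shifted one, which is consistent with the abstract's framing that the initialization relies on activations staying close to zero mean.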