AI Training Manager: Bounded Closed-Loop Control of Adaptive Training Recipes

2026-06-29 • Artificial Intelligence


AI summary

The authors created an AI Training Manager that acts as a supervisor during machine learning training. Instead of replacing existing methods, it watches the training process through structured data and suggests safe adjustments, such as changing learning rates or regularization, to fix problems like overfitting or poor exploration. They tested it on language modeling and robot learning tasks: on TinyStories it corrected overfitting and reached a validation loss 60% lower than the baseline, and in that supervised setting its inference ran without blocking the training loop. This suggests that large language models can help manage training in a clear and controllable way alongside traditional methods.

Large Language Models, Machine Learning Training, Overfitting, Regularization, Learning Rate, Reinforcement Learning, Supervised Learning, Exploration, Scheduler, Telemetry

Authors

Anjali Rao, Nikhil Kamalkumar Advani

Abstract

We present the AI Training Manager, a bounded LLM-based supervisory controller for adaptive machine learning training. Standard training pipelines often rely on fixed recipes or single-axis schedulers, which can struggle with mid-run failures such as severe overfitting, loss imbalance, exploration collapse, or unsafe exploration. Rather than replacing mathematical optimizers or acting as an unconstrained coding agent, the manager operates through a schema-conditioned interface: it reads structured telemetry snapshots from an active run, audits a constrained action space, and returns validated updates to training parameters such as learning rate, regularization strength, loss-weight coefficients, and exploration settings. We evaluate this architecture across supervised language modeling and reinforcement learning. On TinyStories, the manager detects and corrects overfitting, achieving a validation loss 60% lower than the baseline while producing auditable intervention logs. In this supervised setting, we additionally show that manager inference does not need to block the training loop: training can continue while a manager response is pending, and validated updates can be applied asynchronously once available. In a robotic manipulation reinforcement-learning task, we use the same bounded decision interface in an episodic closed-loop setting, where manager updates are applied at evaluation or checkpoint boundaries. The manager mitigates both conservative and unsafe exploration regimes. These results suggest that schema-conditioned LLMs can serve as bounded supervisory managers for live training runs, complementing conventional optimizers and schedulers with interpretable, multi-axis intervention capabilities.
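The bounded decision interface described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names, the numeric ranges in `ACTION_SPACE`, and the `validate_update` helper are all hypothetical, chosen only to show how a proposed update might be checked against a constrained action space and recorded in an audit log before it reaches the training loop.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Inclusive range a tunable parameter may take (illustrative only)."""
    low: float
    high: float


# Hypothetical action space: the names and ranges are assumptions made
# for this sketch, not values reported in the paper.
ACTION_SPACE = {
    "learning_rate": Bound(1e-6, 1e-2),
    "weight_decay": Bound(0.0, 0.5),
    "entropy_coef": Bound(0.0, 0.1),
}


def validate_update(proposed):
    """Filter a proposed update through the action space.

    Returns (accepted, log): `accepted` maps parameter names to values
    that are safe to apply, and `log` is a list of
    (name, status, detail) tuples serving as an audit trail.
    """
    accepted, log = {}, []
    for name, value in proposed.items():
        bound = ACTION_SPACE.get(name)
        if bound is None:
            log.append((name, "rejected", "not in action space"))
            continue
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            log.append((name, "rejected", "not a finite number"))
            continue
        clamped = min(max(float(value), bound.low), bound.high)
        status = "accepted" if clamped == value else "clamped"
        accepted[name] = clamped
        log.append((name, status, f"{value} -> {clamped}"))
    return accepted, log


if __name__ == "__main__":
    # A proposal as it might arrive from the manager (made-up values).
    proposal = {"learning_rate": 1.0, "weight_decay": 0.1, "batch_size": 64}
    accepted, log = validate_update(proposal)
    print(accepted)
    for entry in log:
        print(entry)
```

In this sketch, out-of-range values are clamped rather than rejected, and parameters outside the action space never reach the run. Because validation is separate from application, the accepted dictionary could be applied at the next training step or at a checkpoint boundary, whichever the loop allows.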
