Generated, Parallel, Scalable? A Study of Agentic AI-Generated Julia Code on Supercomputers
2026-06-15 • Distributed, Parallel, and Cluster Computing
AI summary
The authors explored how AI language models, acting as autonomous agents, can create parallel computing code in the Julia programming language, which is used in high-performance computing. They tested three AI models on three tasks (approximating π, tiled matrix multiplication, and tiled Cholesky decomposition) using a Julia package called Dagger.jl, and compared the results against AI-generated Base.Threads and MPI.jl versions. While the AI-generated code ran correctly on small inputs, it failed on larger problems because of deadlocks, oversubscription, or out-of-memory errors. The study shows that AI can help write parallel Julia code, but making it reliable and efficient for large-scale computing still needs more work.
Julia, High-Performance Computing (HPC), Parallel Programming, Large Language Models (LLMs), Dagger.jl, Base.Threads, MPI.jl, Task-based Execution, Matrix Multiplication, Cholesky Decomposition
Authors
Linus Bantel, Anna-Lena Roth, Jonas Posner, Dirk Pflüger
Abstract
Julia is increasingly used in HPC as a single-language alternative to combining high-level scripting with low-level systems languages, but achieving scalable performance still requires expertise in parallel programming. LLMs are increasingly used for code generation and are advancing rapidly with each new version. Yet, existing studies focus on single-shot prompting rather than agentic settings, in which an LLM autonomously plans, generates, and refines code through tool use. Using an OpenCode-based agent extended with a Julia-documentation MCP server, we study agentic generation of parallel Julia code, focusing on task-based execution with Dagger.jl. We evaluate three LLMs, OpenAI GPT-5.5, Anthropic Claude Opus 4.7, and the open-weight Qwen3-Coder-Next, on three problems with distinct parallel structures: π approximation, tiled general matrix multiplication, and tiled Cholesky decomposition. The generated Dagger.jl implementations are compared against agent-generated Base.Threads and MPI.jl baselines, with shared-memory experiments scaling to 192 cores and distributed-memory experiments on two nodes. The agents reliably produce executable code for small inputs but fail at larger scales due to deadlocks, oversubscription, or out-of-memory errors, with the open-weight model affected most severely. The two commercial models scale comparably on Base.Threads and MPI.jl, while their Dagger.jl implementations expose recurring weaknesses in task dependencies, granularity, and scheduling. Agentic AI is promising for producing parallel Julia code, but generating robust, performance-aware implementations for large-scale HPC systems remains an open challenge.
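For orientation, the following is a minimal sketch of what a Base.Threads version of the π approximation problem might look like. The abstract does not say which approximation method the study uses, so this assumes midpoint-rule integration of 4/(1+x²) over [0, 1]. The function name `approx_pi` and the `nchunks` parameter are hypothetical, and the code is not taken from the paper or from any agent output.

```julia
using Base.Threads

# Hypothetical illustration: approximate π by midpoint-rule integration of
# 4 / (1 + x^2) on [0, 1], splitting the n intervals into chunks that run
# as separate tasks. The integration method is an assumption, not the paper's.
function approx_pi(n::Int; nchunks::Int = nthreads())
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    nchunks >= 1 || throw(ArgumentError("nchunks must be at least 1"))
    h = 1.0 / n
    # Contiguous index ranges, one per task (the last one may be shorter).
    chunks = Iterators.partition(1:n, cld(n, nchunks))
    tasks = map(chunks) do r
        Threads.@spawn begin
            s = 0.0                      # task-local partial sum, no shared state
            for i in r
                x = (i - 0.5) * h        # midpoint of the i-th interval
                s += 4.0 / (1.0 + x * x)
            end
            s
        end
    end
    # Combine the partial sums once every task has finished.
    return h * sum(fetch, tasks)
end
```

Started with `julia --threads=auto`, a call such as `approx_pi(10^8)` spreads the loop across the available threads. Each task accumulates into its own local variable and the partial sums are combined only after `fetch`, so the sketch needs no locks or atomics; the chunk count controls task granularity, one of the weaknesses the abstract reports for the generated code.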