Even though large language models are becoming increasingly capable, it is still unreasonable to expect them to excel at tasks that are under-represented on the Internet. Leveraging LLMs for specialized applications, particularly in niche programming languages and private domains, remains challenging and largely unsolved.
          
          
In this work, we address this gap by presenting a comprehensive, open-source approach for adapting LLMs to the Q programming language, a language widely used in quantitative finance. We introduce a new LeetCode-style evaluation dataset for Q, benchmark major frontier models on it, and then apply pretraining, supervised fine-tuning, and reinforcement learning to train a suite of models built on the Qwen-2.5 series across five parameter sizes (1.5B, 3B, 7B, 14B, and 32B).
          
          
Our best model achieves a pass@1 accuracy of 59% on our Q benchmark, surpassing the best-performing frontier model, Claude Opus-4, by 29.5%. Additionally, all of our models, even the 1.5B variant, outperform GPT-4.1 on this task. We provide a detailed blueprint for dataset construction, model pretraining, supervised fine-tuning, and reinforcement learning that is broadly applicable to other specialized domains.
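For context, pass@1 is the fraction of benchmark problems for which a sampled solution passes all of that problem's tests. The abstract does not state the exact sampling setup, so the minimal sketch below assumes the standard unbiased pass@k estimator of Chen et al. (2021); the problem names and sample counts are purely illustrative, not results from this work.

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        # Unbiased pass@k estimator (Chen et al., 2021): probability that at
        # least one of k samples, drawn from n generations of which c are
        # correct, passes all tests for a problem.
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Benchmark-level pass@1 is the mean of the per-problem estimates.
    # `results` maps a problem id to (n_generations, n_correct); values are toy numbers.
    results = {"q-two-sum": (8, 5), "q-running-median": (8, 0)}
    pass_at_1 = sum(pass_at_k(n, c, k=1) for n, c in results.values()) / len(results)
    print(f"pass@1 = {pass_at_1:.2%}")  # 31.25% for the toy numbers above

With k=1 the estimator reduces to the per-problem fraction of correct generations, so averaging it over the benchmark gives the reported pass@1 score.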