Large Language Model (LLM) agents can leverage multiple turns and tools to solve complex tasks, with prompt-based approaches achieving strong performance. This work demonstrates that Reinforcement Learning (RL) can push capabilities significantly further by learning from experience. Through experiments on a legal document search benchmark, we show that our RL-trained 14-billion-parameter model outperforms frontier-class models (85% vs 78% accuracy). In addition, we explore turn-restricted regimes, during training and at test time, and show that these agents achieve better results when allowed to operate over longer multi-turn horizons.
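To make the setup concrete, below is a minimal sketch of the multi-turn rollout loop with a terminal, outcome-based reward. Every name here (policy, search_tool, the reply fields) is an illustrative stand-in rather than the paper's implementation; the max_turns budget is the knob behind the turn-restricted regimes mentioned above.

# Minimal sketch of one search-agent episode under a turn budget.
# `policy` and `search_tool` are hypothetical stand-ins, not the paper's code.
def rollout(question, policy, search_tool, max_turns=10):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = policy.generate(messages)              # propose a tool call or a final answer
        messages.append({"role": "assistant", "content": reply.text})
        if reply.is_final_answer:
            return reply.text, messages
        results = search_tool.query(reply.tool_args)   # execute the document search
        messages.append({"role": "tool", "content": results})
    return None, messages                              # turn budget exhausted

def outcome_reward(answer, gold):
    # Terminal reward for RL: 1.0 only if the episode ended with the right answer.
    return 1.0 if answer is not None and answer.strip() == gold.strip() else 0.0

In this sketch, varying max_turns at train or test time corresponds to the longer or shorter horizons studied in the paper.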
Poster version - Presented at the NeurIPS 2025 Workshop on Multi-Turn Interactions in Large Language Models, in San Diego, California, USA
Full Workshop Paper - the NeurIPS 2025 paper accepted for the Workshop on Multi-Turn Interactions in Large Language Models
@misc{kalyan2025multiturnagenticrag,
title={{Reinforcement Learning} for Long-Horizon Multi-Turn Search Agents},
author={Vivek Kalyan and Martin Andrews},
year={2025},
eprint={2510.24126},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.24126},
}
Optimizing GPU kernels for high performance is a complex task, often demanding deep architectural knowledge, extensive profiling, and iterative experimentation. This challenge is amplified when targeting newer or less-documented GPU architectures where traditional development aids are scarce. This paper introduces an LLM-powered "GPU Kernel Scientist," an automated methodology for iteratively refining accelerator kernels. Our methodology employs LLMs in a multi-stage, evolutionary process: (a) strategically selecting promising prior code versions as a basis for new iterations; (b) generating hypotheses for optimization experiments, based on existing code and assimilated knowledge from general GPU literature; and (c) autonomously implementing these experiments through code modification and subsequent submission to an external evaluation system, using only observed timing data as performance feedback. We detail how this approach navigates the challenges of the AMD MI300 target architecture and leverages LLMs to compensate for limited domain-specific human expertise. Since quantitative results from an ongoing performance competition were embargoed at the paper submission date, we present the architectural design, operational workflow, and qualitative insights, highlighting the potential of LLM-driven agents to democratise and accelerate GPU kernel optimization, especially in resource-constrained or rapidly evolving hardware environments.
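As one concrete illustration of the three-stage loop, the sketch below performs a single evolutionary step. Here llm_complete and submit_for_timing are hypothetical wrappers around an LLM API and the external evaluation system, and the tournament-style parent selection is an assumed heuristic, not necessarily the paper's strategy.

# One step of the select -> hypothesise -> implement -> evaluate loop.
# `llm_complete` and `submit_for_timing` are hypothetical wrappers.
import random

def kernel_scientist_step(population, llm_complete, submit_for_timing):
    # (a) select a promising parent: tournament of 3, fastest timing wins
    parent = min(random.sample(population, k=3), key=lambda k: k["time_us"])

    # (b) generate an optimization hypothesis grounded in the parent code
    hypothesis = llm_complete(
        "Given this AMD MI300 kernel:\n" + parent["code"] +
        "\nPropose one concrete optimization experiment and why it may help."
    )

    # (c) implement the experiment as a full kernel rewrite
    candidate = llm_complete(
        "Rewrite the kernel to apply this experiment:\n" + hypothesis +
        "\n\n" + parent["code"]
    )

    # score using observed timing only, and keep the candidate for later selection
    population.append({"code": candidate, "time_us": submit_for_timing(candidate)})
    return population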
Poster version - Presented at the ES-FoMo III workshop at ICML 2025 in Vancouver, Canada
Full Workshop Paper - the ICML 2025 paper accepted for the ES-FoMo III Workshop (updated, post-workshop, to reflect full results)
@misc{andrews2025gpukernelscientistllmdriven,
title={GPU Kernel Scientist:
An LLM-Driven Framework for Iterative Kernel Optimization},
author={Martin Andrews and Sam Witteveen},
year={2025},
eprint={2506.20807},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2506.20807},
}
Relational reasoning is a central component of intelligent behavior, but has proven difficult for neural networks to learn. The Relation Network (RN) module was recently proposed by DeepMind to solve such problems, and demonstrated state-of-the-art results on a number of datasets. However, the RN module scales quadratically in the size of the input, since it calculates relationship factors between every patch in the visual field, including those that do not correspond to entities. In this paper, we describe an architecture that enables relationships to be determined from a stream of entities obtained by an attention mechanism over the input field. The model is trained end-to-end, and demonstrates equivalent performance with greater interpretability while requiring only a fraction of the model parameters of the original RN module.
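A minimal PyTorch-style sketch of the idea follows: a small set of learned queries attends over the patch features to produce a short entity stream, and the pairwise relation function is applied only to those entities (quadratic in n_entities, not in the number of patches). Module names and dimensions are illustrative, not the paper's.

import torch
import torch.nn as nn

class EntityStreamRelations(nn.Module):
    def __init__(self, d_feat=64, n_entities=8, d_rel=128):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_entities, d_feat))  # one query per entity slot
        self.g = nn.Sequential(                                       # relation function over pairs
            nn.Linear(2 * d_feat, d_rel), nn.ReLU(), nn.Linear(d_rel, d_rel))

    def forward(self, patches):                 # patches: (batch, n_patches, d_feat)
        # attention: each entity query softly selects patches from the visual field
        attn = torch.softmax(self.queries @ patches.transpose(1, 2), dim=-1)
        entities = attn @ patches               # (batch, n_entities, d_feat)
        # pairwise relations over the small entity set, not over all patch pairs
        n = entities.size(1)
        a = entities.unsqueeze(2).expand(-1, -1, n, -1)
        b = entities.unsqueeze(1).expand(-1, n, -1, -1)
        pairs = torch.cat([a, b], dim=-1)       # (batch, n, n, 2 * d_feat)
        return self.g(pairs).sum(dim=(1, 2))    # aggregated relation vector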
Poster version - Presented at the ViGiL workshop at NIPS 2017 in Long Beach, California, USA
Full Workshop Paper - the NIPS 2017 paper accepted for the ViGiL workshop
@misc{andrews2019relationships,
title={Relationships from Entity Stream},
author={Martin Andrews and Sam Witteveen},
year={2019},
eprint={1909.03315},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/1909.03315},
}
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using large-scale unlabelled text analysis. However, these representations typically consist of dense vectors that require a great deal of storage and cause the internal structure of the vector space to be opaque. A more idealized representation of a vocabulary would be both compact and readily interpretable. With this goal, this paper first shows that Lloyd's algorithm can compress the standard dense vector representation by a factor of 10 without much loss in performance. Then, using that compressed size as a storage budget, we describe a new GPU-friendly factorization procedure to obtain a representation which gains interpretability as a side-effect of being sparse and non-negative in each encoding dimension. Word similarity and word-analogy tests are used to demonstrate the effectiveness of the compressed representations obtained.
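As a rough illustration of the first step, the sketch below applies Lloyd's algorithm (k-means in one dimension) to the entries of an embedding matrix: with 8 centroids, each float32 entry is replaced by a 3-bit code, in the region of the factor-of-10 compression reported. The exact quantization scheme used in the paper may differ from this per-entry variant.

import numpy as np
from sklearn.cluster import KMeans

def quantize_embeddings(emb, n_levels=8):
    values = emb.reshape(-1, 1)                # treat every matrix entry as a 1-D point
    # fit the codebook on a subsample for speed, then encode everything
    sample = values[np.random.choice(len(values), size=min(len(values), 100_000), replace=False)]
    km = KMeans(n_clusters=n_levels, n_init=10).fit(sample)
    codes = km.predict(values).astype(np.uint8)   # 3-bit indices (held in bytes here)
    return codes.reshape(emb.shape), km.cluster_centers_.ravel()

def dequantize(codes, codebook):
    return codebook[codes]                     # approximate reconstruction of the dense vectors

emb = np.random.randn(10_000, 300).astype(np.float32)   # toy embedding table
codes, codebook = quantize_embeddings(emb)
approx = dequantize(codes, codebook)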
Poster version - Presented at ICONIP-2016 in Kyoto, Japan
Full Paper - in ICONIP-2016 proceedings
@Inbook{Andrews2016-CompressingWordEmbeddings,
author="Andrews, Martin",
editor="Hirose, Akira and Ozawa, Seiichi and Doya, Kenji
and Ikeda, Kazushi and Lee, Minho and Liu, Derong",
title="Compressing Word Embeddings",
bookTitle="Neural Information Processing:
23rd International Conference, ICONIP 2016, Kyoto, Japan,
October 16--21, 2016, Proceedings, Part IV",
year="2016",
publisher="Springer International Publishing",
address="Cham",
pages="413--422",
isbn="978-3-319-46681-1",
doi="10.1007/978-3-319-46681-1_50",
url="http://dx.doi.org/10.1007/978-3-319-46681-1_50"
}
Named Entity Recognition (NER) is a foundational technology for systems designed to process Natural Language documents. However, many existing state-of-the-art systems are difficult to integrate into commercial settings (due, for example, to their monolithic construction, licensing constraints, or need for training corpora). In this work, a new NER system is described that uses the output of existing systems over large corpora as its training set, ultimately enabling labelling with (i) better F1 scores; (ii) higher labelling speeds; and (iii) no further dependence on the external software.
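The pipeline can be sketched as below, with spaCy standing in as the 'expert' (the teacher systems and student model in the paper differ): the expert labels a large raw corpus, and the resulting silver data is used to train a standalone student tagger that replaces the expert at inference time.

import spacy

teacher = spacy.load("en_core_web_sm")     # stand-in expert; requires the model to be installed

def silver_label(sentences):
    # Run the expert over raw text to produce token/BIO-tag training pairs.
    for doc in teacher.pipe(sentences):
        tokens = [t.text for t in doc]
        tags = [f"{t.ent_iob_}-{t.ent_type_}" if t.ent_iob_ != "O" else "O" for t in doc]
        yield tokens, tags

corpus = ["Martin Andrews lives in Singapore.", "ICONIP was held in Kyoto."]
silver = list(silver_label(corpus))
# `silver` now feeds whatever student tagger is preferred (e.g. a BiLSTM-CRF);
# once trained, the student removes the runtime dependence on the expert.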
'Lite' version - Poster Prize winner at Nvidia ASEAN GPU conference
Full Paper - Presented at IES-2015 in Bangkok, Thailand
@Inbook{Andrews2016-NERfromExperts,
author="Andrews, Martin",
editor="Lavangnananda, Kittichai and Phon-Amnuaisuk, Somnuk
and Engchuan, Worrawat and Chan, Jonathan H.",
title="Named Entity Recognition Through Learning from Experts",
bookTitle="Intelligent and Evolutionary Systems:
The 19th Asia Pacific Symposium, IES 2015, Bangkok, Thailand,
November 2015, Proceedings",
year="2016",
publisher="Springer International Publishing",
address="Cham",
pages="281--292",
isbn="978-3-319-27000-5",
doi="10.1007/978-3-319-27000-5_23",
url="http://dx.doi.org/10.1007/978-3-319-27000-5_23"
}