Machine Learning Singapore : 25-Sept-2025 : LLMs in production @ Rakuten

## LLMs in production @ Rakuten
#### Machine Learning Singapore

[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
 
[Sam Witteveen](http://samwitteveen.com) @ [reddragon.ai](http://reddragon.ai/)

25-September-2025

---

## Today's Line-up

* "Efficient LLM Fine-tuning for Semantic Search" - _Dongzhe Wang_
* "IPhO Gold using Agentic Gemini" - _Martin Andrews_
* "Efficient Inference and Serving of LLMs and Large Video-Generative Models" - _Jonathan Zhao_

---

## Efficient LLM Fine-tuning for Semantic Search
#### Dongzhe Wang

* What is Semantic Search?
* Model architectures
* How to fine-tune a semantic search model

---

## IPhO Gold using Agentic Gemini
#### Machine Learning Singapore

[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)

; $x^2=17_i$ $
; ## DSPy

25-September-2025

---

## About Me

* Machine Intelligence / Startups / Finance  
  + Moved from NYC to Singapore in Sep-2013

* 2014 = 'fun' :
  + Machine Learning, Deep Learning, NLP
  + Robots, drones

* Since 2015 = 'serious' :: NLP + deep learning
  + Including Papers...
  + & GDE ML; ML-Singapore co-organiser...
  + & Red Dragon AI...

## About Red Dragon AI

* Deep Learning Consulting & Prototyping (Google Partner)
  - Education / Training
  - Research : NeurIPS / EMNLP / NAACL / ICML / ICLR

* Please contact us for : 
  - Language model training (eg: on-prem)
  - Knowledgebase interaction & reasoning
  - Sales-oriented applications

---

## Outline

* Intro + What is the IPhO?
* Agentic Gemini for the IPhO
* Wrap-up & QR-code

;* Head's Up!

---

## The Paper

* ["Physics Supernova": AI Agent Matches Elite Gold Medalists at IPhO 2025](https://arxiv.org/abs/2509.01659) - Qiu _et al._ (2025)

---

## What is the IPhO?

* ["International Physics Olympiad"](https://ipho-unofficial.org/)
  + qv: IMO = International Maths Olympiad
* Every year [since 1967](https://ipho-unofficial.org/timeline/):
  + Teams of [5](https://www.bpho.org.uk/IPhO/) from all competing countries
    - Pre-university students
  + Meet in a city for multi-day competition
    - Tricky physics problems...

## Look at [Qs from 2025](https://ipho.olimpicos.net/)

* 3 Theory questions (5hrs)
  + Hydrogen and galaxies
  + Cox's Timepiece
  + Champagne! (held in France)
* 2 Experimental Qs (not attempted by AI)
  + Physical equipment (x2 in 2025)
  + Or by computer simulation
    - Potentially possible for Agentic system...

## Hydrogen & Galaxies

[<img height="420" src="img/IPhO-2025-Q1_HydrogenAndGalaxies_998x639.png" alt="IPhO 2025 Q1">](https://ipho.olimpicos.net/pdf/IPhO_2025_Q1.pdf)

## Cox's Timepiece

[<img height="420" src="img/IPhO-2025-Q2_Coxs-timepiece_1221x692.png" alt="IPhO 2025 Q2">](https://ipho.olimpicos.net/pdf/IPhO_2025_Q2.pdf)

## Champagne!

[<img height="420" src="img/IPhO-2025-Q3_Champagne_994x577.png" alt="IPhO 2025 Q3">](https://ipho.olimpicos.net/pdf/IPhO_2025_Q3.pdf)

## Metrics

* IPhO questions have a fixed rubric / scoring scheme
  + Paper team included a previous IPhO competitor & *marker*
* It's pretty easy to see whether the answers were right...
  + (like the IMO) answers are ~'tricks-based'

---

## The Paper

* ["Physics Supernova": AI Agent Matches Elite Gold Medalists at IPhO 2025](https://arxiv.org/abs/2509.01659) - Qiu _et al._ (2025)
 + 'Agentic' Gemini 2.5 Pro 
 - experiments run 5 times for each question
 + [Code Repo](https://github.com/CharlesQ9/Physics-Supernova) (MIT)
 + [Last Author Thread](https://x.com/MengdiWang10/status/1965471580647293078)
 - ~ "Without Experimental Questions ⇒ Workshop paper!"

## Agentic System Diagram

* A [`smolagents` CodeAgent](https://smolagents.org/docs/agents-guided-tour/) Reason-Act loop
  - with tools : AnswerReviewer + ImageAnalyzer
  - `max_steps=80`, `max_completion_tokens=32768`
;  - and 3 retries (to cope with slip-ups) ...

## [Main](https://github.com/CharlesQ9/Physics-Supernova/blob/main/run.py) Agent Loop

```py
PROBLEM_SOLVING_PROMPT = (
  "Your task is to solve the problem part by part, 
  step by step. ONLY after you have FISHED the WHOLE PROBLEM 
  should call `final_answer`, 
  never call `final_answer` when there are parts left!"
)
...
IMG_TOOL_PROMPT = "When you need to perform measurements on 
  images, you MUST call the `ask_image_question` tool. 
  EVERYTIME you MEASURE from some FIGURE, e.g., reading 
  numbers, getting readings of items on figures, 
  you MUST call the `ask_image_question` tool with the image 
  reference and your question, 
  or you might get very wrong measurements!"
```
;PROBLEM STATEMENT= text-with-image placeholders` + problem_text

### [Image Analysis](https://github.com/CharlesQ9/Physics-Supernova/blob/main/utils/imgTools.py) (Agentic Tool)

```py
messages = [
  ChatMessage ( role=MessageRole.SYSTEM, content=
    "You are an expert in dealing with image 
     in Physics Olympiads."),
  ChatMessage ( role=MessageRole.USER, content=[
    { " type " : " image " , " image " : img_file },
    { " type " : " text " , " text " : question },
  ]),
]
output : str = vision_expert_llm.generate( messages )
```

### [Answer Reviewer](https://github.com/CharlesQ9/Physics-Supernova/blob/main/utils/reviewTools.py) (Agentic Tool)

```md
You are an uncompromising Physics peer-reviewer. 
Your job is to find *every* logical, mathematical error 
in the worker's answer.
Check dimensional consistency, missing steps, incorrect 
sign conventions, numerical mistakes, and unclear explanations. 
Focus especially on wrong answers, less on presentations.
Be extremely critical : if something is wrong, 
point it out and request clarification or correction.
...
Also, if the worker reads measurements from image, 
make sure to remind the worker that whenever it reads 
or measures from image, it uses the `ask_image_expert` tool, 
or the readings might be very inaccurate.
```
;Mainly focus on errors that would lead to a wrong result, 
;rather than focusing extremely on presentation or style.
;It is possible that the worker's answer is not correct, 
;so please be prepared to provide detailed feedback. 
;The worker's answer contains some error, so you must check and point it out.

---

## Qualitative Analysis

## Quantitative Results

## Overall Results

* System scored 23.5/30 = 14th among 406 contestants!
* Was it over-fit on the 2025 questions?
  + "The real situation is that our budget is just 200 USD
    which is impossible to support us to run several experiments to 
    optimize the prompt or cherry-pick results."
* Gemini Pro 2.5 (regular version) could score (low) Gold 
  + Agentic system boosted capabilities significantly

## Observations

* Host country affects *style* of questions
  + UK Qs were more like proofs and generalisation
  + German Qs were more like engineering
  + France Qs seem like hand-holding, but long path 
    - ... this may have suited Gemini 
* Simulation experiments may become possible for Agents

---

## Wrap-Up

* Gemini Pro is Excellent!
  + Gets a huge boost from Agentic framework
  + ( Exam style may have been helpful )
* Only a limited budget was required
* Won't deter anyone from entering the IPhO ...

NB: MLSG wants to feature Your Talk! (Say "Hello"...)

## Link to Slides

[<img width="300" src="img/bit.ly_MLSG_2025-09_656x656.png" alt="Agentic Gemini for IPhO QR code"/>](https://bit.ly/MLSG_2025-09)

[https://bit.ly/MLSG_2025-09](https://bit.ly/MLSG_2025-09)

---

### Efficient Inference and Serving of LLMs and Large Video-Generative Models
#### Jonathan Zhao

* Efficient Inference & Serving of LLMs
* Efficiency for vision models

---

## Further Study

* Field is growing very rapidly
* Lots of different things can be done
* Easy to find novel methods / applications

## Deep Learning Foundations

* 3 week-days + online content
* Play with real models & Pick-a-Project
* Held online, Live Coding, Certificates
* Next run : TBA

## NLP (Advanced)
### Advanced NLP and Sequence Processing

* NLP (eg: Named Entity Recognition)
* Transformers : Theory and Practice
* Generative AI
* Next run : TBA

## Vision (Advanced)
### Advanced Computer Vision with Deep Learning

* Advanced classification
* Other architectures (eg: U-Nets)
* Transformer-based vision
* Next run : 7, 8, 9 October

## Deep Learning for PMs
### ( `= Foundations - code` `+ management` )
* Much more about 'big picture'
* Only a few code examples
* Project process standardised
* Next run : 21, 22, 23 October

## AI in Production
### Building Real World A.I. Applications

* DIY : node-server + task-queue + python-ml
* TensorFlow Serving / PyTorch Serve
* TF Lite + TF.js : edge device models
* Distillation, pruning, quantisation, etc...
* Next run : 3, 4, 5 November

## Also...

* Unsupervised methods
* Time-series & Deep Learning
* Audio Processing (Sounds & Speech)

;--
;
;## QR code for Courses
;
;<img height="330" src="img/RDAI-courses-QRcode_172x165.png" alt="RDAI Courses QR code"/>

---

## Machine Learning SG MeetUp Group
* Next Meeting = 15-Oct-2025 @ Google
* Topic(s) : TBA
* Typical Contents : 
 + Talk for people starting out
 + Something from the bleeding-edge
 + Lightning Talks
* [MeetUp.com / Machine-Learning-Singapore](https://www.meetup.com/Machine-Learning-Singapore/)

## Quick Poll
#### Show of hands

* How did you hear about THIS event?
  + MeetUp email
  + luma.com email
  + Messaging group
  + MLSG friends directly
  + Work colleagues

## Quick Poll
#### Show of hands

* How do you feel about MeetUp vs Luma?
  + luma is better
  + MeetUp is better
  + Don't really care

;--
;
;## Quick Poll
;#### Show of hands
;
;* What topic(s) would _compel_ you to come?
;  + Stable-diffusion++ / Video / Gaussian Splatting
;  + Robotics
;  + Reinforcement Learning
;  + AI for Education
;  + LLMs for Science
;  + Agents

---

## THANK YOU!

* Venue: 
  + Rakuten
* MLSG Voluteers:
  + Shern; Nicholas; Geoffrey; Anthony; Leonard; Malik
* MLSG Helpers:
  + Jen; JF

---

# See You Next Time !

Please add yourself to the MLSG Calendar on Luma!

;`Handouts :` [`https://bit.ly/` `text-similarity-jan-2022`](https://bit.ly/text-similarity-jan-2022)