## Gemini 3 !
#### Machine Learning Singapore
[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
[Sam Witteveen](http://samwitteveen.com) @ [reddragon.ai](http://reddragon.ai/)
26-November-2025
---

## Today's Line-up

* "SAM 3/3D; Evolving Reasoning"
  - _Martin Andrews_
* "Memento: Addressing the Most Undervalued Area of Agent Research?"
  - _Nicholas Chen_
* "New Gemini, New Banana & New Agents"
  - _Sam Witteveen_

---

;## SAM 3/3D & Evolving Reasoning

## Evolving Reasoning & SAM 3/3D
#### Machine Learning Singapore
[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
26-November-2025
---

## About Me

* Machine Intelligence / Startups / Finance
  + Moved from NYC to Singapore in Sep-2013
* 2014 = 'fun' :
  + Machine Learning, Deep Learning, NLP
  + Robots, drones
* Since 2015 = 'serious' : NLP + deep learning
  + Including Papers...
  + & GDE ML; ML-Singapore co-organiser...
  + & Red Dragon AI...

--

## About Red Dragon AI

* Deep Learning Consulting & Prototyping (Google Partner)
  - Education / Training
  - Research : NeurIPS / ICML / ICLR / NAACL / EMNLP
* Please contact us for :
  - Language model training (eg: on-prem)
  - Knowledgebase interaction & reasoning
  - Sales-oriented applications

---

## Outline

* Evolving Reasoning [Level: 300]
  + Current issues with RL
  + Some 'normal' fixes
  + Evolutionary approaches
* SAM 3 [Level: 100]
  + Segmentation
  + 3D-Objects
  + 3D-Body
* Wrap-up & QR-code
;* Introspection
;* Head's Up!

---

### Reinforcement Learning

* Reasoning with Verifiable Rewards
  + some issues...
  + effectiveness of LoRA
* On-Policy Distillation
* Evolutionary approaches
;* Sampling

---

### Aha! Moment

* 20-Jan-2025 : ["DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"](https://arxiv.org/abs/2501.12948)
  + 22-Jan-2025 : [MLSG MeetUp talk](https://mdda.net/blog/research/talks/NeurIPS-recap-and-SOTA)
* Suddenly Open Source could:
  + see how 'easy' OpenAI's `o1` training could be
  + implement reasoning pipelines (v. exciting)

--

## R1-Zero : Prompt
#### *Slides from Jan-2025*

; [1-2|3|4-5|6-7]
;The user asks a question, and the Assistant solves it.
;The assistant first thinks about the reasoning process in the mind
;and then provides the user with the answer.

```html
A conversation between User and Assistant. ...
The reasoning process and answer are enclosed within <think> </think>
and <answer> </answer> tags, respectively, i.e.,
<think> reasoning process here </think>
<answer> answer here </answer>.
* User: {prompt}. * Assistant:
```

* Very plain-spoken. No examples. No 'help'
* Reward signal = Is the answer correct : Y/N
* RL process : GRPO (i.e. ~DPO for groups) rather than PPO
; + Simple due to compute constraints?

--

## R1-Zero : Learning
* Surprise : Lift-off achieved without any extras

--

## R1-Zero : Aha!
* Process self-referral : "... aha moment **I** can flag"
  + BUT : might switch language; or loop

---

### What have we learned
since January?

* Training "On-Policy" is very different
  + Pre-training $\ne$ Roll-outs
* RLVR (RL with Verifiable Rewards) works reasonably well
  + unintuitive: roll-out cost $\gg$ GRPO backprop
* Niggles (with some fixes...):
  + RL doesn't teach new stuff
    - choice of (small) model matters
  + $\mathcal{O}(1)$ bit of data per roll-out
  + Not clear what good tasks are
    - so that 'reasoning' generalises
; + Learning strategies is key point

---

## Do models discover reasoning through RL?

* [Reasoning with Sampling: Your Base Model is Smarter Than You Think](https://arxiv.org/abs/2510.14901)
  - Karan & Du (2025)
  + Base models already know about reasoning
    - Just need to sample a lot
      * this is a smarter way to sample
      * i.e. just find higher overall probability roll-outs
* [Project Page](https://aakaran.github.io/reasoning_with_sampling/) & [Code Repo](https://github.com/aakaran/reasoning-with-sampling)
  + Training-free
  + Sharpened distributions
    - simple iterative sampling algorithm using base models' own likelihoods
; * [Author Thread](https://x.com/aakaran31/status/1979194052697280712)
;   + Has animations, etc
;   + "interesting how you got away with N_{MCMC} = 10 using only 8x more compute. mixing time should be ~logarithmic in # sequences x (so around your block size T/B)"
;   + "Low-temp/variants with beam search on the base model get some boost over the base model but not as much as GRPO/power sampling."

--

### Reasoning with Sampling
;#### Results
* In-domain (MATH500):
  + power sampling close to GRPO
    - without ever changing the base model's weights
* Out-of-domain (HumanEval and AlpacaEval 2.0)
  + power sampling actually outperforms GRPO

---

## Do models learn reasoning through RL?

* [Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?](https://arxiv.org/abs/2504.13837)
  - Yue _et al_ (2025)
  + [Project Page](https://limit-of-rlvr.github.io/)
  + Key points:
    - Base models surpass RL at large pass@k
    - RL narrows exploration
      + This is a silent performance-killer
    - Current RL algorithms plateau
  + TL;DR : Not really

--

## Limits of RLVR
* RL might make model more efficient
  - at the expense of reducing search

---

## Low information
per roll-out

* [LoRA Without Regret](https://thinkingmachines.ai/blog/lora/) blog post
  + John Schulman _et al_ @ Thinking Machines
  + LoRA is actually a good match for RLVR
    - (particularly Policy-Gradient methods)
    - even rank $r=1$ can be effective!
* idea : reasoning is about strategy rather than knowledge

---

## On-Policy Distillation
#### Learning more per roll-out

* [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) blog post
  + Kevin Lu _et al_ @ Thinking Machines
  + includes pseudo-code
* Basic ideas:
  + small models learn badly off-policy
    - but that's what Supervised Fine-tuning is!
  + Policy-Gradient methods only use $\mathcal{O}(1)$ bit of data per roll-out
    - so they're inefficient (although on-policy for student)
  + get teacher logits for student rollout
    - much denser information, and on-policy for student

--

## Off-policy vs On-policy

* Regular Fine-tuning / Distillation:
* On-Policy Distillation:
--

## On-Policy Distillation
#### Take-aways

* Much more efficient
  + can do small-model roll-outs
  + get teacher feedback 'in parallel mode'
* Denser feedback about student roll-outs
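The "denser feedback" can be sketched as a per-token reverse KL between the student's and teacher's next-token distributions, evaluated on the *student's own* roll-out. A minimal numpy sketch of the idea (my illustration, not the blog's pseudo-code; `student_logits` / `teacher_logits` stand in for per-token model outputs):

```python
import numpy as np

def log_softmax(logits):
    # numerically stable log-softmax over the vocab axis
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def on_policy_distill_loss(student_logits, teacher_logits):
    """Reverse KL(student || teacher), averaged over the student's own
    roll-out positions -- a dense signal at every token, unlike the
    O(1)-bit reward used by policy-gradient RLVR."""
    s = log_softmax(student_logits)   # (T, V) student log-probs
    t = log_softmax(teacher_logits)   # (T, V) teacher log-probs on same tokens
    p_s = np.exp(s)
    kl_per_token = (p_s * (s - t)).sum(axis=-1)   # (T,)
    return kl_per_token.mean()

# toy check: identical logits give exactly zero loss
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 11))
assert np.isclose(on_policy_distill_loss(logits, logits), 0.0)
```

In practice the teacher scores the student's sampled tokens "in parallel mode" (one teacher forward pass per roll-out), so the extra cost over plain RLVR is small.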
---

## Evolution Strategies (2017)

* [Deep neuroevolution: Genetic Algorithms are a competitive alternative for training DNNs for RL](https://arxiv.org/abs/1712.06567)
  - Such _et al_ (2017)
  + authors included: Joel Lehman, Kenneth Stanley, Jeff Clune (@Uber)
  + Key ideas:
    * cannot do evolution on individual weights
    * single random seed $\rightarrow$ full matrix
      - low communication cost
    * can pick+choose which seeds using evolution
  + Results (on Atari):
    * competitive agents evolve in tens of generations
      (population size ~1k individuals)

---

## Evolving Reasoning (v1)

* [Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning](https://arxiv.org/abs/2509.24372)
  - Qiu _et al_ (2025)
  + [Official Repo](https://github.com/VsonicV/es-fine-tuning-paper) (Commercial license required)
  + Full-matrix Evolution Strategies (using random seeds)
  + Idea: Optimising in parameter space, not roll-out space
; + [Author Thread](https://x.com/yule_gan/status/1975177775251087436)
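The "single random seed → full matrix" trick can be sketched in a few lines: workers exchange only `(seed, fitness)` pairs and regenerate every perturbation locally from the seed. A minimal numpy sketch with a toy fitness function of my own choosing (an OpenAI-style mean-centred ES estimator, not either paper's implementation):

```python
import numpy as np

def perturb(theta, seed, sigma=0.02):
    # the whole perturbation is reconstructed from one integer seed,
    # so workers only ever need to communicate (seed, fitness) pairs
    eps = np.random.default_rng(seed).standard_normal(theta.shape)
    return theta + sigma * eps

def es_generation(theta, fitness_fn, seeds, sigma=0.02, lr=0.1):
    # score each perturbed candidate, then recombine the *same* seeds
    # into a mean-centred gradient estimate
    scores = np.array([fitness_fn(perturb(theta, s, sigma)) for s in seeds])
    centred = scores - scores.mean()
    grad = sum(c * np.random.default_rng(s).standard_normal(theta.shape)
               for s, c in zip(seeds, centred))
    return theta + lr / (len(seeds) * sigma) * grad

# toy fitness: maximise -||theta||^2 (optimum at theta = 0)
theta = np.ones(4)
for gen in range(60):
    seeds = range(gen * 32, gen * 32 + 32)   # fresh seeds each generation
    theta = es_generation(theta, lambda t: -np.dot(t, t), seeds)
```

The communication saving is the point: for an $m \times n$ weight matrix, each worker sends one integer and one float instead of $mn$ floats.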
--

## Evolving Reasoning (v2)
; low-rank matrix perturbation

* [Evolution Strategies at the Hyperscale](https://arxiv.org/abs/2511.16652)
  - Sarkar _et al_ (2025)
  + "EGGROLL" - **E**volution **G**uided **G**eneral **O**ptimization via **L**ow-rank **L**earning
    - basically add many small LoRAs to the full matrices
      * as good as full ES (but much faster - can compute on-the-fly)
      * competitive with GRPO
  + [Project Page](https://eshyperscale.github.io/)
; * [Author Thread](https://x.com/bidiptas13/status/1992706291547127860)
;   + Author [website](https://bsarkar321.github.io/)
;   + Author created a [JAX RWKV](https://github.com/bsarkar321/jaxrwkv)
; * [bycloud = support Thread](https://x.com/bycloudai/status/1992927982818836947)
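The "many small LoRAs" idea can be sketched directly: replace the dense noise matrix with an outer product of two thin random matrices. A minimal numpy illustration of the low-rank trick (my sketch of the general idea, not the EGGROLL code; the $1/\sqrt{r}$ scaling keeps per-entry variance comparable to dense Gaussian noise):

```python
import numpy as np

def lowrank_perturb(W, seed, rank=4, sigma=0.02):
    """LoRA-style ES perturbation: instead of a full m*n noise matrix E,
    add sigma * A @ B.T / sqrt(rank).  Storing/communicating A and B costs
    O((m+n)*rank) numbers instead of O(m*n), and the product can be
    applied on-the-fly per population member."""
    m, n = W.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, rank))
    B = rng.standard_normal((n, rank))
    return W + sigma * (A @ B.T) / np.sqrt(rank)

W = np.zeros((8, 6))
Wp = lowrank_perturb(W, seed=0)
```

With `rank` much smaller than the matrix dimensions, a large population of such perturbations becomes affordable, which is what makes the "hyperscale" population sizes practical.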
--

## EGGROLL `int8` training

* Scaled 8-bit LLM training:
  + EGGROLL can work on integer weights
    - [Code Repo](https://github.com/ESHyperscale/nano-egg) (GPL-3)
  + apply Evolutionary Strategies to [RWKV-7](https://www.rwkv.com/)
    - two reasoning tasks : Countdown & GSM8k
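(An aside before wrapping up, looking back at the "Limits of RLVR" slide: the pass@k numbers there are conventionally computed with the unbiased estimator popularised by the HumanEval paper, assuming `n` samples per problem of which `c` passed the verifier:)

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n attempts, is correct.
    pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples exist, so a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 2 correct out of 4 attempts: pass@1 = 0.5
assert abs(pass_at_k(4, 2, 1) - 0.5) < 1e-12
```

It is exactly at large `k` that this metric lets base models catch up with (or overtake) their RL-tuned versions.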
---

## Reasoning Wrap-up

* GRPO / PPO etc are *seductive* but may be wrong track
  + Perhaps we haven't yet explored RL fully
* "Evolution Strategies" ...
  + ... just scratching surface of Evolution ideas
* Interesting Kaggle competition :
  + [Gemma.small general reasoner](https://www.kaggle.com/competitions/google-tunix-hackathon) using Tunix / TPUs

---

## SAM-3
#### All from Meta

* SAM-3 itself
  + DEMO
* SAM-3 extensions:
  + SAM-3D
  + SAM-3D-Bodies
  + DEMO

---

## SAM-3 itself

* ["SAM 3: Segment Anything with Concepts"](https://ai.meta.com/research/publications/sam-3-segment-anything-with-concepts/)
- Carion _et al_ (2025)
  + [SAM 3 Project Page](https://ai.meta.com/sam3/) & [Blog Post](https://ai.meta.com/blog/segment-anything-model-3/)
  + Builds on ["Perception Encoder: The best visual embeddings are not at the output of the network"](https://arxiv.org/abs/2504.13181)
    - Bolya _et al_ (2025)
* Builds on ["SAM 2: Segment Anything in Images and Videos"](https://ai.meta.com/research/publications/sam-2-segment-anything-in-images-and-videos/)
  - Ravi _et al_ (2024)
  - [SAM 2 Project Page](https://ai.meta.com/sam2/)
  + [Code Repo](https://github.com/facebookresearch/sam3) ("open" license)
    - Models need HF approval
    - [Jupyter widget Colab](https://github.com/facebookresearch/sam3/blob/main/examples/sam3_image_interactive.ipynb)

--

## SAM-3 evolution
--

## SAM-3 architecture

--

## SAM-3 demo
* https://aidemos.meta.com/segment-anything/gallery
  + [Image Segmentation](https://aidemos.meta.com/segment-anything/editor/segment-image)
    - try [Luggage layout image](https://aidemos.meta.com/segment-anything/editor/segment-image/?media_id=1747347625966823)
  + [Video Segmentation](https://aidemos.meta.com/segment-anything/editor/segment-video)
    - try [traffic video](https://aidemos.meta.com/segment-anything/editor/segment-video/?media_id=3846378595584619)

---

## SAM-3D Objects

* ["SAM 3D: 3Dfy Anything in Images"](https://ai.meta.com/research/publications/sam-3d-3dfy-anything-in-images/)
  + SAM 3D [Project Page](https://ai.meta.com/sam3d/) & [Blog Post](https://ai.meta.com/blog/sam-3d/)
    - [Repo](https://github.com/facebookresearch/sam-3d-objects) ("open" license)
      * Has Colabs for 3D object export to Gaussian Splat
; - [Architecture png](https://github.com/facebookresearch/sam-3d-objects/blob/main/doc/arch.png)
--

## SAM-3D Object Demo
* https://aidemos.meta.com/segment-anything/gallery
  + [Create 3D scenes](https://aidemos.meta.com/segment-anything/editor/convert-image-to-3d) : Notice 64x64x64 voxels, followed by refinement
    - try [Void Deck](https://aidemos.meta.com/segment-anything/editor/convert-image-to-3d) "Upload"
    - try [Merlion](https://aidemos.meta.com/segment-anything/editor/convert-image-to-3d) "Upload"

---

## SAM-3D Bodies

* ["SAM 3D Body: Robust Full-Body Human Mesh Recovery"](https://ai.meta.com/research/publications/sam-3d-body-robust-full-body-human-mesh-recovery/)
- Yang _et al_ (2025)
  + Builds on:
    * [ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling](https://arxiv.org/abs/2508.15767)
      - Park _et al_ (2025)
    * Meta [MHR: Momentum Human Rig](https://arxiv.org/abs/2511.15586)
      - Ferguson _et al_ (2025)
      - "Open" models released (requires HF approval)
  + [Repo](https://github.com/facebookresearch/sam-3d-body) ("open" license)
--

## ATLAS
#### Pose Structure
* ATLAS: skin is offset vs Skeleton
  + better foundation than SMPL-X

--

### MHR: Momentum Human Rig
;#### Identity Space
* Includes multiple Levels of Detail (LOD)
  - 73639, 18439, 10661, 4899, 2461, 971 and 595 vertices
* Far better license than SMPL / SMPL-X

--

## SAM-3D Bodies
#### Demo
* https://aidemos.meta.com/segment-anything/gallery
  + [Create 3D bodies](https://aidemos.meta.com/segment-anything/editor/convert-body-to-3d)
    - try [Merlion](https://aidemos.meta.com/segment-anything/editor/convert-body-to-3d) "Upload"
    - try [Lau Pa Sat](https://aidemos.meta.com/segment-anything/editor/convert-body-to-3d) "Upload"

---

## Wrap-Up

* DIY Reasoning kicked off in January
  + Actually possible to do this at smaller scale!
  + Big labs are searching for ideas
* The SAM 3 suite is incredible...
  + Not much else to say
NB: MLSG wants to feature Your Talk!
(Say "Hello"...)
--

## Link to Slides

[https://bit.ly/MLSG_2025-11](https://bit.ly/MLSG_2025-11)

---

### Memento: Addressing the Most Undervalued Area of Agent Research?
#### Nicholas Chen

* Value of memory
* How to learn to retrieve
  - looking at the code

---

### New Gemini, New Banana
& New Agents
#### Sam Witteveen

* Gemini 3
* Nano Banana Pro
* Agents:
  + Antigravity (fka Windsurf)
  + ...

---

## THANK YOU!

* Venue:
  + Google
* MLSG Volunteers:
  + Jen; JF; Shern; Nicholas; Geoffrey; Anthony; Leonard; Malik

--

## REFRAG: Rethinking RAG based Decoding
#### Xiaoqiang Lin

**UPDATE**

* Advanced pre-training / training
  + RAG-like decoding FTW!
  + **appeared on Connor Shorten's [Weaviate Podcast](https://x.com/CShorten30/status/1985361515889803741)**
; @CShorten30
;I am SUPER EXCITED to publish the 130th episode of the featuring
;* [Xiaoqiang Lin ( @xiaoqiang_98 ), the lead author of REFRAG from Meta Superintelligence Labs!]()

--

### [Ilya Sutskever on
Dwarkesh Podcast](https://www.youtube.com/watch?v=aR20FWCCjAs)

[https://x.com/KimNoel399/status/1993470777086427272](https://x.com/KimNoel399/status/1993470777086427272)

---

## Further Study

* Field is growing very rapidly
* Lots of different things can be done
* Easy to find novel methods / applications

--

## Deep Learning Foundations

* 3 week-days + online content
* Play with real models & Pick-a-Project
* Held online, Live Coding, Certificates
* Next run : TBA

--

## NLP (Advanced)
### Advanced NLP and Sequence Processing

* NLP (eg: Named Entity Recognition)
* Transformers : Theory and Practice
* Generative AI
* Next run : TBA

--

## Vision (Advanced)
### Advanced Computer Vision with Deep Learning

* Advanced classification
* Other architectures (eg: U-Nets)
* Transformer-based vision
* Next run : TBA

--

## Deep Learning for PMs
### ( `= Foundations - code`
`+ management` )

* Much more about 'big picture'
* Only a few code examples
* Project process standardised
* Next run : TBA

--

## AI in Production
### Building Real World A.I. Applications

* DIY : node-server + task-queue + python-ml
* TensorFlow Serving / PyTorch Serve
* TF Lite + TF.js : edge device models
* Distillation, pruning, quantisation, etc...
* Next run : TBA

--

## Also...

* Unsupervised methods
* Time-series & Deep Learning
* Audio Processing (Sounds & Speech)

;--
;## QR code for Courses
---

## Machine Learning SG
MeetUp Group

* Next Meeting = ?-Jan-2026 @ Google
* Topic(s) : TBA
* Typical Contents :
  + Talk for people starting out
  + Something from the bleeding-edge
  + Lightning Talks
* [MeetUp.com / Machine-Learning-Singapore](https://www.meetup.com/Machine-Learning-Singapore/)

--

## Quick Poll
#### Show of hands

* How did you hear about THIS event?
  + MeetUp email
  + luma.com email
  + Messaging group
  + MLSG friends directly
  + Work colleagues

--

## Quick Poll
#### Show of hands

* How do you feel about MeetUp vs Luma?
  + luma is better
  + MeetUp is better
  + Don't really care

;--
;## Quick Poll
;#### Show of hands
;* What topic(s) would _compel_ you to come?
;  + Stable-diffusion++ / Video / Gaussian Splatting
;  + Robotics
;  + Reinforcement Learning
;  + AI for Education
;  + LLMs for Science
;  + Agents

---

# See You
Next Time !
Please add yourself to the
MLSG Calendar on Luma!

;`Handouts :` [`https://bit.ly/text-similarity-jan-2022`](https://bit.ly/text-similarity-jan-2022)