## Gemini 3 !
#### Machine Learning Singapore
[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
[Sam Witteveen](http://samwitteveen.com) @ [reddragon.ai](http://reddragon.ai/)
26-November-2025
---

## Today's Line-up

* "SAM 3/3D; Evolving Reasoning"
  - _Martin Andrews_
* "Memento: Addressing the Most Undervalued Area of Agent Research?"
  - _Nicholas Chen_
* "New Gemini, New Banana & New Agents"
  - _Sam Witteveen_

---

;## SAM 3/3D & Evolving Reasoning

## Evolving Reasoning & SAM 3/3D
#### Machine Learning Singapore
[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
26-November-2025
---

## About Me

* Machine Intelligence / Startups / Finance
  + Moved from NYC to Singapore in Sep-2013
* 2014 = 'fun' :
  + Machine Learning, Deep Learning, NLP
  + Robots, drones
* Since 2015 = 'serious' : NLP + deep learning
  + Including Papers...
  + & GDE ML; ML-Singapore co-organiser...
  + & Red Dragon AI...

--

## About Red Dragon AI

* Deep Learning Consulting & Prototyping (Google Partner)
  - Education / Training
  - Research : NeurIPS / ICML / ICLR / NAACL / EMNLP
* Please contact us for :
  - Language model training (eg: on-prem)
  - Knowledgebase interaction & reasoning
  - Sales-oriented applications

---

## Outline

* Evolving Reasoning [Level: 300]
  + Current issues with RL
  + Some 'normal' fixes
  + Evolutionary approaches
* SAM 3 [Level: 100]
  + Segmentation
  + 3D-Objects
  + 3D-Body
* Wrap-up & QR-code
;* Introspection
;* Head's Up!

---

### Reinforcement Learning

* Reasoning with Verifiable Rewards
  + some issues...
  + effectiveness of LoRA
* On-Policy Distillation
* Evolutionary approaches
;* Sampling

---

### Aha! Moment

* 20-Jan-2025 : ["DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning"](https://arxiv.org/abs/2501.12948)
  + 22-Jan-2025 : [MLSG MeetUp talk](https://mdda.net/blog/research/talks/NeurIPS-recap-and-SOTA)
* Suddenly Open Source could:
  + see how 'easy' OpenAI's `o1` training could be
  + implement reasoning pipelines (v. exciting)

--

## R1-Zero : Prompt
#### *Slides from Jan-2025*

; [1-2|3|4-5|6-7]
;The user asks a question, and the Assistant solves it.
;The assistant first thinks about the reasoning process in the mind
;and then provides the user with the answer.

```html
A conversation between User and Assistant. ...
The reasoning process and answer are enclosed within <think> </think>
and <answer> </answer> tags, respectively, i.e.,
<think> reasoning process here </think>
<answer> answer here </answer>.
* User: {prompt}. * Assistant:
```

* Very plain-spoken. No examples. No 'help'
* Reward signal = Is the answer correct : Y/N
* RL process : GRPO (i.e. ~DPO for groups) rather than PPO
; + Simple due to compute constraints?

--

## R1-Zero : Learning
* Surprise : Lift-off achieved without any extras

--

## R1-Zero : Aha!
* Process self-referral : "... aha moment **I** can flag"
  + BUT : might switch language; or loop

---

### What have we learned
since January?

* Training "On-Policy" is very different
  + Pre-training $\ne$ Roll-outs
* RLVR (RL with Verifiable Rewards) works reasonably well
  + unintuitive: roll-out cost $\gg$ GRPO backprop
* Niggles (with some fixes...):
  + RL doesn't teach new stuff
    - choice of (small) model matters
  + $\mathcal{O}(1)$ bit of data per roll-out
  + Not clear what good tasks are
    - so that 'reasoning' generalises
; + Learning strategies is key point

---

## Do models discover reasoning through RL?

* [Reasoning with Sampling: Your Base Model is Smarter Than You Think](https://arxiv.org/abs/2510.14901)
  - Karan & Du (2025)
  + Base models already know about reasoning
    - Just need to sample a lot
      * this is a smarter way to sample
      * i.e. just find higher overall probability roll-outs
* [Project Page](https://aakaran.github.io/reasoning_with_sampling/) & [Code Repo](https://github.com/aakaran/reasoning-with-sampling)
  + Training-free
  + Sharpened distributions
    - simple iterative sampling algorithm using base models' own likelihoods
; * [Author Thread](https://x.com/aakaran31/status/1979194052697280712)
;   + Has animations, etc
;   + "interesting how you got away with N_{MCMC} = 10 using only 8x more compute. mixing time should be ~logarithmic in # sequences x (so around your block size T/B)"
;   + "Low-temp/variants with beam search on the base model get some boost over the base model but not as much as GRPO/power sampling."

--

### Reasoning with Sampling
;#### Results
* In-domain (MATH500):
  + power sampling close to GRPO
    - without ever changing the base model's weights
* Out-of-domain (HumanEval and AlpacaEval 2.0)
  + power sampling actually outperforms GRPO

---

## Do models learn reasoning through RL?

* [Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?](https://arxiv.org/abs/2504.13837)
  - Yue _et al_ (2025)
  + [Project Page](https://limit-of-rlvr.github.io/)
  + Key points:
    - Base models surpass RL at large pass@k
    - RL narrows exploration
      + This is a silent performance-killer
    - Current RL algorithms plateau
  + TL;DR : Not really

--

## Limits of RLVR
* RL might make model more efficient
  - at the expense of reducing search

---

## Low information
per roll-out

* [LoRA Without Regret](https://thinkingmachines.ai/blog/lora/) blog post
  + John Schulman _et al_ @ Thinking Machines
  + LoRA is actually a good match for RLVR
    - (particularly Policy-Gradient methods)
    - even rank $r=1$ can be effective!
* idea : reasoning is about strategy rather than knowledge

---

## On-Policy Distillation
#### Learning more per roll-out

* [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) blog post
  + Kevin Lu _et al_ @ Thinking Machines
  + includes pseudo-code
* Basic ideas:
  + small models learn badly off-policy
    - but that's what Supervised Fine-tuning is!
  + Policy-Gradient methods only use $\mathcal{O}(1)$ bit of data per roll-out
    - so they're inefficient (although on-policy for student)
  + get teacher logits for student rollout
    - much denser information, and on-policy for student

--

## Off-policy vs On-policy

* Regular Fine-tuning / Distillation:
* On-Policy Distillation:
--

## On-Policy Distillation
#### Take-aways

* Much more efficient
  + can do small-model roll-outs
  + get teacher feedback 'in parallel mode'
* Denser feedback about student roll-outs
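The "denser feedback" can be sketched as a per-token reverse KL between the student's and teacher's next-token distributions, evaluated on the *student's own* roll-out. A minimal numpy sketch of the idea (my illustration, not the blog's pseudo-code; `student_logits` / `teacher_logits` stand in for per-token model outputs):

```python
import numpy as np

def log_softmax(logits):
    # numerically stable log-softmax over the vocab axis
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def on_policy_distill_loss(student_logits, teacher_logits):
    """Reverse KL(student || teacher), averaged over the student's own
    roll-out positions -- a dense signal at every token, unlike the
    O(1)-bit reward used by policy-gradient RLVR."""
    s = log_softmax(student_logits)   # (T, V) student log-probs
    t = log_softmax(teacher_logits)   # (T, V) teacher log-probs on same tokens
    p_s = np.exp(s)
    kl_per_token = (p_s * (s - t)).sum(axis=-1)   # (T,)
    return kl_per_token.mean()

# toy check: identical logits give exactly zero loss
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 11))
assert np.isclose(on_policy_distill_loss(logits, logits), 0.0)
```

In practice the teacher scores the student's sampled tokens "in parallel mode" (one teacher forward pass per roll-out), so the extra cost over plain RLVR is small.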
---

## Evolution Strategies (2017)

* [Deep neuroevolution: Genetic Algorithms are a competitive alternative for training DNNs for RL](https://arxiv.org/abs/1712.06567)
  - Such _et al_ (2017)
  + authors included: Joel Lehman, Kenneth Stanley, Jeff Clune (@Uber)
  + Key ideas:
    * cannot do evolution on individual weights
    * single random seed $\rightarrow$ full matrix
      - low communication cost
    * can pick+choose which seeds using evolution
  + Results (on Atari):
    * competitive agents evolve in tens of generations
      (population size ~1k individuals)

---

## Evolving Reasoning (v1)

* [Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning](https://arxiv.org/abs/2509.24372)
  - Qiu _et al_ (2025)
  + [Official Repo](https://github.com/VsonicV/es-fine-tuning-paper) (Commercial license required)
  + Full-matrix Evolution Strategies (using random seeds)
  + Idea: Optimising in parameter space, not roll-out space
; + [Author Thread](https://x.com/yule_gan/status/1975177775251087436)
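The "single random seed → full matrix" trick can be sketched in a few lines: workers exchange only `(seed, fitness)` pairs and regenerate every perturbation locally from the seed. A minimal numpy sketch with a toy fitness function of my own choosing (an OpenAI-style mean-centred ES estimator, not either paper's implementation):

```python
import numpy as np

def perturb(theta, seed, sigma=0.02):
    # the whole perturbation is reconstructed from one integer seed,
    # so workers only ever need to communicate (seed, fitness) pairs
    eps = np.random.default_rng(seed).standard_normal(theta.shape)
    return theta + sigma * eps

def es_generation(theta, fitness_fn, seeds, sigma=0.02, lr=0.1):
    # score each perturbed candidate, then recombine the *same* seeds
    # into a mean-centred gradient estimate
    scores = np.array([fitness_fn(perturb(theta, s, sigma)) for s in seeds])
    centred = scores - scores.mean()
    grad = sum(c * np.random.default_rng(s).standard_normal(theta.shape)
               for s, c in zip(seeds, centred))
    return theta + lr / (len(seeds) * sigma) * grad

# toy fitness: maximise -||theta||^2 (optimum at theta = 0)
theta = np.ones(4)
for gen in range(60):
    seeds = range(gen * 32, gen * 32 + 32)   # fresh seeds each generation
    theta = es_generation(theta, lambda t: -np.dot(t, t), seeds)
```

The communication saving is the point: for an $m \times n$ weight matrix, each worker sends one integer and one float instead of $mn$ floats.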
--

## Evolving Reasoning (v2)
; low-rank matrix perturbation

* [Evolution Strategies at the Hyperscale](https://arxiv.org/abs/2511.16652)
  - Sarkar _et al_ (2025)
  + "EGGROLL" - **E**volution **G**uided **G**eneral **O**ptimization via **L**ow-rank **L**earning
    - basically add many small LoRAs to the full matrices
      * as good as full ES (but much faster - can compute on-the-fly)
      * competitive with GRPO
  + [Project Page](https://eshyperscale.github.io/)
; * [Author Thread](https://x.com/bidiptas13/status/1992706291547127860)
;   + Author [website](https://bsarkar321.github.io/)
;   + Author created a [JAX RWKV](https://github.com/bsarkar321/jaxrwkv)
; * [bycloud = support Thread](https://x.com/bycloudai/status/1992927982818836947)
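The "many small LoRAs" idea can be sketched directly: replace the dense noise matrix with an outer product of two thin random matrices. A minimal numpy illustration of the low-rank trick (my sketch of the general idea, not the EGGROLL code; the $1/\sqrt{r}$ scaling keeps per-entry variance comparable to dense Gaussian noise):

```python
import numpy as np

def lowrank_perturb(W, seed, rank=4, sigma=0.02):
    """LoRA-style ES perturbation: instead of a full m*n noise matrix E,
    add sigma * A @ B.T / sqrt(rank).  Storing/communicating A and B costs
    O((m+n)*rank) numbers instead of O(m*n), and the product can be
    applied on-the-fly per population member."""
    m, n = W.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, rank))
    B = rng.standard_normal((n, rank))
    return W + sigma * (A @ B.T) / np.sqrt(rank)

W = np.zeros((8, 6))
Wp = lowrank_perturb(W, seed=0)
```

With `rank` much smaller than the matrix dimensions, a large population of such perturbations becomes affordable, which is what makes the "hyperscale" population sizes practical.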
--

## EGGROLL `int8` training

* Scaled 8-bit LLM training:
  + EGGROLL can work on integer weights
    - [Code Repo](https://github.com/ESHyperscale/nano-egg) (GPL-3)
  + apply Evolutionary Strategies to [RWKV-7](https://www.rwkv.com/)
    - two reasoning tasks : Countdown & GSM8k
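(An aside before wrapping up, looking back at the "Limits of RLVR" slide: the pass@k numbers there are conventionally computed with the unbiased estimator popularised by the HumanEval paper, assuming `n` samples per problem of which `c` passed the verifier:)

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n attempts, is correct.
    pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples exist, so a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 2 correct out of 4 attempts: pass@1 = 0.5
assert abs(pass_at_k(4, 2, 1) - 0.5) < 1e-12
```

It is exactly at large `k` that this metric lets base models catch up with (or overtake) their RL-tuned versions.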
---

## Reasoning Wrap-up

* GRPO / PPO etc are *seductive* but may be wrong track
  + Perhaps we haven't yet explored RL fully
* "Evolution Strategies" ...
  + ... just scratching surface of Evolution ideas
* Interesting Kaggle competition :
  + [Gemma.small general reasoner](https://www.kaggle.com/competitions/google-tunix-hackathon) using Tunix / TPUs

---

## SAM-3
#### All from Meta

* SAM-3 itself
  + DEMO
* SAM-3 extensions:
  + SAM-3D
  + SAM-3D-Bodies
  + DEMO

---

## SAM-3 itself

* ["SAM 3: Segment Anything with Concepts"](https://ai.meta.com/research/publications/sam-3-segment-anything-with-concepts/)
- Carion _et al_ (2025)
  + [SAM 3 Project Page](https://ai.meta.com/sam3/) & [Blog Post](https://ai.meta.com/blog/segment-anything-model-3/)
  + Builds on ["Perception Encoder: The best visual embeddings are not at the output of the network"](https://arxiv.org/abs/2504.13181)
    - Bolya _et al_ (2025)
* Builds on ["SAM 2: Segment Anything in Images and Videos"](https://ai.meta.com/research/publications/sam-2-segment-anything-in-images-and-videos/)
  - Ravi _et al_ (2024)
  - [SAM 2 Project Page](https://ai.meta.com/sam2/)
  + [Code Repo](https://github.com/facebookresearch/sam3) ("open" license)
    - Models need HF approval
    - [Jupyter widget Colab](https://github.com/facebookresearch/sam3/blob/main/examples/sam3_image_interactive.ipynb)

--

## SAM-3 evolution
--

## SAM-3 architecture

--

## SAM-3 demo
* https://aidemos.meta.com/segment-anything/gallery
  + [Image Segmentation](https://aidemos.meta.com/segment-anything/editor/segment-image)
    - try [Luggage layout image](https://aidemos.meta.com/segment-anything/editor/segment-image/?media_id=1747347625966823)
  + [Video Segmentation](https://aidemos.meta.com/segment-anything/editor/segment-video)
    - try [traffic video](https://aidemos.meta.com/segment-anything/editor/segment-video/?media_id=3846378595584619)

---

## SAM-3D Objects

* ["SAM 3D: 3Dfy Anything in Images"](https://ai.meta.com/research/publications/sam-3d-3dfy-anything-in-images/)
  + SAM 3D [Project Page](https://ai.meta.com/sam3d/) & [Blog Post](https://ai.meta.com/blog/sam-3d/)
    - [Repo](https://github.com/facebookresearch/sam-3d-objects) ("open" license)
      * Has Colabs for 3D object export to Gaussian Splat
; - [Architecture png](https://github.com/facebookresearch/sam-3d-objects/blob/main/doc/arch.png)
--

## SAM-3D Object Demo
* https://aidemos.meta.com/segment-anything/gallery
  + [Create 3D scenes](https://aidemos.meta.com/segment-anything/editor/convert-image-to-3d) : Notice 64x64x64 voxels, followed by refinement
    - try [Void Deck](https://aidemos.meta.com/segment-anything/editor/convert-image-to-3d) "Upload"
    - try [Merlion](https://aidemos.meta.com/segment-anything/editor/convert-image-to-3d) "Upload"

---

## SAM-3D Bodies

* ["SAM 3D Body: Robust Full-Body Human Mesh Recovery"](https://ai.meta.com/research/publications/sam-3d-body-robust-full-body-human-mesh-recovery/)
- Yang _et al_ (2025)
  + Builds on:
    * [ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling](https://arxiv.org/abs/2508.15767)
      - Park _et al_ (2025)
    * Meta [MHR: Momentum Human Rig](https://arxiv.org/abs/2511.15586)
      - Ferguson _et al_ (2025)
      - "Open" models released (requires HF approval)
  + [Repo](https://github.com/facebookresearch/sam-3d-body) ("open" license)
--

## ATLAS
#### Pose Structure
* ATLAS: skin is offset vs Skeleton
  + better foundation than SMPL-X

--

### MHR: Momentum Human Rig
;#### Identity Space
* Includes multiple Levels of Detail (LOD)
  - 73639, 18439, 10661, 4899, 2461, 971 and 595 vertices
* Far better license than SMPL / SMPL-X

--

## SAM-3D Bodies
#### Demo
* https://aidemos.meta.com/segment-anything/gallery
  + [Create 3D bodies](https://aidemos.meta.com/segment-anything/editor/convert-body-to-3d)
    - try [Merlion](https://aidemos.meta.com/segment-anything/editor/convert-body-to-3d) "Upload"
    - try [Lau Pa Sat](https://aidemos.meta.com/segment-anything/editor/convert-body-to-3d) "Upload"

---

## Wrap-Up

* DIY Reasoning kicked off in January
  + Actually possible to do this at smaller scale!
  + Big labs are searching for ideas
* The SAM 3 suite is incredible...
  + Not much else to say
NB: MLSG wants to feature Your Talk!
(Say "Hello"...)
--

## Link to Slides

[https://bit.ly/MLSG_2025-11](https://bit.ly/MLSG_2025-11)

---

### Memento: Addressing the Most Undervalued Area of Agent Research?
#### Nicholas Chen

* Value of memory
* How to learn to retrieve
  - looking at the code

---

### New Gemini, New Banana
& New Agents
#### Sam Witteveen

* Gemini 3
* Nano Banana Pro
* Agents:
  + Antigravity (fka Windsurf)
  + ...

---

## THANK YOU!

* Venue:
  + Google
* MLSG Volunteers:
  + Jen; JF; Shern; Nicholas; Geoffrey; Anthony; Leonard; Malik

--

## REFRAG: Rethinking RAG based Decoding
#### Xiaoqiang Lin

**UPDATE**

* Advanced pre-training / training
  + RAG-like decoding FTW!
  + **appeared on Connor Shorten's [Weaviate Podcast](https://x.com/CShorten30/status/1985361515889803741)**
; @CShorten30
;I am SUPER EXCITED to publish the 130th episode of the featuring
;* [Xiaoqiang Lin ( @xiaoqiang_98 ), the lead author of REFRAG from Meta Superintelligence Labs!]()

--

### [Ilya Sutskever on
Dwarkesh Podcast](https://www.youtube.com/watch?v=aR20FWCCjAs)

[https://x.com/KimNoel399/status/1993470777086427272](https://x.com/KimNoel399/status/1993470777086427272)

---

## Further Study

* Field is growing very rapidly
* Lots of different things can be done
* Easy to find novel methods / applications

--

## Deep Learning Foundations

* 3 week-days + online content
* Play with real models & Pick-a-Project
* Held online, Live Coding, Certificates
* Next run : TBA

--

## NLP (Advanced)
### Advanced NLP and Sequence Processing

* NLP (eg: Named Entity Recognition)
* Transformers : Theory and Practice
* Generative AI
* Next run : TBA

--

## Vision (Advanced)
### Advanced Computer Vision with Deep Learning

* Advanced classification
* Other architectures (eg: U-Nets)
* Transformer-based vision
* Next run : TBA

--

## Deep Learning for PMs
### ( `= Foundations - code`
`+ management` )

* Much more about 'big picture'
* Only a few code examples
* Project process standardised
* Next run : TBA

--

## AI in Production
### Building Real World A.I. Applications

* DIY : node-server + task-queue + python-ml
* TensorFlow Serving / PyTorch Serve
* TF Lite + TF.js : edge device models
* Distillation, pruning, quantisation, etc...
* Next run : TBA

--

## Also...

* Unsupervised methods
* Time-series & Deep Learning
* Audio Processing (Sounds & Speech)

;--
;## QR code for Courses
---

## Machine Learning SG
MeetUp Group

* Next Meeting = ?-Jan-2026 @ Google
* Topic(s) : TBA
* Typical Contents :
  + Talk for people starting out
  + Something from the bleeding-edge
  + Lightning Talks
* [MeetUp.com / Machine-Learning-Singapore](https://www.meetup.com/Machine-Learning-Singapore/)

--

## Quick Poll
#### Show of hands

* How did you hear about THIS event?
  + MeetUp email
  + luma.com email
  + Messaging group
  + MLSG friends directly
  + Work colleagues

--

## Quick Poll
#### Show of hands

* How do you feel about MeetUp vs Luma?
  + luma is better
  + MeetUp is better
  + Don't really care

;--
;## Quick Poll
;#### Show of hands
;* What topic(s) would _compel_ you to come?
;  + Stable-diffusion++ / Video / Gaussian Splatting
;  + Robotics
;  + Reinforcement Learning
;  + AI for Education
;  + LLMs for Science
;  + Agents

---

# See You
Next Time !
Please add yourself to the
MLSG Calendar on Luma!

;`Handouts :` [`https://bit.ly/text-similarity-jan-2022`](https://bit.ly/text-similarity-jan-2022)