# News from the Frontier
#### Machine Learning Singapore
[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
[Sam Witteveen](http://samwitteveen.com) @ [reddragon.ai](http://reddragon.ai/)
25-March-2025
---

## Today's Line-up

* "Latent Space Reasoning"
  - _Martin Andrews_
* "Open-endedness and Interestingness"
  - _Jenny Zhang_
* "My ICLR Highlights"
  - _Raymond Chan_
* "Agent Announcements & Trends from Google Cloud Next 2025"
  - _Sam Witteveen_

---

# Reasoning in Latent Space
#### Machine Learning Singapore
[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
25-March-2025
---

## About Me

* Machine Intelligence / Startups / Finance
  + Moved from NYC to Singapore in Sep-2013
* 2014 = 'fun' :
  + Machine Learning, Deep Learning, NLP
  + Robots, drones
* Since 2015 = 'serious' :: NLP + deep learning
  + Including Papers...
  + & GDE ML; ML-Singapore co-organiser...
  + & Red Dragon AI...

--

## About Red Dragon AI

* Deep Learning Consulting & Prototyping (Google Partner)
  - Education / Training
  - Research : NeurIPS / EMNLP / NAACL / ICML / ICLR
* Please contact us for :
  - Language model training (eg: on-prem)
  - Knowledgebase interaction & reasoning
  - Sales-oriented applications

---

## Outline

* Latent Space Reasoning
  + What's involved?
  + Three different techniques
* ICLR
  + Three paper takeaways
* Heads-Up!
* Wrap-up & QR-code

---

## GPT Models
;#### Training and Inference
**G**enerative **P**re-trained **T**ransformer = "Autoregressive" (OpenAI, 2018)
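--

## GPT Models
#### Autoregression, sketched

A toy numpy sketch of the autoregressive loop - the `model` stub here returns random logits, where a real GPT would run its Transformer layers:

```python
# Toy autoregressive decoding: sample a token, append it, repeat
import numpy as np

VOCAB = 262_144                       # ~262k dictionary = 2**18 = 18 bits/token
rng = np.random.default_rng(42)

def model(tokens: list[int]) -> np.ndarray:
    """Stub: a real GPT would run `tokens` through the Transformer here."""
    return rng.normal(size=VOCAB)     # logits for the next token

def generate(prompt: list[int], n_new: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        logits = model(tokens)
        p = np.exp(logits - logits.max())
        p /= p.sum()                                 # SoftMax over the vocab
        tokens.append(int(rng.choice(VOCAB, p=p)))   # feed the new token back in
    return tokens

print(generate([1, 2, 3], n_new=5))
```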
--

## Tokens in-and-out

* Tokens are the inputs
  + From a dictionary of ~262k - which is $2^{18}$
    - i.e. 18 bits of information
  + But 'hidden dim' is [3584d](https://github.com/google/gemma_pytorch/blob/main/gemma/config.py)
    - so there's "Tons of Space"
* and the last layer converts...
  + 3584d → 1-of-262k (SoftMax)
  + But *surely* there is rich information inside...
;* Why are we throwing away information?

--

## The Latent Space

* We only train these models on tokens
  + but they seem to have a *sense* of the topic
  + ... over a *longer timescale* than 1 token
* Can we 'unpack' this knowledge better
  + or use it more effectively?

---

## Latent Space Reasoning

* Look at three main approaches:
  + Recurrent Depth
  + Latent Tokens
  + Large Concept Models

---

## Recurrent Depth

* Has existed for a long time...
  + [Universal Transformers](https://arxiv.org/abs/1807.03819) - Dehghani _et al_ (2019)
    - Focus : Turing complete T5
    - Downside : Didn't seem to catch on
  + [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) - Lan _et al_ (2019)
    - Focus : Save parameters
    - Downside : No time saving
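--

## Recurrent Depth
#### Sketch

A toy sketch of the weight-tied idea (illustrative only, not any paper's exact architecture): one shared block looped `r` times, so depth becomes a test-time knob:

```python
# Weight-tied "recurrent depth": reuse ONE block r times
# instead of stacking r distinct layers
import numpy as np

D = 64
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(D, D))   # shared weights,
W2 = rng.normal(scale=0.1, size=(D, D))   # reused at every iteration

def core(s: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One shared 'layer': mix latent state s with the input injection x."""
    return np.tanh(s @ W1 + x @ W2)

def recurrent_depth(x: np.ndarray, r: int) -> np.ndarray:
    s = np.zeros(D)                       # latent state (Huginn starts from noise)
    for _ in range(r):                    # more loops = more test-time compute
        s = core(s, x)
    return s

x = rng.normal(size=D)
for r in (1, 4, 32):                      # pick the 'depth' at inference time
    print(r, recurrent_depth(x, r)[:3])
```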
--

## Recurrent Reasoning

* [Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/abs/2502.05171) - Geiping _et al_ (2025)
  + "Huginn" = More modern take on which layers to repeat
--

## Recurrent Reasoning
#### Summary

* Authors did three runs, last of which worked
  + Big guesses at 'fixes' between iterations
* [Code Repo](https://github.com/seal-rg/recurrent-pretraining) (Apache 2) and [Open Weights](https://huggingface.co/tomg-group-umd/huginn-0125)
* Huginn demonstrates that the idea ~works
  + OTOH : ["...but don't get too excited cuz we don't beat OLMo2"](https://x.com/tomgoldsteincs/status/1888980680790393085)

---

## Latent Tokens

* [COCONUT : Training Large Language Models to Reason in a Continuous Latent Space](https://arxiv.org/abs/2412.06769) - Hao _et al_ (2024)
  + Facebook [Code on GitHub](https://github.com/facebookresearch/coconut) (MIT)
  + [GDE Blog Post](https://gonzoml.substack.com/p/chain-of-continuous-thought-coconut)
* Instead of decoding the last hidden state into a token:
  - feed state directly as input to the decoder
  - ... as an embedding for the next step
  - ... in the autoregressive generation process
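--

## COCONUT
#### Sketch

A toy sketch of the latent-token loop (dimensions and the `forward` stub are made up - see the Facebook repo for the real code): latent steps skip the decode-to-token / re-embed round trip:

```python
# COCONUT-style generation: feed the hidden state straight back in
import numpy as np

D, VOCAB = 64, 1024
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(VOCAB, D))     # token embedding table
W = rng.normal(scale=0.1, size=(D, D))         # stand-in for the Transformer

def forward(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W)                      # "last hidden state"

def generate(prompt: list[int], n_latent: int, n_tokens: int) -> list[int]:
    h = forward(E[prompt].mean(axis=0))        # absorb the prompt
    for _ in range(n_latent):                  # continuous 'thought' steps:
        h = forward(h)                         # the hidden state IS the next input
    out = []
    for _ in range(n_tokens):                  # then decode tokens as usual
        tok = int(np.argmax(E @ h))            # tied output projection
        out.append(tok)
        h = forward(E[tok])
    return out

print(generate([1, 2, 3], n_latent=4, n_tokens=3))
```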
--

## COCONUT
#### Latent Tokens
* Train by 'unlocking' tokens and training on 'true latents'

--

## Latent Tokens
#### Summary

* Results give so-so accuracy
  + while claiming fewer tokens
* But there's more scope for exploring:
  + Latent token tree-search
  + Planning, etc
* Later work claims better results:
  + [CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation](https://arxiv.org/abs/2502.21074) - Shen _et al_ (2025)

---

## Large Concept Models

* [Large Concept Models: Language Modeling in a Sentence Representation Space](https://arxiv.org/abs/2412.08821) - Barrault _et al_ (2024)
  + Facebook [Blog Post](https://ai.meta.com/blog/meta-fair-updates-agents-robustness-safety-architecture/) & [Repo on GitHub](https://github.com/facebookresearch/large_concept_model) (MIT) with Training code
    - 7B model trained (and released)
* Relies on [SONAR](https://github.com/facebookresearch/SONAR) for 'thoughts'
  + SONAR encoders and decoders:
    - ~200 languages and multiple modalities
    - Reconstruct text from SONAR embeddings
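--

## Large Concept Models
#### Sketch

A toy sketch of autoregression over *sentence* embeddings (tiny made-up dimensions - real LCMs drive 1024-d SONAR vectors through a 7B transition model):

```python
# LCM-style generation: predict the next 'concept' vector, not the next token
import numpy as np

D = 16                                         # real SONAR embeddings are 1024-d
rng = np.random.default_rng(0)
W = rng.normal(scale=0.2, size=(D, D))         # stand-in transition model

def next_concept(history: np.ndarray) -> np.ndarray:
    """Predict the next sentence embedding from those seen so far."""
    return np.tanh(history.mean(axis=0) @ W)   # regression: no SoftMax over a vocab

concepts = [rng.normal(size=(3, D))]           # "SONAR-encoded" prompt sentences
for _ in range(2):                             # autoregress two more concepts
    nxt = next_concept(np.vstack(concepts))
    concepts.append(nxt[None, :])              # a SONAR decoder would then render
                                               # each vector back into text
print(np.vstack(concepts).shape)               # (5, 16): five sentence vectors
```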
;* "Sentences" (text or audio) -- ## Large Concept Model #### Summary * 49-page paper explores several transition models: + Regular Transformer decoder + Diffusion-based (!) * and also how to 'quantise' the Concept vectors + Is 'regression' or 'classification' best? * Key issue: + Are the SONAR embeddings good for chaining thoughts? + ... an open question --- ## Latent Wrap-up * Is Latent Space Reasoning "a Thing"? * It requires extra engineering: + Is this in opposition to the Bitter Lesson? - [Hyung Won Chung YouTube video](https://www.youtube.com/watch?v=3gb-ZkVRemQ&t=1854s) + Or are we victims of history? - Sara Hooker's ["The Hardware Lottery"](https://arxiv.org/abs/2009.06489) - Hooker (2020) * Remains to be seen... --- ## ICLR take-aways * Conference = 3 days, >10k attendees + 6 Poster sessions of 2.5 hours + Talked to 30+ presenters per session + Walked *miles* * Aside: + Lots of Reasoning papers were OLD NEWS - Presenters had 'follow up' work to talk about * Then : 2 days of workshops + These are always more up-to-date + Includes focussed poster sessions ; https://x.com/iruletheworldmo/status/1915338995707359274 ; turns out the rl victory lap was premature. ; new tsinghua paper quietly shows the fancy reward loops just squeeze ; the same tired reasoning paths the base model already knew. ; pass@1 goes up, sure, but the model's world actually shrinks. ; feels like teaching a kid to ace flash cards and calling it wisdom. ; [Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?](https://arxiv.org/abs/2504.13837) - Yue _et al_ (2025) -- ## Workshop Poster!
* [Generating Code to Verify Cryptic Crossword Reasoning](https://openreview.net/forum?id=2nC7zy7adD)
  + Our Poster at DL4Code! & [SlidesLive video, etc](https://iclr.cc/virtual/2025/34846)

---

## ICLR Fun Papers

* Too many papers to choose between!
  + Only talked to ~30 presenters per session (of 600+)
;  + (also putting Open-endedness/Interestingness to one side)
* Selection emphasising novelty/variety:
  + Apple random compression
  + Memory Mosaics
  + Faces

---

## SeedLM

* [SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators](https://arxiv.org/abs/2410.10714) - Shafipour _et al_ (2025)
  + Apple Research : [ICLR link](https://iclr.cc/virtual/2025/poster/28000)
* Motivation : LLMs are slow due to Bandwidth
  + Compression can help on the edge
* Key idea: Generate the weights on-the-fly
  + Find a seed for RAND that can reproduce matrices (!)
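--

## SeedLM
#### Sketch

A toy version of the seed-search recipe (small search space and no quantisation, unlike the paper): for each weight block, find the PRNG seed whose random basis best reconstructs it:

```python
# SeedLM idea in miniature: weights -> (seed, a few coefficients)
import numpy as np

C, K = 8, 3                                    # block size, latent coefficients
w = np.random.default_rng(7).normal(size=C)    # one block of weights to compress

def basis(seed: int) -> np.ndarray:
    """Pseudo-random C x K matrix, fully determined by its seed."""
    return np.random.default_rng(seed).normal(size=(C, K))

def block_error(seed: int) -> float:
    U = basis(seed)
    t, *_ = np.linalg.lstsq(U, w, rcond=None)  # best 3 coefficients for this seed
    return float(np.linalg.norm(U @ t - w))

best = min(range(2**12), key=block_error)      # seed search (16-bit in the paper;
t, *_ = np.linalg.lstsq(basis(best), w, rcond=None)  # the coefficients also get
print("seed:", best, "error:", block_error(best))    # quantised to low-bit ints)
```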
--

## SeedLM

* Reduces 8 params → 3 params + 16-bit RAND seed

--

## SeedLM

* Does it work? : Apparently YES
  + Apple implemented it on FPGA hardware...
* Competitive accuracy with AWQ
  + ... but <<bandwidth

---

## Memory Mosaics

* [Memory Mosaics](https://arxiv.org/abs/2405.06394) - Zhang _et al_ (2025)
  + [ICLR Link](https://iclr.cc/virtual/2025/poster/30157)
  + FAIR @ Meta (in NYC) : Léon Bottou
  + "Return of the Associative Memory"
    - Has been around for DECADES
    - Have actual *theoretical* results !
    - Can scale _à la_ Transformer
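--

## Memory Mosaics
#### Sketch

A toy kernel-smoothing associative memory (a minimal rendering of the building block, not the paper's code): retrieval is a softmax-weighted average over stored values:

```python
# KV-lookup associative memory: store (key, value) pairs, recall by similarity
import numpy as np

D, N = 8, 32
rng = np.random.default_rng(0)
keys = rng.normal(size=(N, D))                 # stored key vectors
values = rng.normal(size=(N, D))               # associated value vectors

def retrieve(query: np.ndarray, beta: float = 4.0) -> np.ndarray:
    """Gaussian-kernel lookup: softmax over distances, then average the values."""
    scores = -beta * ((keys - query) ** 2).sum(axis=1)
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ values

q = keys[5] + 0.01 * rng.normal(size=D)        # query near a stored key...
print(np.allclose(retrieve(q), values[5], atol=0.1))  # ...recalls its value: True
```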
--

## Memory Mosaics

* Each 'mem' is a KV-lookup Associative Memory
* [BabiStories dataset and code](https://github.com/facebookresearch/MemoryMosaics) (Apache 2)

---

## Faces

* Lots of papers about Faces/Portraits/Avatars
  + Mainly China-based companies
    - Bytedance, Alibaba, iFlytek, etc
* Different techniques
  + Diffusion models (+ distillation)?
  + Gaussian splatting
  + Audio models / motion planning
  + Mesh vs image-based constraints
* Safety concerns? Not so much...

--

## Loopy Architecture
* [Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency](https://arxiv.org/abs/2409.02634) - Jiang _et al_ (2025)
  + [Project Page](https://loopyavatar.github.io/) & [ICLR link](https://iclr.cc/virtual/2025/poster/32049)

---

## Heads-Up!

* Some quick things...
  + AMD GPU kernel contest
  + Llama 4 models
  + Qwen3 models launched
  + Shopping in ChatGPT

--

## AMD GPU kernels
[AMD Developer Challenge 2025](https://www.datamonsters.com/amd-developer-challenge-2025)

* Registration Deadline = midnight tonight (PST)
  + Competition deadlines ~ end May

; https://x.com/pavel_4_ai/status/1915039361655083223
; AMD software is improving rapidly
; Cuda isn't a moat forever, but Nvidia is building new ones with the Python DSL, Dynamo, and more
; Meanwhile Nvidia hardware advantage is huge this year, but perf/TCO of 355X has attracted some customers
; MI450X is actually competitive with Rubin

--

## Qwen3

* Alibaba Cloud [blog post for release](https://qwenlm.github.io/blog/qwen3/)
* Early vibes:
  + Excellent benchmarks
  + Nice sizes
    - e.g. Qwen3-30B-A3B MoE:
    - 128 experts / 8 active; 128K context
  + (Small Size too : Qwen3-0.6B)
  + Good 'thinkers'
  + ... but don't know many facts
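--

## Qwen3
#### MoE routing, sketched

A generic top-k routing sketch (not Qwen3's actual implementation) of what "128 experts / 8 active" means - each token only pays for the experts it selects:

```python
# Top-8-of-128 Mixture-of-Experts layer (toy dimensions)
import numpy as np

D, N_EXPERTS, TOP_K = 32, 128, 8
rng = np.random.default_rng(0)
router = rng.normal(scale=0.1, size=(D, N_EXPERTS))     # routing matrix
experts = rng.normal(scale=0.1, size=(N_EXPERTS, D, D)) # one FFN stub per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    gate = x @ router                           # score all 128 experts
    top = np.argsort(gate)[-TOP_K:]             # keep only the best 8
    w = np.exp(gate[top] - gate[top].max())
    w /= w.sum()                                # renormalised gate weights
    return sum(wi * np.tanh(x @ experts[i]) for wi, i in zip(w, top))

x = rng.normal(size=D)
print(moe_layer(x).shape)                       # (32,) - ran only 8 of 128 experts
```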
--

## Shopping in ChatGPT

[Announcement tweet](https://x.com/gdb/status/1917009041035038837)

---

## Wrap-Up

* Reasoning in Latent Space is an interesting idea
  + No clear direction (yet)
* ICLR was a lot of Fun
  + and visitors clearly had a good impression of Singapore
* Looking forward to AAAI in Jan-2026 (also in SG)!
NB: MLSG wants to feature Your Talk!
--

## Link to Slides

[https://bit.ly/MLSG_2025-04](https://bit.ly/MLSG_2025-04)

---

## Open-endedness and Interestingness
#### Jenny Zhang

* Personal experience at ICLR
* Highlights include:
  + keynotes
  + discussions and
  + other interesting things...

; https://x.com/jennyzhangzt/status/1917103091691958304

---

## My ICLR Highlights
#### Raymond Chan

* Highlights from the ICLR speaker presentations :
  + Language Model Alignment in Multilingual Trolley Problems
  + Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View
  + Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images

---

## Agent Announcements & Trends from Google Cloud Next 2025
#### Sam Witteveen

* Google's Cloud Next '25 key announcements:
  + Agent Development Kit (ADK);
  + Agent Space; and
  + Agent2Agent protocol
* Trends for startups and companies demoing agentic products

---

## Further Study

* Field is growing very rapidly
* Lots of different things can be done
* Easy to find novel methods / applications

--

## Deep Learning Foundations

* 3 week-days + online content
* Play with real models & Pick-a-Project
* Held online, Live Coding, Certificates
* Last run : Early September

--

## NLP (Advanced)
### Advanced NLP and Sequence Processing

* NLP (eg: Named Entity Recognition)
* Transformers : Theory and Practice
* Generative AI
* Last run : Early October

--

## Vision (Advanced)
### Advanced Computer Vision with Deep Learning

* Advanced classification
* Other architectures (eg: U-Nets)
* Transformer-based vision
* Last run : Early November

--

## AI in Production
### Building Real World A.I. Applications

* DIY : node-server + task-queue + python-ml
* TensorFlow Serving / PyTorch Serve
* TF Lite + TF.js : edge device models
* Distillation, pruning, quantisation, etc...
* Last run : Early February

--

## Deep Learning for PMs
### ( `= Foundations - code + management` )

* Much more about 'big picture'
* Only a few code examples
* Project process standardised
* Last run : Late January

--

## Also...

* Unsupervised methods
* Time-series & Deep Learning
* Audio Processing (Sounds & Speech)

;--
;
;## QR code for Courses
;
---

## Machine Learning SG
#### MeetUp Group

* Next Meeting = 22-May-2025
* Topic : TBA
* Typical Contents :
  + Talk for people starting out
  + Something from the bleeding-edge
  + Lightning Talks
* [MeetUp.com / Machine-Learning-Singapore](https://www.meetup.com/Machine-Learning-Singapore/)

--

## Advanced Build With AI

* 17-May-2025 (Saturday)
  + Pizza
  + Afternoon session
  + Topics = Googley things
    - Gemini from A-Z
    - Agents
    - Vibe coding
  + Hands-on :: Laptops required!
* Sign-up page TBA

;--
;
;## Quick Poll
;#### Show of hands
;
;* What topic(s) would _compel_ you to come?
;  + Agents
;  + LLMs for Science
;  + Stable-diffusion++ / Video / Gaussian Splatting
;  + [Vibe Coding](https://x.com/MatthewBerman/status/1904039128611914144)
;  + LLMs with Retrieval (RAG)
;  + Robotics

---

# - Questions -