# Self-Improving Agents
#### Machine Learning Singapore
[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
;[Sam Witteveen](http://samwitteveen.com) @ [reddragon.ai](http://reddragon.ai/)
19-June-2025
---

## Today's Line-up

* "RAG, Agents, RL"
  - _Vivek Kalyan_
* "Rethinking Superapps: Voice-Driven Multi-Agent Systems with Gradio MCP"
  - _Leonard Loo_
* "Self-Improving Agents"
  - _Martin Andrews_

---

## RAG, Agents, RL
#### Vivek Kalyan

* Issues with RAG
* Strong baselines
* Training with Reinforcement Learning

---

## Rethinking Superapps: Voice-Driven Multi-Agent Systems with Gradio MCP
#### Leonard Loo

* Superapps
* Building MCP servers
* Integrating LLMs with MCP and Voice

---

# Self-Improving Agents
#### Machine Learning Singapore
[Martin Andrews](http://mdda.net) @ [reddragon.ai](http://reddragon.ai/)
;[Sam Witteveen](http://samwitteveen.com) @ [reddragon.ai](http://reddragon.ai/)
19-June-2025
---

## About Me

* Machine Intelligence / Startups / Finance
  + Moved from NYC to Singapore in Sep-2013
* 2014 = 'fun':
  + Machine Learning, Deep Learning, NLP
  + Robots, drones
* Since 2015 = 'serious': NLP + deep learning
  + Including Papers...
  + & GDE ML; ML-Singapore co-organiser...
  + & Red Dragon AI...

--

## About Red Dragon AI

* Deep Learning Consulting & Prototyping (Google Partner)
  - Education / Training
  - Research: NeurIPS / EMNLP / NAACL / ICML / ICLR
* Please contact us for:
  - Language model training (eg: on-prem)
  - Knowledgebase interaction & reasoning
  - Sales-oriented applications

---

## Outline

* DSPy
* Darwin Gödel Machines
* GPU Kernel Scientist
* Wrap-up & QR-code
;* Heads-Up!

---

## DSPy
#### As Agentic Framework

* DSPy is an LLM Framework
  + Can call multiple backends
  + Orchestrate 'flows', etc
  + ... but has a very different 'feel'
* DSPy @ MLSG in [March-2024](https://mdda.net/blog/research/talks/DSPy-gemini-and-gemma)
---

## DSPy Signature

```py
import dspy
dspy.settings.configure(lm=dspy.LM("gemini/gemini-2.5-flash"))

class SentimentClassifier(dspy.Signature):
    """ Classify the sentiment of a text. """
    text: str = dspy.InputField(
        desc="input text to classify sentiment")
    sentiment: int = dspy.OutputField(
        desc="sentiment, the higher the more positive",
        ge=0, le=10)
```

--

## DSPy Predefined Module

```py
predict = dspy.Predict(SentimentClassifier)
# or: predict = dspy.ChainOfThought(SentimentClassifier)

output = predict(text="I am feeling pretty happy!")
print(output)
# Prediction(
#     sentiment=8
# )
```

--

## Behind-the-scenes 1/3

```log
System message:

Your input fields are:
1. `text` (str): input text to classify sentiment
Your output fields are:
1. `sentiment` (int): sentiment, the higher the more positive
        Constraints: greater than or equal to: 0, less than or equal to: 10
```

--

## Behind-the-scenes 2/3

```log
All interactions will be structured in the following way,
with the appropriate values filled in.

[[ ## text ## ]]
{text}

[[ ## sentiment ## ]]
{sentiment}   # note: the value must be a single int value

[[ ## completed ## ]]

In adhering to this structure, your objective is:
        Classify the sentiment of a text.
```

--

## Behind-the-scenes 3/3

```log
User message:

[[ ## text ## ]]
I am feeling pretty happy!

Respond with the corresponding output fields, starting with the field
`[[ ## sentiment ## ]]` and then ending with `[[ ## completed ## ]]`.

Response:

[[ ## sentiment ## ]]
8

[[ ## completed ## ]]
```
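--

## Inspecting the Prompt

A minimal way to see these messages yourself (assuming the `SentimentClassifier` Signature above): DSPy records its LM calls, which can be printed after a prediction.

```py
# dspy.inspect_history() prints the most recent LM call(s):
# system message, user message and raw response
predict = dspy.Predict(SentimentClassifier)
predict(text="I am feeling pretty happy!")
dspy.inspect_history(n=1)
```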
---

## DSPy RAG

```py
class QueryGenerator(dspy.Signature):
    """ Generate a query based on question to fetch relevant context """
    question: str = dspy.InputField()
    query: str = dspy.OutputField()

def search_wikipedia(query: str) -> list[str]:
    """ Query ColBERT endpoint, which is a knowledge source based on wikipedia data """
    results = dspy.ColBERTv2(
        url='http://server:port/wiki17_abstracts')(query, k=1)
    return [x["text"] for x in results]
```

* `QueryGenerator` is a Signature
* `search_wikipedia` is a Python tool
--

## DSPy Custom Module

```py
class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.query_generator = dspy.Predict(QueryGenerator)
        self.answer_generator = dspy.ChainOfThought(
            "question,context->answer")

    def forward(self, question, **kwargs):
        query = self.query_generator(question=question).query
        context = search_wikipedia(query)[0]
        return self.answer_generator(
            question=question, context=context).answer

rag = RAG()
```

* Looks very much like PyTorch...
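Calling the module goes through `forward` (a minimal usage sketch; the question and answer here are illustrative):

```py
# dspy.Module.__call__ routes to forward(), like PyTorch's nn.Module
answer = rag(question="Who wrote the opera Carmen?")
print(answer)   # e.g. "Georges Bizet" -- depends on the retrieved context
```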
--

## DSPy Optimisation

```py
optimiser = dspy.MIPROv2(
    metric=dspy.evaluate.answer_exact_match,
    auto="light",
    num_threads=16)

optimised_rag = optimiser.compile(
    rag,
    trainset=trainset, valset=valset,
    requires_permission_to_run=False)
```

* `optimised_rag` is a new version of `rag` ...
  + ... with its prompts **optimised**!
  + (based on trainset/valset)
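--

## Data & Metric (sketch)

A minimal sketch of what the `trainset`/`valset` above could look like (the example question is illustrative): `dspy.Example` items, with `.with_inputs()` marking the model-input fields, and the same metric reusable for before/after evaluation.

```py
trainset = [
    dspy.Example(question="What is the capital of France?",
                 answer="Paris").with_inputs("question"),
    # ... more labelled examples ...
]
valset = trainset   # tiny illustration only; use a held-out split in practice

# Score a module with the same metric the optimiser uses
evaluate = dspy.evaluate.Evaluate(
    devset=valset, metric=dspy.evaluate.answer_exact_match, num_threads=16)
evaluate(optimised_rag)
```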
---

## DSPy Wrap-up

* DSPy has maintained focus on first principles...
  + Still elegant, and extensible
* NEW: [49-minute Databricks course on DeepLearning.AI](https://www.deeplearning.ai/short-courses/dspy-build-optimize-agentic-apps/)
  + Signatures and Modules
  + MLflow Tracing
  + Optimizing Agents with DSPy Optimizer

---

## Darwin Gödel Machines

* [DGM: Open-Ended Evolution of Self-Improving Agents](https://arxiv.org/abs/2505.22954)
  - Zhang _et al_ (2025)
  + Key authors (@UBC.CA) include:
    - Jenny Zhang (first author, spoke at MLSG in April-2025)
      * [Repo on GitHub](https://github.com/jennyzzt/dgm) (Apache 2)
    - Jeff Clune (see also: MAP-Elites, etc)
  + Sakana.ai blog: [DGM: AI that improves itself by rewriting its own code](https://sakana.ai/dgm/)

;* Gödel ~ Mathematician
;  + Famous for Incompleteness Theorem
;* Darwin ~

--

## DGM : Key Ideas
--

## Gödel ~ Mathematician

* Gödel machines: self-referential universal problem solvers making provably optimal self-improvements
  + Inspiration for [Schmidhuber's 2003 work](https://people.idsia.ch/~juergen/goedelmachine.html) ...
  + (Gödel = famous for the [Incompleteness Theorem](https://www.quantamagazine.org/how-godels-proof-works-20200714/))

--

## Darwin ~ evolution...
---

## Evolutionary Algorithms
#### A bit of history

* Back in the mid-1990s:
  + Neural Networks only 'kinda' worked
    - whereas HMMs and SVMs were on the horizon
  + But Genetic Algorithms / Programming actually worked
    - So: My PhD was in a NN lab, but I did GP
* Extensively covered in MLSG last month...

--

## Evolution Innovations

* [Novelty Search](https://www.semanticscholar.org/paper/NOVELTY-SEARCH-AND-THE-PROBLEM-WITH-OBJECTIVES-TO-Lehman-Stanley/e49d1ee1bddea0922faca358f3fd42474baad300?p2df) - Lehman & Stanley (2011)
  + "Why Greatness Cannot be Planned"
* [MAP-Elites](https://arxiv.org/abs/1504.04909) - Mouret & Clune (2015)
  + Also: Work by *MLSG speaker* Jenny Zhang
* Both help to solve "Population Collapse"

--

## Evolution with LLMs

;* Can use an LLM as the Mutation/Crossover operator
;  + ... and operate on text / prompts / code

* Evolving Prompts:
  + [Promptbreeder](https://arxiv.org/abs/2309.16797) - Fernando _et al_ (2023)
    - "Self-Referential Self-Improvement via Prompt Evolution"
  + [Self-Discover](https://arxiv.org/abs/2402.03620) - Zhou _et al_ (2024)
    - "Large Language Models Self-Compose Reasoning Structures"
* Evolving Programs:
  + [FunSearch](https://www.nature.com/articles/s41586-023-06924-6.pdf) - Romera-Paredes _et al_ (2024)
    - "Mathematical discoveries from program search with large language models"
  + [AlphaEvolve](https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/) - Novikov _et al_ (2025)
    - "A Gemini-powered coding agent for designing advanced algorithms"
* Common core move: the LLM acts as the mutation operator (sketch below)
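A minimal sketch of that pattern, evolving a prompt with an LLM as the mutator (the `llm` and `score` functions are placeholder stubs, not any real API):

```py
import random

def llm(prompt: str) -> str:
    """Placeholder: swap in a real model call (eg: Gemini) here."""
    return prompt.splitlines()[-1] + " Think step by step."

def score(candidate: str) -> float:
    """Placeholder fitness: real systems score against a benchmark."""
    return random.random()

population = ["You are a helpful assistant."]       # seed prompt(s)

for _ in range(20):
    # Tournament selection: fittest of a small random sample
    sample = random.sample(population, k=min(3, len(population)))
    parent = max(sample, key=score)
    # The LLM itself is the mutation operator, rewriting the parent
    population.append(llm(f"Rewrite this prompt so it works better:\n{parent}"))

print(max(population, key=score))
```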
---

## Code & Agent re-writes

* Key task: `SelfImprovement(Parent)🠚Child`
  + Measure effectiveness of improvement on programming tasks
  + (a minimal loop sketch follows)
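A hypothetical sketch of the DGM archive loop: each agent is a blob of code, and the self-improvement step asks the parent to rewrite it (`self_modify` and `swe_bench_score` are illustrative stubs, not the repo's API):

```py
import random

def self_modify(agent_code: str) -> str:
    """Stub: the parent agent + an LLM propose a patched child agent."""
    return agent_code + f"\n# patch-{random.randint(0, 999)}"

def swe_bench_score(agent_code: str) -> float:
    """Stub: DGM evaluates each child agent on coding benchmarks."""
    return random.random()

archive = {"agent-0": "# initial coding agent"}     # every agent is kept
scores = {name: swe_bench_score(code) for name, code in archive.items()}

for gen in range(1, 10):
    # Parent choice trades off score against novelty (here: uniform random)
    parent = random.choice(list(archive))
    child_code = self_modify(archive[parent])
    name = f"agent-{gen}"
    archive[name] = child_code                      # open-ended: no pruning
    scores[name] = swe_bench_score(child_code)
```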
--

## Family Tree

--

## Performance Steps
* _"Throughout this paper, the term SWE-bench refers by default to the SWE-bench Verified subset."_ --- ## Darwin Gödel Machines #### Wrap-up * Self-Improvement is a meta-level up from AlphaEvolve * Code repo contains all the prompts + (no agent framework used) * Also: [Andrej Karpathy's AI Startup School talk](https://www.youtube.com/watch?v=LCEmiRjPEtQ) + for Software 1.0, 2.0, 3.0 (+ ?) --- ## "GPU Kernel Scientist" * Needs a bit of background: + Much more detail at May MeetUp... * Ideas: + GPU Kernels & importance + Evolutionary methods * Actual method / paper --- ## GPU Kernels * Complexity of GPU normally hidden + PyTorch, Keras, JAX, TensorFlow * But sometimes the details matter + DeepSeek; FlashAttention; NeRFs; MAMBA + == Writing CUDA (or equivalent) -- ## Speed-ups Available * [Excellent Blog Post](https://siboehm.com/articles/22/CUDA-MMM) on DIY Matrix Multiply + NB: Doesn't use the [Tensor Cores...](https://github.com/andylolu2/simpleGEMM/blob/master/gemm.cuh) | Step | Method | GFLOPs/sec | | ---- | ------ | ---------: | | 1 | Naïve approach | 309 | | 5 | 2D Block Tiling | 15972 | | 10 | Warptiling | 21779 | | | cuBLAS library | 23250 | | | | | ;https://www.reddit.com/r/MachineLearning/comments/1cqhsln/p_simplegemm_fast_and_minimal_tensor_core_matrix/ -- ## AMD GPU kernels [
[AMD Developer Challenge 2025](https://www.datamonsters.com/amd-developer-challenge-2025)

* Deadlines:
  + Registration=2025-05-01, Competition=2025-06-02
* `FP8 GEMM` · `Fused MOE` · `MLA with ROPE`

;https://x.com/pavel_4_ai/status/1915039361655083223
;AMD software is improving rapidly
;CUDA isn't a moat forever, but Nvidia is building new ones with the Python DSL, Dynamo, and more
;Meanwhile Nvidia hardware advantage is huge this year, but perf/TCO of 355X has attracted some customers
;MI450X is actually competitive with Rubin

---

## LLM approach to competition

* Goals:
  + Write high-performance FP8 AMD kernels
  + Use only LLM code capabilities (not human brain-power)
* Obstacles:
  + Very limited AMD documentation
  + Very few working examples of AMD kernels
    - Particularly for low-precision
  + Can only run code via limited REST API:
    - Compilation & run-time errors
    - Short report about any numerical errors
    - Benchmark results = end-to-end timing (No profiling data)

--

## LLM flows

;* Idea:
;  + Get Gemini Pro to write GPU kernels
;    - ... based on known working ones
;    - bug-fixing if necessary
;  + Use benchmarks (+Flash) to plan next iteration
;  + Use Gemini Flash to suggest experiments
;    - Choose which experiments to do
;  + ... LOOP

--

## LLM 'Tricks'

* Gemini Pro is very effective
  + But relevant context is crucial
  + Evolution allows us to A/B test *everything*
* Gemini Flash decides which node to build on
* Gemini Flash then suggests experiments
  + And also estimates:
    - how they might perform
    - how 'innovative' they are
  + We pick: Best, Least-Bad & Creative experiments
  + (a minimal loop sketch follows)
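A hypothetical sketch of the three-stage loop (the Gemini model roles are from the talk; all function stubs are illustrative, not the paper's code):

```py
import random

def gemini_flash(prompt: str) -> str:
    """Stub: fast/cheap model used for selection + experiment ideas."""
    return "experiment: change the tile size"

def gemini_pro(prompt: str) -> str:
    """Stub: strong model that actually writes the kernel code."""
    return "// kernel source ..."

def run_benchmark(kernel_src: str) -> dict:
    """Stub: the competition REST API returned only compile/run-time errors,
    numerical-error summaries, and end-to-end timings (no profiler)."""
    return {"ok": True, "time_us": random.uniform(400, 900)}

history = [{"src": "// naive HIP kernel", "time_us": 5000.0}]

for _ in range(5):
    # Stage 1: decide which previous kernel to build on (here: fastest so far)
    base = min(history, key=lambda h: h["time_us"])
    # Stage 2: propose experiments, then pick best / least-bad / creative
    idea = gemini_flash(f"Suggest an experiment for:\n{base['src']}")
    # Stage 3: write the new kernel, guided only by benchmark feedback
    src = gemini_pro(f"Apply '{idea}' to:\n{base['src']}")
    result = run_benchmark(src)
    if result["ok"]:
        history.append({"src": src, "time_us": result["time_us"]})
```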
---

## Agentic Flows

--

## Stage 1 : Selection
--

## Stage 2 : Experiments

--

## Stage 3 : Coding
--

## End Results

* `amd-fp8-mm` competition timings:
  + Naïve HIP: ~5000μs
  + PyTorch base-case: 850μs
    - (uses optimised `fp16`)
  + Winning human entry: ~150μs
    - Code is now available 'in-context' *next time*
  + Final LLM-only entry: 450μs
    - Gemini may have over-complicated its solutions

---

## GPU Kernel Scientist
* Experiments done & written up ~5 days after MLSG
  + Accepted paper at the ES-FoMo Workshop at ICML 2025!

---

## Wrap-Up

* LLMs can gain super-powers
  + ... when employed in a purposeful system
* Building these systems is *wide open*
* "Just Do It!" can pay off
NB: MLSG wants to feature Your Talk!
--

## Link to Slides

[https://bit.ly/MLSG_2025-06](https://bit.ly/MLSG_2025-06)

---

## Further Study

* Field is growing very rapidly
* Lots of different things can be done
* Easy to find novel methods / applications

--

## Deep Learning Foundations

* 3 week-days + online content
* Play with real models & Pick-a-Project
* Held online, Live Coding, Certificates
* Next run: Late August

--

## Vision (Advanced)
### Advanced Computer Vision with Deep Learning

* Advanced classification
* Other architectures (eg: U-Nets)
* Transformer-based vision
* Next run: Early September

--

## NLP (Advanced)
### Advanced NLP and Sequence Processing

* NLP (eg: Named Entity Recognition)
* Transformers: Theory and Practice
* Generative AI
* Next run: Late September

--

## AI in Production
### Building Real World A.I. Applications

* DIY: node-server + task-queue + python-ml
* TensorFlow Serving / PyTorch Serve
* TF Lite + TF.js: edge device models
* Distillation, pruning, quantisation, etc...
* Next run: Late October

--

## Deep Learning for PMs
### ( `= Foundations - code` `+ management` )

* Much more about 'big picture'
* Only a few code examples
* Project process standardised
* Next run: September

--

## Also...

* Unsupervised methods
* Time-series & Deep Learning
* Audio Processing (Sounds & Speech)

;--
;
;## QR code for Courses
;
;
---

## Machine Learning SG MeetUp Group

* Next Meeting = ?July?-2025 (NB: ICML in Vancouver)
* Topic: TBA
* Typical Contents:
  + Talk for people starting out
  + Something from the bleeding-edge
  + Lightning Talks
* [MeetUp.com / Machine-Learning-Singapore](https://www.meetup.com/Machine-Learning-Singapore/)

--

## Quick Poll
#### Show of hands

* What topic(s) would _compel_ you to come?
  + Stable-diffusion++ / Video / Gaussian Splatting
  + Robotics
  + AI for Education
  + LLMs for Science
  + Agents

---

# - Questions -
;`Handouts :` [`https://bit.ly/text-similarity-jan-2022`](https://bit.ly/text-similarity-jan-2022)