About Me
- Machine Intelligence / Startups / Finance
- Moved from NYC to Singapore in Sep-2013
- 2014 = 'fun' :
- Machine Learning, Deep Learning, NLP
- Robots, drones
- Since 2015 = 'serious' :: NLP + deep learning
The (DeepMind) Paper
- "A simple neural network module for relational reasoning"
- Santoro, Raposo & Barrett, Malinowski, Pascanu, Battaglia, Lillicrap
- https://arxiv.org/abs/1706.01427
Motivation
- CNNs : Well adapted to vision
- RNNs : Well adapted to sequences
- Need something for Relationships
... a reader piecing together evidence to predict the culprit in a murder-mystery novel ...
The Idea
- Allow network to create 'entity' nodes
- Combine pairs of nodes together
- ... with a relationship detector
- Call this combo a Relation-Network (RN)
The Reality
- All-the-Things → $n$ nodes
- Examine all pairs of nodes : $O(n^2)$
- Run $g_\theta()$ MLP over all combinations
- Sum up resulting vectors
- A final $f_\phi()$ MLP to give 'answer'
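The steps above can be sketched in a few lines of plain Python (kept dependency-free here rather than written in PyTorch). This is a minimal illustration, not the paper's implementation: the helper `make_mlp`, the layer sizes, and the random untrained weights are all assumptions, and the question conditioning used for the QA tasks is left out.

```python
import itertools
import random


def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Return a tiny 2-layer ReLU MLP (hypothetical helper, random weights)."""
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def forward(x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, hidden)) for row in w2]

    return forward


def relation_network(objects, g_theta, f_phi):
    """Compute f_phi(sum over all ordered pairs (i, j) of g_theta(o_i, o_j))."""
    total = None
    # All n * n ordered pairs of objects: this is the O(n^2) step.
    for o_i, o_j in itertools.product(objects, repeat=2):
        relation = g_theta(o_i + o_j)  # concatenate the two object vectors
        if total is None:
            total = relation
        else:
            total = [a + b for a, b in zip(total, relation)]
    return f_phi(total)


rng = random.Random(0)
n, obj_dim, rel_dim, n_answers = 5, 4, 8, 3  # illustrative sizes only
objects = [[rng.uniform(-1, 1) for _ in range(obj_dim)] for _ in range(n)]
g_theta = make_mlp(2 * obj_dim, 16, rel_dim, rng)
f_phi = make_mlp(rel_dim, 16, n_answers, rng)

answer = relation_network(objects, g_theta, f_phi)  # one score per answer
```

Because the pair outputs are summed before $f_\phi()$ sees them, reordering `objects` changes the result only by floating-point rounding.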
Experiments
- bAbI : Facebook's toy story setup
- CLEVR : Really tricky 3d questions
- Sort-of-CLEVR : Simplified 2d questions
- Dynamical physical systems : ??
bAbI Example
- 20 types of 'little stories'
- NB: dataset has versions
- But first paper scored 90%+ correct
1. John moved to the bedroom.
2. Mary grabbed the football there.
3. Sandra journeyed to the bedroom.
4. Sandra went back to the hallway.
5. Mary moved to the garden.
6. Mary journeyed to the office.
Q: Where is the football?
A: office (supporting facts: 2, 6)
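To see why sentences 2 and 6 are the ones that matter, here is a hand-written tracker for this one story. It is only a sketch: the verb lists and the function `where_is` are made up for illustration and cover just the sentence patterns shown above, which is exactly the kind of hand-coding the RN is meant to avoid.

```python
story = [
    "John moved to the bedroom.",
    "Mary grabbed the football there.",
    "Sandra journeyed to the bedroom.",
    "Sandra went back to the hallway.",
    "Mary moved to the garden.",
    "Mary journeyed to the office.",
]

# Assumed verb lists: enough for this story, not for bAbI in general.
MOVE_VERBS = {"moved", "journeyed", "went"}
GRAB_VERBS = {"grabbed"}


def where_is(item, story):
    """Return (place, supporting fact numbers) for an item someone picked up."""
    location = {}  # person -> (place, fact number of their latest move)
    holder = None  # (person, fact number of the grab)
    for num, sentence in enumerate(story, start=1):
        words = sentence.rstrip(".").split()
        person, verb, last_word = words[0], words[1], words[-1]
        if verb in MOVE_VERBS:
            location[person] = (last_word, num)
        elif verb in GRAB_VERBS and item in words:
            holder = (person, num)
    person, grab_fact = holder
    place, move_fact = location[person]
    return place, [grab_fact, move_fact]


place, facts = where_is("football", story)
print(place, facts)  # office [2, 6]
```

The answer needs two facts chained together (who holds the football, then where that person ended up), while the other four sentences are distractors.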
CLEVR Example
- Questions are sometimes tough
Sort-of-CLEVR
- Non-relational questions like:
- What is the shape of the red object? => Circle
- Is green object placed on the left side of the image? => yes
- Is orange object placed on the upside of the image? => no
- And relational questions:
- What is the shape of the object closest to the red object? => square
- What is the shape of the object furthest to the orange object? => circle
- How many objects have same shape with the blue object? => 3
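The difference between the two question types shows up clearly when they are answered by hand. The sketch below uses a made-up four-object scene and hypothetical helper names; whether the dataset's count includes the reference object itself is not stated on the slide, so excluding it here is an assumption.

```python
import math

# A made-up scene: one object per color, positions in pixels.
scene = [
    {"color": "red", "shape": "circle", "pos": (10, 20)},
    {"color": "green", "shape": "square", "pos": (15, 22)},
    {"color": "blue", "shape": "circle", "pos": (60, 60)},
    {"color": "orange", "shape": "square", "pos": (70, 10)},
]


def by_color(scene, color):
    """Look up the single object with the given color."""
    return next(obj for obj in scene if obj["color"] == color)


def shape_of(scene, color):
    """Non-relational: only one object needs to be inspected."""
    return by_color(scene, color)["shape"]


def closest_shape(scene, color):
    """Relational: compares the reference object against every other object."""
    ref = by_color(scene, color)
    others = [obj for obj in scene if obj is not ref]
    nearest = min(others, key=lambda obj: math.dist(obj["pos"], ref["pos"]))
    return nearest["shape"]


def count_same_shape(scene, color):
    """Relational: counts other objects sharing the reference object's shape."""
    ref = by_color(scene, color)
    return sum(1 for obj in scene if obj is not ref and obj["shape"] == ref["shape"])


print(shape_of(scene, "red"))          # circle
print(closest_shape(scene, "red"))     # square
print(count_same_shape(scene, "blue")) # 1
```

The non-relational question touches a single object, while both relational ones have to look at pairs, which is the part the RN is built to learn.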
Results
- bAbI : New SOTA (probably)
- CLEVR : New SOTA (68.5% → 95.5%)
- Sort-of-CLEVR : Much better than baselines
- Dynamical physical systems : Ok
Discussion
- Sounds like a graph, but is dense
- Entity selection via Attention?
- Why sum over $g_\theta()$?
Misleading Diagram
- Seems to show smartness that isn't there
Attention?
- Attention-is-All-You-Need?
- Top-$n$ is not GPU-friendly
Sum vs Max
$$f_\phi\left(\sum_{\forall (i,j)}{g_\theta(o_i, o_j)}\right)$$
- $\sum$ seems 'fair'
- Did they try $\max$?
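To make the sum-vs-max question concrete, here is a toy comparison of the two aggregators. The pair function `toy_g` is a stand-in invented for illustration (a real $g_\theta()$ is a learned MLP), and nothing here says which choice trains better.

```python
import itertools


def toy_g(o_i, o_j):
    """Stand-in for g_theta: two hand-picked pair features."""
    return [o_i * o_j, abs(o_i - o_j)]


def aggregate(objects, g, reduce_fn):
    """Apply g to every ordered pair, then reduce each feature across pairs."""
    relations = [g(o_i, o_j) for o_i, o_j in itertools.product(objects, repeat=2)]
    return [reduce_fn(column) for column in zip(*relations)]


objects = [1, 2, 3]
summed = aggregate(objects, toy_g, sum)
maxed = aggregate(objects, toy_g, max)
print(summed)  # [36, 8]
print(maxed)   # [9, 2]

# Both reductions ignore the order in which the objects are listed.
assert aggregate([3, 1, 2], toy_g, sum) == summed
assert aggregate([3, 1, 2], toy_g, max) == maxed
```

Both are order-invariant, so either fits the RN recipe. The difference is that the sum lets every pair contribute and grows with the number of objects, while the max keeps only the strongest pair per feature.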
Wrap-up
- Deep Learning papers are very readable
- Cutting edge experiment runs in <1 hour
- PyTorch is great for exploring new ideas
* Please add a star... *
8-week Deep Learning Developer Course
- Plan : Start (sigh) in September
- Weekly 3-hour sessions will include :
- Instruction
- 3 structured projects
- 2 self-directed projects
- Cost: S$TBD
- Expect to work hard...