WaveNet(s)
TensorFlow & Deep Learning SG
23 January 2018
About Me
- Machine Intelligence / Startups / Finance
- Moved from NYC to Singapore in Sep-2013
- 2014 = 'fun' :
- Machine Learning, Deep Learning, NLP
- Robots, drones
- Since 2015 = 'serious' :: NLP + deep learning
- & Papers...
- & Dev Course...
Outline
- WaveNet v1
- Why the excitement?
- Key elements
- Implementation & Demo
- Fast-WaveNet
- Parallel-WaveNet
- Why the excitement?
- Key elements
WaveNet v1
- DeepMind splash in Sept-2016 :
Key Elements
- Produce audio samples from network
- CNN with dilation
- Sigmoid gate of Tanh units
- Side-chains
- Output of distributions
- Computational burden
Audio samples from network
- Data output :
- 16 kHz rate (now 24 kHz)
- 8-bit μ-law (now 16-bit PCM)
- Very long time-dependencies :
- Normal RNNs are limited to ~50 steps
- Word-level features span 1000s of steps
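The 8-bit μ-law format above can be illustrated with a minimal sketch of the standard companding formula. The function names and the rounding scheme are illustrative assumptions, not taken from the WaveNet code.

```python
import math

MU = 255  # 8-bit mu-law: 256 quantisation levels


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    # Compress: logarithmic spacing gives quiet sounds more resolution
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Quantise from [-1, 1] to {0, ..., mu}
    return int((y + 1) / 2 * mu + 0.5)


def mu_law_decode(c, mu=MU):
    """Map an integer class in [0, mu] back to a sample in [-1, 1]."""
    y = 2 * (c / mu) - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


code = mu_law_encode(0.5)
restored = mu_law_decode(code)  # close to 0.5, but not exact (lossy)
```

Each audio sample becomes one of 256 classes, which is what lets the network treat "next sample" as a classification problem.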
Regular CNNs
Look at the 'linear footprint'
Dilated CNNs
Look at the 'exponential footprint'
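The difference between the two footprints can be checked with a little arithmetic. This is a sketch assuming kernel size 2 and dilations that double at every layer; the function name and the layer count of 10 are illustrative choices.

```python
def receptive_field(dilations, kernel_size=2):
    """Number of input timesteps visible to one output of a causal conv stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


layers = 10
regular = receptive_field([1] * layers)                     # dilation 1 everywhere
dilated = receptive_field([2 ** i for i in range(layers)])  # 1, 2, 4, ..., 512

print(regular, dilated)  # prints: 11 1024
```

With the same number of layers, the regular stack sees 11 samples while the dilated stack sees 1024, which is 64 ms of audio at 16 kHz.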
CNNs Pro/Con
- Advantages :
- Can have very long 'look back'
- Fast to train (see later)
- Disadvantages :
Sigmoid gate of Tanh units
Each CNN node has some complexity
- includes Gating and ResNet idea
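A minimal single-channel sketch of one such node is given below: a tanh "filter" multiplied by a sigmoid "gate", with a ResNet-style shortcut. It is a simplification: the weights are made-up scalars, and the 1x1 convolutions used in the published architecture are omitted.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def causal_dilated_conv(x, w_prev, w_cur, dilation):
    """Kernel-size-2 causal convolution over a list of floats (zero-padded past)."""
    return [
        w_prev * (x[t - dilation] if t >= dilation else 0.0) + w_cur * x[t]
        for t in range(len(x))
    ]


def gated_residual_layer(x, filt_w, gate_w, dilation):
    """One layer: tanh filter times sigmoid gate, plus a residual shortcut."""
    f = causal_dilated_conv(x, filt_w[0], filt_w[1], dilation)
    g = causal_dilated_conv(x, gate_w[0], gate_w[1], dilation)
    z = [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]
    residual = [xi + zi for xi, zi in zip(x, z)]  # input passes straight through
    return residual, z                            # z is also the sideways output


x = [0.0, 1.0, 0.0, 0.0]
out, z = gated_residual_layer(x, (0.5, 0.5), (0.5, 0.5), dilation=2)
```

The gate lets each node decide how much of its tanh activation to pass on, while the shortcut keeps gradients flowing through deep stacks.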
Side-chains
Actual 'output' is fed from sideways connections from all layers
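As a rough sketch of the idea, the sideways (skip) signals from every layer are summed per timestep and passed through a ReLU before the output head. The numbers below are hypothetical, and the further 1x1 convolutions and softmax of the published architecture are left out.

```python
def combine_skips(skip_outputs):
    """Sum per-layer skip signals timestep by timestep, then apply ReLU."""
    summed = [sum(vals) for vals in zip(*skip_outputs)]
    return [max(0.0, s) for s in summed]


# Hypothetical skip outputs from three layers, four timesteps each
skips = [
    [0.1, -0.2, 0.3, 0.0],
    [0.0, -0.1, 0.2, 0.5],
    [-0.3, 0.1, 0.1, 0.5],
]
head_input = combine_skips(skips)
```

Because every layer contributes directly, the output head sees features at all time scales, from single samples up to the full receptive field.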
Output of distributions
- Instead of raw audio :
- Output a complete distribution for each timestep
- 256x as much work (one probability per 8-bit level)
- ... seems crazy, but for the results ...
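A minimal sketch of what "output a complete distribution" means per timestep: 256 logits go through a softmax, and the next sample is drawn from the result. The random logits are a stand-in for real network output, and the helper names are illustrative.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_class(probs, rng):
    """Draw one class index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(256)]  # stand-in for network output
probs = softmax(logits)
next_sample_class = sample_class(probs, rng)        # an integer in 0..255
```

Sampling from the distribution, rather than taking a single predicted value, lets the model express uncertainty about the next sample.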
Computational burden
- Training is QUICK :
- All timesteps have known next training samples
- Inference / Running is SLOW :
- 1 second of audio output ≈ 1 minute of GPU time
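The asymmetry can be sketched as follows: during training every "previous sample" is already in the recording, but during generation each new sample has to wait for the one before it. The toy `predict_next` below is a hypothetical stand-in for a full network forward pass, and the timing uses the 16 kHz and 1-minute figures quoted above.

```python
SAMPLE_RATE = 16_000               # samples per second of audio
GPU_SECONDS_PER_AUDIO_SECOND = 60  # figure quoted on the slide


def generate(predict_next, n_samples, seed):
    """Autoregressive loop: each step consumes everything generated so far."""
    history = list(seed)
    for _ in range(n_samples):
        history.append(predict_next(history))  # cannot be parallelised over time
    return history[len(seed):]


def toy_model(history):
    """Hypothetical placeholder for one network forward pass."""
    return (history[-1] + 1) % 256


generated = generate(toy_model, 5, seed=[0])
ms_per_sample = 1000 * GPU_SECONDS_PER_AUDIO_SECOND / SAMPLE_RATE

print(generated, ms_per_sample)  # prints: [1, 2, 3, 4, 5] 3.75
```

At these figures, each individual sample costs about 3.75 ms of GPU time, repeated 16,000 times per second of audio.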
Fast WaveNet
- Optimise run-time :
- But intrinsic sequential nature remains...
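One way to picture the run-time optimisation is as caching: each layer keeps a queue of its recent inputs, so a new sample costs one small update per layer instead of recomputing the whole tree of past activations. The sketch below is a toy single-channel version with made-up weights, not the published implementation. It checks that the cached path matches naive recomputation.

```python
import math
from collections import deque

DILATIONS = [1, 2, 4, 8]
W_PREV, W_CUR = 0.5, -0.3  # made-up weights shared by all layers


def naive_output(x, t):
    """Recompute the whole stack from the raw input for timestep t."""
    def layer_value(layer, t):
        if t < 0:
            return 0.0                     # zero padding before the start
        if layer == 0:
            return x[t]
        d = DILATIONS[layer - 1]
        return math.tanh(W_PREV * layer_value(layer - 1, t - d)
                         + W_CUR * layer_value(layer - 1, t))
    return layer_value(len(DILATIONS), t)


class CachedStack:
    """Keep one queue per layer holding that layer's last `dilation` inputs."""

    def __init__(self):
        self.queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]

    def step(self, sample):
        h = sample
        for q in self.queues:
            past = q[0]   # this layer's input from `dilation` steps ago
            q.append(h)   # maxlen drops the oldest entry automatically
            h = math.tanh(W_PREV * past + W_CUR * h)
        return h


x = [0.1 * i for i in range(20)]
stack = CachedStack()
cached = [stack.step(v) for v in x]
naive = [naive_output(x, t) for t in range(len(x))]
```

The per-sample work drops from exponential to linear in the number of layers, but the loop over timesteps is still strictly sequential.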
Wrap-up
- WaveNet started out as very good but very expensive
- ... but that proved it was worth optimising
- Lots of opportunity for innovation
* Please add a star... *
Deep Learning MeetUp Group
BONUS!
- Now FREE from Google via Kaggle (aka Colab) :
( Don't use for Mining )
Quick Poll
- Show of Hands :
- More in-depth on Eager Mode?
- Text to Speech race (Tacotron, DeepVoice, etc)?
- Speech to Text (ASR) game?
- CloudML?
- Latent space tricks?
- Knowledge base access?
Deep Learning Back-to-Basics
8-week Deep Learning Developer Course
- 25 September - 25 November
- Twice-Weekly 3-hour sessions included :
- Instruction
- Individual Projects
- Support by WSG
- Location : SGInnovate
- Status : FINISHED!
?-week Deep Learning Developer Course
- Plan : Start 2018-Q1
- Sessions will include :
- Instruction
- Individual Projects
- Support by WSG (planned)
- Location : SGInnovate
- Status : TBA
Deep Learning : Beginner Course
- Dates + Cost : TBA ::
- Full day (week-end)
- Play with real models
- Get inspired!
- Pick-a-Project to do at home
- 1-on-1 support online
- Regroup on a week-night
- http://bit.ly/2zVXtRm