WaveNet(s)

TensorFlow & Deep Learning SG

Martin Andrews @ redcatlabs.com
Martin Andrews @ reddragon.ai

23 January 2018

About Me

  • Machine Intelligence / Startups / Finance
    • Moved from NYC to Singapore in Sep-2013
  • 2014 = 'fun' :
    • Machine Learning, Deep Learning, NLP
    • Robots, drones
  • Since 2015 = 'serious' :: NLP + deep learning
    • & Papers...
    • & Dev Course...

Outline

  • WaveNet v1
    • Why the excitement?
    • Key elements
    • Implementation & Demo
  • Fast-WaveNet
  • Parallel-WaveNet
    • Why the excitement?
    • Key elements

WaveNet v1

WaveNet v1 MOS

Key Elements

  • Produce audio samples from network
  • CNN with dilation
  • Sigmoid-gated Tanh units
  • Side-chains
  • Output of distributions
  • Computational burden

Audio samples
from network

  • Data output :
    • 16kHz sample rate (now 24kHz)
    • 8-bit μ-law (now 16-bit PCM)
  • Very long time-dependencies :
    • Normal RNNs struggle beyond ~50 steps
    • Word-level structure spans 1000s of samples
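The 8-bit μ-law companding mentioned above compresses audio so that quiet amplitudes get finer resolution. A minimal numpy sketch of the standard (G.711-style) formula — my own code, not from the talk:

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Map audio in [-1, 1] to 256 integer bins via mu-law companding."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int32)  # bins 0..255

def mu_law_decode(bins, mu=255):
    """Inverse: integer bins back to audio in [-1, 1]."""
    y = 2 * (bins.astype(np.float32) / mu) - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

audio = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])
bins = mu_law_encode(audio)    # e.g. silence (0.0) lands in the middle bin
recon = mu_law_decode(bins)
```

These 256 bins are exactly what the network's softmax output (later slides) predicts, one bin per timestep.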

Regular CNNs

Regular CNN

Look at the 'linear footprint'

Dilated CNNs

Dilated CNN

Look at the 'exponential footprint'
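The 'exponential footprint' falls out of simple arithmetic: with filter width 2 (the WaveNet default) and dilations doubling each layer (1, 2, 4, ...), the receptive field grows as 2^n rather than n. A quick check:

```python
def receptive_field(dilations, kernel=2):
    """Timesteps visible to the top of a stack of causal convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

wavenet_stack = [2 ** i for i in range(10)]   # dilations 1, 2, 4, ..., 512
regular_stack = [1] * 10                      # ordinary convolutions

print(receptive_field(wavenet_stack))         # 1024 samples ('exponential')
print(receptive_field(regular_stack))         # 11 samples ('linear')
```

Ten dilated layers see 1024 samples back; the same depth of regular convolutions sees only 11.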

CNNs Pro/Con

  • Advantages :
    • Can have very long 'look back'
    • Fast to train (see later)
  • Disadvantages :
    • No 'next sample' scheme

Sigmoid-gated Tanh units

WaveNet Gating Unit

Each CNN node has some complexity
- includes Gating and ResNet idea
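In the paper, each node computes z = tanh(W_f * x) ⊙ σ(W_g * x), then a 1x1 projection that is both added back to the input (the ResNet idea) and sent sideways as a skip connection. A hedged numpy sketch of one node — weight names and shapes are my own illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_residual_block(x, W_filter, W_gate, W_res):
    """One WaveNet-style node: a tanh 'filter' gated by a sigmoid 'gate',
    then a 1x1-style projection used for both residual and skip paths."""
    z = np.tanh(x @ W_filter) * sigmoid(x @ W_gate)   # gated activation
    skip = z @ W_res                                  # fed to the side-chain
    return x + skip, skip                             # residual output, skip output

rng = np.random.default_rng(0)
channels = 8
x = rng.standard_normal((5, channels))                # 5 timesteps
Wf = rng.standard_normal((channels, channels))
Wg = rng.standard_normal((channels, channels))
Wr = rng.standard_normal((channels, channels))
out, skip = gated_residual_block(x, Wf, Wg, Wr)
```

The residual path (`x + skip`) lets gradients flow through deep stacks; the gate bounds each activation to (-1, 1).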

Side-chains

WaveNet v1 Side-Chain

Actual 'output' is fed from sideways connections
from all layers
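In the paper those sideways (skip) connections are summed across all layers, then passed through ReLU → 1x1 conv → ReLU → 1x1 conv to produce the output logits. A small numpy sketch of that head, with illustrative shapes of my choosing:

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def output_head(skips, W1, W2):
    """Side-chain head: sum the skip outputs of every layer, then
    ReLU -> 1x1 projection -> ReLU -> 1x1 projection (softmax logits)."""
    s = np.sum(skips, axis=0)          # sum over the layer axis
    return relu(relu(s) @ W1) @ W2     # one row of logits per timestep

rng = np.random.default_rng(1)
skips = rng.standard_normal((4, 5, 8))    # 4 layers, 5 timesteps, 8 channels
W1 = rng.standard_normal((8, 8))
W2 = rng.standard_normal((8, 256))        # project to the 256 output bins
logits = output_head(skips, W1, W2)       # shape (5, 256)
```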

Output of distributions

  • Instead of raw audio :
    • Output a complete distribution for each timestep
    • 256 output values per timestep instead of 1
    • ... seems crazy, but for the results ...
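Concretely, each timestep's 256 logits become a categorical distribution over the μ-law bins, and the next sample is drawn from it. A minimal sketch (my own helper names):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())   # subtract max for numerical stability
    return e / e.sum()

def sample_next(logits, rng):
    """Draw the next 8-bit sample (a bin in 0..255) from the
    predicted categorical distribution."""
    p = softmax(logits)
    return int(rng.choice(256, p=p))

rng = np.random.default_rng(42)
logits = rng.standard_normal(256)       # stand-in for the network's output
bin_idx = sample_next(logits, rng)      # an integer in 0..255
```

Sampling (rather than taking the argmax) is what keeps generated audio from collapsing into repetitive output.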

WaveNet v1 bins

Computational burden

  • Training is QUICK :
    • Every timestep's next-sample target is already known, so all steps train in parallel
  • Inference / Running is SLOW :
    • 1 sec of output = 1 minute of GPU
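The slowness comes from the autoregressive loop: each new sample needs the previous outputs, so generation is one full forward pass per sample, and at 16kHz that is 16,000 passes per second of audio. A toy loop with the network stubbed out (the stub is mine, purely for illustration):

```python
import numpy as np

def toy_model(history):
    """Stand-in for the WaveNet forward pass: returns 256 logits.
    (A real pass touches every layer for every generated sample.)"""
    recent = history[-16:] if history else [128]
    return np.bincount(recent, minlength=256).astype(float)

def generate(n_samples, rng):
    samples = []
    for _ in range(n_samples):            # strictly sequential: each step
        logits = toy_model(samples)       # depends on the previous outputs
        p = np.exp(logits - logits.max())
        p /= p.sum()
        samples.append(int(rng.choice(256, p=p)))
    return samples

out = generate(100, np.random.default_rng(0))
```

Nothing in this loop can be batched across timesteps, which is exactly what Fast-WaveNet and Parallel-WaveNet (next) attack.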

WaveNet Sequential

Implementations

SG Implementation

  • Includes the latest hotness :
    • DataSet API for TFRecords streaming from disk
    • Keras model → Estimator
    • One Notebook end-to-end



github.com/mdda/deep-learning-workshop
/notebooks/2-CNNs/8-Speech/
SpeechSynthesis_MelToComplexSpectra.ipynb

Fast WaveNet


Fast WaveNet

Parallel WaveNet

Goal = Parallel

Parallel WaveNet Noise to Waveform

New Element

Parallel WaveNet student-teacher

Noise → Distribution → Sample → Distribution
(optimise for distributions being the same)
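"Optimise for distributions being the same" is done via KL divergence: the feed-forward student (noise in, waveform out) is trained so that, at each timestep, its output distribution matches the pre-trained autoregressive teacher's ("probability density distillation"). A toy discrete version of that loss term — a sketch under my own simplifications, not DeepMind's exact objective:

```python
import numpy as np

def kl_divergence(p_student, p_teacher, eps=1e-12):
    """KL(student || teacher) for one timestep's categorical distributions.
    Zero when the two distributions match, which is the distillation target."""
    p, q = p_student + eps, p_teacher + eps
    return float(np.sum(p * np.log(p / q)))

teacher = np.full(4, 0.25)                        # toy 4-bin distributions
student_good = np.array([0.25, 0.25, 0.25, 0.25]) # matches the teacher
student_bad = np.array([0.7, 0.1, 0.1, 0.1])      # does not

loss_good = kl_divergence(student_good, teacher)
loss_bad = kl_divergence(student_bad, teacher)
```

Because the teacher scores all of the student's samples in parallel (training-mode is fast for the teacher), the student never needs the slow sample-by-sample loop.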

Visual Demo

Parallel WaveNet SpeedUp

Wrap-up

  • WaveNet started out as very good but very expensive
  • ... but that proved it was worth optimising
  • Lots of opportunity for innovation

GitHub - mdda

* Please add a star... *

Deep Learning
MeetUp Group

BONUS!

( Don't use for Mining )

Quick Poll

  • Show of Hands :
    • More in-depth on Eager Mode?
    • Text to Speech race (Tacotron, DeepVoice, etc)?
    • Speech to Text (ASR) game?
    • CloudML?
    • Latent space tricks?
    • Knowledge base access?

Deep Learning
Back-to-Basics

8-week Deep Learning
Developer Course

  • 25 September - 25 November
  • Twice-Weekly 3-hour sessions included :
    • Instruction
    • Individual Projects
    • Support by WSG
  • Location : SGInnovate
  • Status : FINISHED!

?-week Deep Learning
Developer Course

  • Plan : Start 2018-Q1
  • Sessions will include :
    • Instruction
    • Individual Projects
    • Support by WSG (planned)
  • Location : SGInnovate
  • Status : TBA

Deep Learning : Beginner Course

  • Dates + Cost : TBA ::
    • Full day (week-end)
    • Play with real models
    • Get inspired!
    • Pick-a-Project to do at home
    • 1-on-1 support online
    • Regroup on a week-night
  • http://bit.ly/2zVXtRm

- QUESTIONS -


Martin.Andrews @
RedCatLabs.com

Martin.Andrews @
RedDragon.AI


My blog : http://blog.mdda.net/

GitHub : mdda