15 January 2015
How to handle 'discrete' things like Words?
5.7MM documents, 5.4Bn terms
→ 155k words, 500-D embedding
Ready-made Python module: Word2Vec (e.g. gensim's implementation)
This is pretty surprising, IMHO
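Once trained, the embedding is just a lookup table of roughly 155k rows × 500 columns, and word similarity becomes cosine similarity between rows. A minimal sketch with a toy vocabulary and random vectors (the vocabulary and `nearest` helper are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-in for the real 155k x 500 embedding matrix
vocab = ["king", "queen", "apple", "banana"]
emb = rng.randn(len(vocab), 500)
word_to_ix = {w: i for i, w in enumerate(vocab)}

def nearest(word):
    """Return the other vocabulary word with highest cosine similarity."""
    v = emb[word_to_ix[word]]
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ (v / np.linalg.norm(v))
    sims[word_to_ix[word]] = -np.inf   # exclude the query word itself
    return vocab[int(np.argmax(sims))]

print(nearest("king"))
```

With a real trained embedding, this same lookup-and-argmax is what produces the familiar "king → queen" style neighbour lists.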
numpy
C/C++
or CUDA
(or OpenCL)
Function 'built up', then evaluated
import numpy
import theano
import theano.tensor as T

rng = numpy.random
feats = 784                 # number of input features (as in the Theano tutorial)

x = T.matrix("x")           # Declare Theano symbolic variables
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))  # Probability that target = 1
prediction = p_1 > 0.5                   # The prediction, thresholded
predict = theano.function(inputs=[x], outputs=prediction)
print predict( [[0.1, 0.02, ... , -7.4, 3.2]] )  # one row per example
Gradients come 'free'
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1)  # Cross-entropy loss fn
cost = xent.mean() + 0.01 * (w ** 2).sum()     # Minimize this
gw, gb = T.grad(cost, [w, b])                  # Compute the gradient of the cost
train = theano.function(
          inputs=[x, y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))

training_steps = 10000  # as in the Theano tutorial
for i in xrange(training_steps):
    pred, err = train(data_X, data_y)  # data_X : (examples, feats), data_y : (examples,)
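To see what `T.grad` saves you: here is the same model and update rule written out by hand in plain numpy, on synthetic random data. The gradient expressions in the loop are exactly what Theano derives symbolically from the cost:

```python
import numpy as np

rng = np.random.RandomState(0)
N, feats = 400, 784
data_X = rng.randn(N, feats)
data_y = rng.randint(0, 2, N).astype(float)

w = np.zeros(feats)
b = 0.0

def cost(w, b):
    p_1 = 1 / (1 + np.exp(-data_X.dot(w) - b))
    xent = -data_y * np.log(p_1) - (1 - data_y) * np.log(1 - p_1)
    return xent.mean() + 0.01 * (w ** 2).sum()

c0 = cost(w, b)
for i in range(100):
    p_1 = 1 / (1 + np.exp(-data_X.dot(w) - b))
    # Hand-derived gradients of the cost above -- the step Theano automates
    gw = data_X.T.dot(p_1 - data_y) / len(data_y) + 0.02 * w
    gb = (p_1 - data_y).mean()
    w -= 0.1 * gw
    b -= 0.1 * gb

print(cost(w, b) < c0)   # the cost has decreased
```

Deriving (and re-deriving, after every model tweak) `gw` and `gb` by hand is exactly the tedium that symbolic differentiation removes.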
Iteration mechanism built-in
Thin 'NN' layer on top of regular Theano
import lasagne
from lasagne.layers import cuda_convnet

# batch_size, input_width, input_height, output_dim set elsewhere
l_in = lasagne.layers.InputLayer(
         shape=(batch_size, 1, input_width, input_height) )
l_conv2 = cuda_convnet.Conv2DCCLayer( l_in,
            num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify )
l_pool2 = cuda_convnet.MaxPool2DCCLayer( l_conv2, ds=(2, 2) )
l_hidden = lasagne.layers.DenseLayer( l_pool2, num_units=256,
             nonlinearity=lasagne.nonlinearities.rectify )
l_dropout = lasagne.layers.DropoutLayer( l_hidden, p=0.5 )
l_out = lasagne.layers.DenseLayer( l_dropout, num_units=output_dim,
          nonlinearity=lasagne.nonlinearities.softmax )
gap_type is a softmax over 2+32 classes : gap.best ⇒ add a space
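Numerically, such a decision head is just a softmax over the 34 gap classes followed by an argmax (the logits and class indexing below are hypothetical stand-ins for the model's real output; `gap.best` presumably wraps this argmax):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # shift by the max for numerical stability
    return e / e.sum()

n_classes = 2 + 32                 # as on the slide
rng = np.random.RandomState(0)
logits = rng.randn(n_classes)      # hypothetical network output for one gap
probs = softmax(logits)            # a proper distribution: sums to 1
best = int(np.argmax(probs))       # the highest-probability gap class
print(best)
```

The argmax class then drives the output decision, e.g. whether a space is inserted at that gap.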