[MLE] Deep Learning

Overview

In this blog post, we will look at deep learning and see how a CNN actually works.
A convolutional neural network is a kind of neural network, but its feature extraction is quite different from an ordinary ANN's. In an ANN we compute a matrix product over every pixel individually; in a CNN we look at the image in a bigger view, one local region at a time.

What is deep learning?

Definition

  • Hierarchical organisation with more than one (non-linear) hidden layer in between the input and the output variables.
  • Output of one layer is the input of the next layer

Methods

  • Deep Neural Networks
  • Convolutional Neural Networks
  • Restricted Boltzmann Machine
  • Recurrent Neural Networks
  • GAN/LSTM/R-CNN etc…

The process of CNN

I will use an example, with some demonstrations and diagrams, to show the process and details of a CNN.

For example, the input images in CIFAR-10 are an input volume of activations, and the volume has dimensions 32x32x3 (width, height, depth respectively). As we will soon see, the neurons in a layer will only be connected to a small region of the layer before it, instead of all of the neurons in a fully-connected manner. Moreover, the final output layer would for CIFAR-10 have dimensions 1x1x10, because by the end of the ConvNet architecture we will reduce the full image into a single vector of class scores, arranged along the depth dimension.

Layers used to build ConvNets

As we described above, a simple ConvNet is a sequence of layers, and every layer of a ConvNet transforms one volume of activations to another through a differentiable function. We use three main types of layers to build ConvNet architectures: Convolutional Layer, Pooling Layer, and Fully-Connected Layer (exactly as seen in regular Neural Networks). We will stack these layers to form a full ConvNet architecture.

Example Architecture:

We will go into more details below, but a simple ConvNet for CIFAR-10 classification could have the architecture [INPUT - CONV - RELU - POOL - FC]. In more detail:

Input Layer

INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.

Convolutional Layer

CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume such as [32x32x12] if we decide to use 12 filters.

Here, we apply six \(5\times5\times3\) filters to the \(32\times32\times3\) input, and we get a \(28\times28\times6\) output. But how do we get that, and what is the relationship between filter size and stride?

How do we calculate?

Now we run into a problem: when the stride is 3, the filter may not fit the input evenly. In general the spatial output size is \((N + 2P - F)/S + 1\), where \(N\) is the input width, \(F\) the filter width, \(P\) the zero-padding, and \(S\) the stride; with stride 3 this may not be an integer. In practice, we simply zero-pad the border of the input so that the filter tiles it exactly.

Question!

Input volume \(32\times32\times3\), ten \(5\times5\) filters with stride 1 and pad 2. What is the output volume size?
\((32 + 2\times2 - 5)/1 + 1 = 32\), so the output volume is \(32\times32\times10\).
How many parameters?
Each filter has \(5\times5\times3\) weights plus one bias, i.e. 76 parameters, so ten filters give \(76\times10 = 760\) parameters in total. You can see that this is far fewer than in an ANN. Imagine an ANN whose input is \(300\times300\) and whose hidden layer has 128 neurons: that is \(300\times300\times128\) weights (a lot!). For a CNN, even with a \(300\times300\) input, ten filters of size \(5\times5\times3\) still need only 760 parameters.

How does convolution actually work?


ReLU Layer

RELU layer will apply an elementwise activation function, such as the max(0,x) thresholding at zero. This leaves the size of the volume unchanged ([32x32x12]).

Pool Layer

POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as [16x16x12].

Fully-connected layer

FC (i.e. fully-connected) layer will compute the class scores, resulting in volume of size [1x1x10], where each of the 10 numbers correspond to a class score, such as among the 10 categories of CIFAR-10. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.

In this way, ConvNets transform the original image layer by layer from the original pixel values to the final class scores. Note that some layers contain parameters and others don't. In particular, the CONV/FC layers perform transformations that are a function of not only the activations in the input volume, but also of the parameters (the weights and biases of the neurons). On the other hand, the RELU/POOL layers implement a fixed function. The parameters in the CONV/FC layers will be trained with gradient descent so that the class scores that the ConvNet computes are consistent with the labels in the training set for each image.

Training Convolutional Neural Networks

As noted above, the parameters of the CONV and FC layers are trained with gradient descent: we repeatedly compute the class scores, measure how far they are from the training labels, backpropagate gradients, and update the weights.

The challenges of training

Underfitting: The network only reaches a sub-optimal configuration during training

  • Variants of gradient descent
  • Sheer computational power
  • ReLU

If you run into underfitting, you can:

  • take larger gradient steps
  • use momentum or Nesterov acceleration to settle into the optimum (see the sketch after this list)
  • use a powerful GPU (CUDA)
  • use pre-training

Layer-wise greedy Pre-training

  • Often unsupervised: pick random images and try to reconstruct them
  • Greedy learning: each layer is optimised independently of the others
  • Large amounts of data: since no labels are needed, examples can be gathered from everywhere
  • Fine-tune with supervised data specific to the problem at hand, training the whole network at once

Overfitting: The network does not generalise well to new data

  • Standard regularisation
  • Pre-trained models
  • Drop-out
  • Check only a few dimensions

Drop-out

When performing a gradient check, remember to turn off any non-deterministic effects in the network, such as dropout and random data augmentations; otherwise they can introduce huge errors when estimating the numerical gradient. The downside of turning these effects off is that you wouldn't be gradient-checking them.

