[MLE] Artificial Neural Network Training

Overview

  • Error Functions
  • Basic Linear Algebra
  • Singular Value Decomposition
  • Gradient Descent
  • Backpropagation
  • Deep Learning

Error Functions

To optimise the performance of ANNs, an error function on the training set must be minimised.
This is done by adjusting:

  • Weights connecting nodes
  • Network Architecture
  • Parameters of non-linear functions h(a)
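
For regression, one concrete choice of error function is the sum-of-squares error over the training set; here is a minimal sketch in Python (the function name and array shapes are illustrative, not from the lecture):

```python
import numpy as np

def sum_squared_error(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Sum-of-squares error: E(w) = 1/2 * sum_n ||y(x_n, w) - t_n||^2."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)
```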

Backpropagation

  • Used to calculate the derivatives of the error function efficiently
  • Errors propagate backwards layer by layer

Iterative minimisation of the error function:

  1. Calculate the derivatives of the error function with respect to the weights
  2. Use the derivatives to adjust the weights
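
In symbols, step 1 assembles the partial derivatives into the gradient vector:

\[
\nabla E(\mathbf{w}) = \left(\frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_W}\right)^{\mathsf{T}}
\]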

That’s how we do backpropagation; but once we have the derivatives, how do we update the weights?

In the lecture, the update rule is introduced as standard gradient descent: step against the gradient with a small learning rate \(\eta\):
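
\[
\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta\,\nabla E\big(\mathbf{w}^{(\tau)}\big)
\]

A minimal sketch of this loop in Python (the gradient function, learning rate, and step count are illustrative assumptions):

```python
import numpy as np

def gradient_descent(w0: np.ndarray, grad_E, eta: float = 0.1, steps: int = 100) -> np.ndarray:
    """Repeatedly step against the gradient: w <- w - eta * dE/dw."""
    w = w0.copy()
    for _ in range(steps):
        w = w - eta * grad_E(w)
    return w

# Example: minimise E(w) = ||w||^2, whose gradient is 2w.
w_star = gradient_descent(np.array([3.0, -4.0]), lambda w: 2 * w)
print(w_star)  # close to [0, 0]
```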

Basic Linear Algebra

Matrix Determinant

  • Used in many calculations, e.g.
    • matrix inversion
    • singularity testing (A is singular iff |A| = 0)
    • notation: det(A) = |A|
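
A quick NumPy illustration of the singularity test (a sketch, not from the lecture):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # rows are linearly dependent

det_A = np.linalg.det(A)      # det(A) = |A|
print(f"|A| = {det_A:.6f}")
print("singular:", np.isclose(det_A, 0.0))  # True: A has no inverse
```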

Eigenvalues

Given a square matrix \(M\), the eigenvalue equation \(Mv_i = \lambda_iv_i\) defines a set of eigenvectors \(v_i\) and corresponding eigenvalues \(\lambda_i\); when \(M\) is symmetric, the eigenvectors can be chosen to be orthogonal.
Eigenvalues are found by solving the characteristic equation: \(|M - \lambda I| = 0\)
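
For example, solving \(|M - \lambda I| = 0\) for a small symmetric matrix (a NumPy sketch):

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric, so eigenvectors are orthogonal

# Characteristic equation: |M - lambda*I| = (2 - l)^2 - 1 = 0  =>  l = 3, 1
eigvals, eigvecs = np.linalg.eig(M)
print(eigvals)                 # [3. 1.] (order may vary)

# Verify M v_i = lambda_i v_i for each eigenpair
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(M @ v, lam * v)
```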

Jacobian and Hessian

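The underlying definitions are standard: for a vector-valued function \(\mathbf{f}(\mathbf{x})\), the Jacobian matrix collects the first derivatives, and for a scalar error \(E(\mathbf{w})\), the Hessian collects the second derivatives:

\[
J_{ij} = \frac{\partial f_i}{\partial x_j},
\qquad
H_{ij} = \frac{\partial^2 E}{\partial w_i\,\partial w_j}
\]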

When doing BP, how do we calculate the gradient of the error function?

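A minimal sketch for a one-hidden-layer network with sigmoid hidden units, a linear output layer, and a sum-of-squares error (these architecture choices are assumptions for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop(x, t, W1, W2):
    """Gradients of E = 1/2 * ||y - t||^2 for a one-hidden-layer network."""
    # Forward pass
    a1 = W1 @ x             # hidden pre-activations
    z = sigmoid(a1)         # hidden activations
    y = W2 @ z              # linear output layer

    # Backward pass: errors (deltas) propagate backwards layer by layer
    delta2 = y - t                           # output-layer error
    delta1 = (W2.T @ delta2) * z * (1 - z)   # hidden error via sigmoid'

    # Gradients of E with respect to each weight matrix
    grad_W2 = np.outer(delta2, z)
    grad_W1 = np.outer(delta1, x)
    return grad_W1, grad_W2
```

The output error \(\delta\) is pushed back through \(W_2\) to give the hidden-layer error, and each weight gradient is an outer product of a delta with that layer's input.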

Regularization

We almost always add a regularization term when training neural networks.
In CS231, regularization is first introduced as an extra penalty term added to the data loss.

Why use regularization

  • penalise large weights
  • avoid overfitting
  • enable early stopping

We use regularization to penalise large and unbalanced weights; it is a standard technique for combating overfitting in statistical models.
You might ask: OK, I have everything now, so how can I tune the regularization term \(\lambda\)?
One possible answer is cross-validation: split your training data into subsets, train the model with a fixed value of \(\lambda\), test it on the held-out subset, and repeat while varying \(\lambda\). Then select the \(\lambda\) that minimizes your validation loss.
Tracking the validation error like this also enables early stopping, since training can be halted once the validation error levels off.
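
A minimal sketch of the L2 penalty and the \(\lambda\)-tuning loop (`train` and `evaluate` are hypothetical placeholders, not real library functions):

```python
import numpy as np

def total_loss(data_loss: float, weights: list, lam: float) -> float:
    """Total loss = data loss + lam * L2 penalty, which penalises large weights."""
    return data_loss + lam * sum(np.sum(W ** 2) for W in weights)

# Hypothetical cross-validation loop: keep the lambda with the lowest
# validation loss. `train` and `evaluate` are placeholders.
# for lam in [0.0, 1e-4, 1e-3, 1e-2, 1e-1]:
#     model = train(train_split, lam)
#     val_loss = evaluate(model, val_split)
```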
