[MLE] Artificial Neural Network Training

Overview

  • Error Functions
  • Basic Linear Algebra
  • Singular Value Decomposition
  • Gradient Descent
  • Backpropagation
  • Deep Learning

Error Functions

In order to optimise the performance of an ANN, an error function on the training set must be minimised.
This is done by adjusting:

  • Weights connecting nodes
  • Network Architecture
  • Parameters of non-linear functions h(a)
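
As a concrete example, here is a minimal sketch of a sum-of-squares error function in NumPy (the function name and toy data are my own, for illustration only):

```python
import numpy as np

def sum_of_squares_error(y_pred, y_true):
    """E = 1/2 * sum over the training set of ||y(x_n, w) - t_n||^2."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)

# Toy example: network outputs vs. targets for three training points.
y_pred = np.array([0.9, 0.2, 0.8])
y_true = np.array([1.0, 0.0, 1.0])
print(sum_of_squares_error(y_pred, y_true))  # 0.045
```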

Backpropagation

  • Used to calculate the derivatives of the error function efficiently
  • Errors propagate backwards through the network layer by layer

Iterative minimisation of error function:

  1. Calculate the derivatives of the error function with respect to the weights
  2. Use the derivatives to adjust the weights
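
As a sketch of both steps, here is backpropagation for a tiny one-hidden-layer network with a sigmoid hidden layer and sum-of-squares error (an illustration under my own naming, not the lecture's exact derivation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Tiny network: 2 inputs -> 3 sigmoid hidden units -> 1 linear output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # input-to-hidden weights
W2 = rng.normal(size=(1, 3))   # hidden-to-output weights

x = np.array([0.5, -1.0])      # one training input
t = np.array([1.0])            # its target

# Forward pass.
z = sigmoid(W1 @ x)            # hidden activations
y = W2 @ z                     # network output

# Step 1: propagate errors backwards, layer by layer.
delta2 = y - t                           # error at the output layer
delta1 = z * (1 - z) * (W2.T @ delta2)   # error at the hidden layer
dW2 = np.outer(delta2, z)                # dE/dW2
dW1 = np.outer(delta1, x)                # dE/dW1

# Step 2: use the derivatives to adjust the weights.
eta = 0.1
W2 -= eta * dW2
W1 -= eta * dW1
```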

That is how backpropagation works; but once we have the derivatives, how do we update the weights?

In the lecture, the update rule is introduced as standard gradient descent:

\[ w^{(\tau+1)} = w^{(\tau)} - \eta\, \nabla E\big(w^{(\tau)}\big) \]

where \(\eta\) is the learning rate.
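
Beyond this vanilla update, a common refinement is momentum. Here is a minimal sketch of both rules (the function names and hyperparameter values are my own, not from the lecture):

```python
import numpy as np

def sgd_step(w, grad, eta=0.01):
    """Vanilla gradient descent: w <- w - eta * dE/dw."""
    return w - eta * grad

def momentum_step(w, v, grad, eta=0.01, mu=0.9):
    """Momentum: accumulate a velocity that smooths successive updates."""
    v = mu * v - eta * grad
    return w + v, v

# Toy usage on E(w) = w^2, whose gradient is 2w.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, v, grad=2 * w)
print(w)  # close to the minimum at w = 0
```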

Basic Linear Algebra

Matrix Determinant

  • Used in many calculations, e.g.
    • matrix inversion
    • singularity testing (A is singular iff |A| = 0)
    • notation: det(A) = |A|
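
A quick NumPy illustration of the singularity test (a sketch; the tolerance is my own choice to cope with floating-point error):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row is 2x the first, so A is singular

det = np.linalg.det(A)
print(det)                   # ~0.0

# Exact zeros are rare in floating point, so test against a tolerance.
if abs(det) < 1e-12:
    print("A is singular: no inverse exists")
```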

Eigenvalues

Given a square matrix \(M\), an eigenvalue equation relates a set of vectors \(v_i\) and scalars \(\lambda_i\) such that \(Mv_i = \lambda_i v_i\); when \(M\) is real and symmetric, the eigenvectors can be chosen to be orthogonal.
Eigenvalues are found by solving the characteristic equation: \(|M - \lambda I| = 0\)
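A worked 2×2 example as a sketch, solving the characteristic equation by hand and checking with NumPy (the matrix is my own):

```python
import numpy as np

# M = [[2, 1], [1, 2]]: |M - lambda*I| = (2 - l)^2 - 1 = 0, so l = 1 or 3.
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(M)
print(eigenvalues)      # e.g. [3. 1.] (the order is not guaranteed)
print(eigenvectors)     # columns are the eigenvectors

# Check M v = lambda v for the first eigenpair.
v = eigenvectors[:, 0]
print(np.allclose(M @ v, eigenvalues[0] * v))   # True

# M is symmetric, so the eigenvectors are orthogonal.
print(np.isclose(eigenvectors[:, 0] @ eigenvectors[:, 1], 0.0))  # True
```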

Jacobian and Hessian

For a vector-valued function \(f: \mathbb{R}^n \to \mathbb{R}^m\), the Jacobian is the matrix of first partial derivatives, \(J_{ij} = \partial f_i / \partial x_j\). For a scalar function, the Hessian is the matrix of second partial derivatives, \(H_{ij} = \partial^2 f / \partial x_i \partial x_j\).
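
A small sketch with a concrete function of my own (not the lecture's exercise): \(f(x, y) = x^2 y + y^3\), whose gradient and Hessian can be written out by hand and evaluated numerically.

```python
import numpy as np

def f(x, y):
    return x**2 * y + y**3

def gradient(x, y):
    # First partial derivatives: [df/dx, df/dy].
    return np.array([2 * x * y, x**2 + 3 * y**2])

def hessian(x, y):
    # Matrix of second partial derivatives.
    return np.array([[2 * y, 2 * x],
                     [2 * x, 6 * y]])

x, y = 1.0, 2.0
print(gradient(x, y))   # [ 4. 13.]
print(hessian(x, y))    # [[ 4.  2.]
                        #  [ 2. 12.]]
```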

When doing backpropagation, how do we calculate the gradient of the error function?
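
A standard way to write the answer, following the usual textbook derivation (e.g. Bishop's Pattern Recognition and Machine Learning; the lecture slide itself is not reproduced here): for a weight \(w_{ji}\) connecting unit \(i\) to unit \(j\),

\[ \frac{\partial E}{\partial w_{ji}} = \delta_j z_i, \qquad \delta_j = h'(a_j) \sum_k w_{kj} \delta_k \]

where \(z_i\) is the activation of unit \(i\), \(a_j\) is the pre-activation of unit \(j\), and the sum runs over the units \(k\) fed by unit \(j\); at the output layer, \(\delta_k = y_k - t_k\) for sum-of-squares error.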

Regularization

A regularization term is almost always added to the error function when training a neural network.
In CS231n, regularization first appears in the full loss, a data term plus a penalty on the weights: \(L = \frac{1}{N}\sum_i L_i + \lambda R(W)\), where L2 regularization uses \(R(W) = \sum_k \sum_l W_{k,l}^2\).
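
A minimal sketch of adding an L2 penalty to a loss (the names are illustrative):

```python
import numpy as np

def l2_penalty(W, lam):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * np.sum(W ** 2)

def total_loss(data_loss, W, lam=1e-3):
    # Full objective: data term plus the weight penalty.
    return data_loss + l2_penalty(W, lam)

W = np.array([[0.5, -1.2],
              [2.0,  0.1]])
print(total_loss(data_loss=0.8, W=W, lam=1e-3))  # 0.8057
```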

Why use regularization

  • penalise large or unbalanced weights
  • avoid overfitting
  • support early stopping

We use regularization to penalise large and unbalanced weights; it is a standard technique for mitigating overfitting in statistical models.

You might ask: how do we tune the regularization strength \(\lambda\)? One possible answer is cross-validation: split the training data, train the model for a fixed value of \(\lambda\), test it on the held-out subsets, and repeat while varying \(\lambda\); then select the \(\lambda\) that minimises the validation loss (see the sketch below).

Early stopping is related: training is halted once the validation error levels off and stops improving, which also limits overfitting.
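
Here is a sketch of that cross-validation loop, using closed-form ridge regression as a self-contained stand-in for a network (the data and candidate \(\lambda\) grid are made up):

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^(-1) X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_loss(X, y, lam, k=5):
    """Average held-out squared error over k folds."""
    folds = np.array_split(np.arange(len(y)), k)
    losses = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        w = fit_ridge(X[train], y[train], lam)
        losses.append(np.mean((X[held_out] @ w - y[held_out]) ** 2))
    return np.mean(losses)

# Made-up data: 100 points, 5 features, known weights plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

lambdas = [0.001, 0.01, 0.1, 1.0, 10.0]
best = min(lambdas, key=lambda lam: cv_loss(X, y, lam))
print("best lambda:", best)
```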
