Blog Posts

Showing posts from October, 2017

[MLE] Artificial Neural Network Training

Overview: Error Functions, Basic Linear Algebra, Singular Value Decomposition, Gradient Descent, Backpropagation, Deep Learning.

Error Functions. To optimise the performance of ANNs, an error function on the training set must be minimised. This is done by adjusting: the weights connecting nodes, the network architecture, and the parameters of the non-linear functions h(a).

Backpropagation. Used to calculate the derivatives of the error function efficiently; errors propagate backwards layer by layer. The error function is minimised iteratively: calculate the derivatives of the error function with respect to the weights, then use those derivatives to adjust the weights. That is how backpropagation works, but once we have the derivatives, how do we update the weights? A diagram of the update step appears in the full post, along with the lecture's formulation.

Basic Linear Algebra. The matrix determinant, det(A) = |A|, is used in many calculations, e.g. matrix inversion and singularity testing (A is singular iff |A| = 0). Eigenvalues: given an…
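To make the weight update concrete, here is a minimal sketch of vanilla gradient descent on a one-hidden-layer network with a sum-of-squares error. The toy data, layer sizes, and learning rate are hypothetical, and bias terms are omitted for brevity; the update rule w ← w − η ∂E/∂w is the standard one, not necessarily the exact scheme from the lecture.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical toy data: 4 inputs with 2 features, scalar targets (XOR).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(3, 1))  # hidden -> output weights
eta = 0.5                                # learning rate

for epoch in range(5000):
    # Forward pass
    z = sigmoid(X @ W1)                  # hidden activations h(a)
    y = z @ W2                           # linear output units
    # Backward pass: errors propagate backwards layer by layer
    delta_out = y - t                             # dE/da at the outputs
    delta_hid = (delta_out @ W2.T) * z * (1 - z)  # dE/da at the hidden layer
    grad_W2 = z.T @ delta_out                     # dE/dW2
    grad_W1 = X.T @ delta_hid                     # dE/dW1
    # Gradient-descent update: w <- w - eta * dE/dw
    W2 -= eta * grad_W2
    W1 -= eta * grad_W1
```

Each iteration uses the backpropagated derivatives exactly as described above: compute dE/dw for every layer, then move each weight a small step against its gradient.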

[MLE] Artificial Neural Networks

Overview: Biological Neural Networks, Cell Topology: Input, Output and Hidden Layers, Functional Description, Error Functions.

In the previous blog (lectures), we considered models for regression and classification that comprised linear combinations of fixed basis functions. We saw that such models have useful analytical and computational properties, but that their practical applicability was limited by the curse of dimensionality. In order to apply such models to large-scale problems, it is necessary to adapt the basis functions to the data. Support vector machines (SVMs) address this by first defining basis functions that are centred on the training data points and then selecting a subset of these during training. One advantage of SVMs is that, although the training involves nonlinear optimization, the objective function is convex, and so the solution of the optimization problem is relatively straightforward. The number of basis functions in the resulting models is generally much smaller than the number of training points…
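To make "linear combinations of fixed basis functions" concrete, here is a minimal sketch of such a model, y(x, w) = wᵀφ(x), fitted by least squares. The Gaussian centres, width, and toy data are hypothetical; the point is that the basis functions φ_j stay fixed while only the weights w are learned, which is exactly the restriction neural networks relax.

```python
import numpy as np

# Hypothetical fixed Gaussian basis functions on [0, 1].
centres = np.linspace(0.0, 1.0, 5)
s = 0.2  # shared basis width

def phi(x):
    """Design vector: a bias term plus one fixed Gaussian basis per centre."""
    return np.concatenate(([1.0], np.exp(-(x - centres) ** 2 / (2 * s ** 2))))

# Toy regression data: a noisy sinusoid.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 20)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=20)

# Linear-in-the-parameters fit: only w is adapted, never the basis functions.
Phi = np.stack([phi(x) for x in x_train])          # design matrix
w, *_ = np.linalg.lstsq(Phi, t_train, rcond=None)  # least-squares weights

y_new = w @ phi(0.5)  # prediction at a new input
```

With D input dimensions, covering the space with such fixed basis functions requires a number of them that grows rapidly with D, which is the curse of dimensionality the excerpt refers to.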

[MLE]Decision Trees

Overview: Decision Trees, Linear vs Non-linear Classifiers, Entropy, C4.5, Random Forests.

Some fundamental concepts. The process of selecting a specific model, given a new input x, can be described by a sequential decision-making process corresponding to the traversal of a binary tree (one that splits into two branches at each node). Here we focus on a particular tree-based framework called classification and regression trees (CART).

Basic Decision Trees. Decision trees apply a sequence of linear decisions that often depend on only a single variable at a time. Such trees partition the input space into cuboid regions, gradually refining the level of detail of a decision until a leaf node has been reached, which provides the final predicted label. As the figure in the full post indicates, tree classification follows these rules: start at the root node; evaluate the relevant attribute of that node; follow the branch, according to the attribute's value, to a descendent node; that descendent node can then be considered…
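The traversal rules above map directly onto a small recursive data structure. Here is a minimal sketch with a hypothetical node layout: each internal node tests a single attribute against a threshold (the axis-aligned, one-variable-at-a-time decisions the excerpt describes) and each leaf carries a predicted label.

```python
class Node:
    def __init__(self, attr=None, threshold=None, left=None, right=None, label=None):
        self.attr = attr            # index of the attribute tested at this node
        self.threshold = threshold  # split point of the single-variable decision
        self.left = left            # branch taken when x[attr] <= threshold
        self.right = right          # branch taken when x[attr] > threshold
        self.label = label          # set only on leaf nodes

def classify(node, x):
    """Start at the root, evaluate each node's attribute, follow the branch."""
    while node.label is None:               # descend until a leaf is reached
        if x[node.attr] <= node.threshold:  # evaluate the relevant attribute
            node = node.left
        else:
            node = node.right
    return node.label                       # the leaf provides the final label

# Example tree: splits on x[0] at 0.5, then on x[1] at 0.3.
tree = Node(attr=0, threshold=0.5,
            left=Node(label="A"),
            right=Node(attr=1, threshold=0.3,
                       left=Node(label="B"),
                       right=Node(label="C")))

print(classify(tree, [0.7, 0.2]))  # -> "B"
```

Each threshold test is one of the linear, single-variable decisions, and the nested splits carve the input space into the cuboid regions mentioned above.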

[MLE] Linear Classification

Linear Classification: Discriminant Functions, Least Squares for Classification, Fisher's Linear Discriminant, K-Nearest Neighbour (KNN) Classification, What are decision boundaries / linear separability?

The goal in classification is to take an input vector x and to assign it to one of K discrete classes \(C_k\), where k = 1, …, K. In the most common scenario the classes are taken to be disjoint, so that each input is assigned to one and only one class. The input space is thereby divided into decision regions whose boundaries are called decision boundaries or decision surfaces. In this blog, we consider linear models for classification, by which we mean that the decision surfaces are linear functions of the input vector x and hence are defined by (D−1)-dimensional hyperplanes within the D-dimensional input space. For example, in a 3D input space the decision boundary will be a 2D surface.

What is Linear Separability? Linearly separable data: datasets whose classes can be separated by linear decision surfaces…
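For the two-class case, the simplest such linear model is a discriminant function y(x) = wᵀx + w₀, whose zero level set is the (D−1)-dimensional hyperplane described above. The weights and bias below are hypothetical, chosen only to illustrate the decision rule.

```python
import numpy as np

# Two-class linear discriminant: y(x) = w^T x + w0.
# The decision surface y(x) = 0 is a (D-1)-dimensional hyperplane;
# w and w0 are hypothetical values for illustration.
w = np.array([1.0, -2.0, 0.5])  # weight vector (D = 3)
w0 = -0.25                      # bias

def classify(x):
    """Assign x to C1 if y(x) >= 0, otherwise to C2."""
    return "C1" if w @ x + w0 >= 0 else "C2"

print(classify(np.array([1.0, 0.2, 0.1])))  # y = 0.4 -> C1
```

A dataset is linearly separable exactly when some choice of w and w0 classifies every training point correctly under this rule.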