Posts

Showing posts from June, 2017

[CS231] Neural Networks

Neural Network

Table of Contents:
- Quick intro without brain analogies
- Modeling one neuron
- Biological motivation and connections
- Single neuron as a linear classifier
- Commonly used activation functions
- Neural Network architectures
- Layer-wise organization
- Summary
- Additional references

Quick intro

It is possible to introduce neural networks without appealing to brain analogies. In the section on linear classification we computed scores for different visual categories given the image using the formula \( s = W x \), where \(W\) was a matrix and \(x\) was an input column vector containing all pixel data of the image. In the case of CIFAR-10, \(x\) is a [3072x1] column vector, and \(W\) is a [10x3072] matrix, so that the output is a vector of 10 class scores. An example neural network would instead compute \( s = W_2 \max(0, W_1 x) \). Here, \(W_1\) could be, for example, a [100x3072] matrix transforming the image into a 100-dimensional intermediate vector. The funct…
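A minimal sketch (not from the post) of the two-layer score computation \( s = W_2 \max(0, W_1 x) \) described in the excerpt, using the CIFAR-10 shapes it mentions; the random weights and variable names are illustrative assumptions.

```python
# Sketch of s = W2 * max(0, W1 x) with CIFAR-10 shapes; weights are random placeholders.
import numpy as np

x = np.random.randn(3072, 1)            # one CIFAR-10 image, flattened to [3072x1]
W1 = np.random.randn(100, 3072) * 0.01  # first layer: 3072-d pixels -> 100-d intermediate vector
W2 = np.random.randn(10, 100) * 0.01    # second layer: 100-d vector -> 10 class scores

h = np.maximum(0, W1 @ x)               # elementwise max(0, .) non-linearity
s = W2 @ h                              # [10x1] vector of class scores
print(s.shape)                          # (10, 1)
```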

[MLE] W3 Classification Problem

Classification and Representation

Classification

To attempt classification, one method is to use linear regression and map all predictions greater than 0.5 to 1 and all less than 0.5 to 0. However, this method doesn’t work well because classification is not actually a linear function.

The classification problem is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem in which y can take on only two values, 0 and 1. (Most of what we say here will also generalize to the multiple-class case.) For instance, if we are trying to build a spam classifier for email, then x(i) may be some features of a piece of email, and y may be 1 if it is a piece of spam mail, and 0 otherwise. Hence, y∈{0,1}. 0 is also called the negative class, and 1 the positive class, and they are sometimes also denoted by the symbols “-” and “+.” Given x(i), the corresponding y(…
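A small numeric sketch (my own illustration, not from the course excerpt) of why thresholding a linear-regression fit at 0.5 works poorly for classification: one extreme but correctly labelled example shifts the fitted line, moving the 0.5 crossing and misclassifying a training point. The toy data and the boundary helper are assumptions for illustration.

```python
# Thresholded linear regression is fragile: an extreme positive example shifts the boundary.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])            # binary labels

def boundary(x, y):
    w, b = np.polyfit(x, y, 1)              # least-squares line h(x) = w*x + b
    return (0.5 - b) / w                    # x where h(x) crosses the 0.5 threshold

print(boundary(x, y))                       # ~4.5, cleanly between the two classes

x2 = np.append(x, 100.0)                    # add one clearly positive example far to the right
y2 = np.append(y, 1)
print(boundary(x2, y2))                     # ~6.5: the positive example at x=6 is now predicted 0
```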

[CS231] Backpropagation

Table of Contents:
- Introduction
- Simple expressions, interpreting the gradient
- Compound expressions, chain rule, backpropagation
- Intuitive understanding of backpropagation
- Modularity: Sigmoid example
- Patterns in backward flow
- Gradients for vectorized operations
- Summary

Introduction

Motivation. In this section we will develop expertise with an intuitive understanding of backpropagation, which is a way of computing gradients of expressions through recursive application of the chain rule. Understanding this process and its subtleties is critical for you to understand, and to effectively develop, design and debug Neural Networks.

Problem statement. The core problem studied in this section is as follows: we are given some function \(f(x)\) where \(x\) is a vector of inputs and we are interested in computing the gradient of \(f\) at \(x\) (i.e. \(\nabla f(x)\)).

Motivation. Recall that the primary reason we are interested in this problem is that in the specific case of Neural…
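A minimal sketch of the recursive chain-rule idea the excerpt describes, on a tiny compound expression \( f(x, y, z) = (x + y) z \); the expression and the input values are illustrative choices, not taken from the post.

```python
# Backpropagation on f(x, y, z) = (x + y) * z: apply the chain rule node by node.
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y            # intermediate node
f = q * z            # output

# backward pass: gradients flow from the output back through each node
df_dq = z            # d(q*z)/dq = z
df_dz = q            # d(q*z)/dz = q
df_dx = df_dq * 1.0  # chain rule: dq/dx = 1
df_dy = df_dq * 1.0  # chain rule: dq/dy = 1

print(f, df_dx, df_dy, df_dz)   # -12.0 -4.0 -4.0 3.0
```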

[CS231] Optimisation

Table of Contents:
- Introduction
- Optimization
- Strategy #1: Random Search
- Strategy #2: Random Local Search
- Strategy #3: Following the gradient
- Computing the gradient
- Numerically with finite differences
- Analytically with calculus
- Gradient descent
- Summary

Introduction

In the previous section we introduced two key components in the context of the image classification task:
- A (parameterized) score function mapping the raw image pixels to class scores (e.g. a linear function).
- A loss function that measured the quality of a particular set of parameters based on how well the induced scores agreed with the ground truth labels in the training data. We saw that there are many ways and versions of this (e.g. Softmax/SVM).

Concretely, recall that the linear function had the form \( f(x_i, W) = W x_i \) and the SVM we developed was formulated as:

\[ L = \frac{1}{N} \sum_i \sum_{j\neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + 1) \right] + \alpha R(W) \]

We saw that a…
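A hedged sketch of the "numerically with finite differences" strategy named in the table of contents: estimate each partial derivative of a loss \( L(W) \) with a centered difference, then take one vanilla gradient-descent step. The toy quadratic loss, step size, and helper name are assumptions, not the post's SVM setup.

```python
# Numerical gradient by centered finite differences, followed by one gradient-descent step.
import numpy as np

def numerical_gradient(loss, W, h=1e-5):
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h
        fplus = loss(W)                      # loss with this weight nudged up
        W[idx] = old - h
        fminus = loss(W)                     # loss with this weight nudged down
        W[idx] = old                         # restore the original value
        grad[idx] = (fplus - fminus) / (2 * h)   # centered difference
        it.iternext()
    return grad

# toy quadratic loss just to exercise the helper; the post's loss is the SVM loss above
loss = lambda W: np.sum(W ** 2)
W = np.array([[1.0, -2.0], [0.5, 3.0]])
grad = numerical_gradient(loss, W)           # ~2*W for this loss
W -= 0.1 * grad                              # one vanilla gradient-descent step
print(grad)
print(W)
```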