[MLE] Linear Regression

Table of Contents

  • Linear Regression with one variable
  • Cost function
  • Other cost functions
  • Multivariate Linear Regression
  • Gradient Descent

Linear Regression with one variable

So what is univariate linear regression? Let’s first look at a slide from the class:
[slide: linear regression with one variable]
In my own words, let’s first imagine what a linear regression problem looks like.
House price prediction!
Yes, it is a typical regression problem, and if we predict the house price with a linear function, like y = ax + b, it becomes a linear regression problem.
To apply our learning algorithm, we first propose a function for the house price and then adjust it so that it resembles the actual data (the training set). This h(x) is what we call the hypothesis function: it is a hypothesis. What we actually do is modify its intrinsic parameters w, or, put simply, modify the weights.
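To make this concrete, here is a minimal sketch of a univariate hypothesis in Python (the weight values are made up for illustration):

```python
def hypothesis(x, w0, w1):
    """Univariate linear hypothesis h(x) = w0 + w1 * x."""
    return w0 + w1 * x

# Made-up weights: with w0 = 0 and w1 = 0.2, a house of size 750 is predicted at 150.
print(hypothesis(750, 0.0, 0.2))  # 150.0
```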
Now we need to find a way to modify w so that our hypothesis function gets closer to the actual function, which brings us to a new term: the cost function.

Cost function

\(J(w_0, w_1) = \frac{1}{2n}\sum_{i=1}^{n}\left(h(x^{(i)}) - y^{(i)}\right)^2\)
The cost function, in brief, is the sum of the squared differences between the predicted value and the actual value over all training examples (points).
[figure: training points and the fitted line, with the error at one point marked]
For example, in the figure above, when the size is 750 the actual price is 200 but the predicted value is 150. The difference at this point is therefore 50; we square these differences and sum them over all points.

Tricky point: why do we have a \(\frac{1}{2n}\) before our \(\sum\) (sum)?

  • \(\frac{1}{n}\) is there to normalise the sum, so that the cost function does not depend on the number of training examples
  • \(\frac{1}{2}\) is there because we will use Gradient Descent to find the minimum of the cost, and when taking partial derivatives the factor 2 from the square cancels this \(\frac{1}{2}\), making the calculation easier
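As a sketch (using NumPy, with a toy made-up training set), the cost function with the \(\frac{1}{2n}\) factor can be written like this:

```python
import numpy as np

def cost(w0, w1, x, y):
    """Squared-error cost J(w0, w1) = 1/(2n) * sum((h(x_i) - y_i)^2)."""
    n = len(y)
    predictions = w0 + w1 * x              # h(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * n)

# Toy training set (sizes and prices are made up for illustration).
x = np.array([600.0, 750.0, 900.0])
y = np.array([160.0, 200.0, 250.0])
print(cost(0.0, 0.2, x, y))                # 1500.0
```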

Other cost functions

Well, this cost function is based on the L2 norm (squared error). There is also an L1-norm (absolute error) cost, but I think it is worse.
[figure: L1-norm and L2-norm cost functions]
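For reference, here is a minimal sketch of the two losses side by side (the function names and data are my own, for illustration):

```python
import numpy as np

def l2_cost(pred, y):
    """L2 (squared error) cost, as used above."""
    return np.sum((pred - y) ** 2) / (2 * len(y))

def l1_cost(pred, y):
    """L1 (absolute error) cost."""
    return np.sum(np.abs(pred - y)) / len(y)

pred = np.array([120.0, 150.0, 180.0])
y = np.array([160.0, 200.0, 250.0])
print(l2_cost(pred, y))  # 1500.0
print(l1_cost(pred, y))  # ~53.3
```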

Multivariate Linear Regression

In its simplest form, multivariate regression is simply the sum of each feature multiplied by its corresponding weight:
Multivariate linear regression with first-order polynomial:
\(\hat{y} = h(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_j x_j + \dots + w_d x_d\)
Higher-order polynomial:
[figure: higher-order polynomial hypothesis]
Actually, I think the explanation by Andrew Ng on Coursera is better; see here.
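A minimal sketch of the first-order multivariate hypothesis as a dot product (the feature and weight values are made up for illustration):

```python
import numpy as np

def hypothesis(x, w):
    """First-order multivariate hypothesis:
    y_hat = w0 + w1*x1 + ... + wd*xd, with w = [w0, w1, ..., wd]."""
    return w[0] + np.dot(w[1:], x)

x = np.array([750.0, 3.0, 2.0])      # made-up features, e.g. size, bedrooms, floors
w = np.array([10.0, 0.2, 5.0, 3.0])  # made-up weights [w0, w1, w2, w3]
print(hypothesis(x, w))              # 10 + 0.2*750 + 5*3 + 3*2 = 181.0
```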

Gradient Descent

Cool, here comes my favourite part, gradient descent. It is all about the maths of finding the minimum of the cost function.
We want to minimise \(J(w_0, w_1, \dots, w_k)\):
  • First start with some initial values for \(w_0, w_1, \dots, w_k\)
  • Keep updating w to reduce \(J(w_0, w_1, \dots, w_k)\), hoping to end up at the minimum
[slide: gradient descent update rule \(w_j := w_j - \alpha \frac{\partial}{\partial w_j} J(w_0, w_1, \dots, w_k)\), applied to every \(j\) and repeated until convergence, where \(\alpha\) is the learning rate]
Take care! This must be a simultaneous update, otherwise it would be wrong, because you would be using the already-updated w0 to calculate the new temp1.
[figure: correct (simultaneous) versus incorrect (sequential) update order]
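As a sketch (assuming batch gradient descent on the univariate cost from above; the learning rate and data are made up), the simultaneous update looks like this:

```python
import numpy as np

def gradient_descent(x, y, alpha=1e-7, iterations=1000):
    """Batch gradient descent for h(x) = w0 + w1*x with cost 1/(2n) * sum((h - y)^2)."""
    w0, w1 = 0.0, 0.0
    n = len(y)
    for _ in range(iterations):
        error = (w0 + w1 * x) - y                # h(x_i) - y_i for every example
        # Compute both updates from the *old* weights (simultaneous update)...
        temp0 = w0 - alpha * np.sum(error) / n
        temp1 = w1 - alpha * np.sum(error * x) / n
        # ...and only then overwrite them.
        w0, w1 = temp0, temp1
    return w0, w1

# Toy data; in practice, feature scaling lets you use a much larger learning rate.
x = np.array([600.0, 750.0, 900.0])
y = np.array([160.0, 200.0, 250.0])
print(gradient_descent(x, y))
```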
