[MLE] W2 Computing Parameters Analytically
Normal Equation
We have already introduced one way of computing the parameters \(\theta\): gradient descent. But gradient descent reaches the solution by iteration, which means choosing a learning rate and running many steps, and in some cases it also needs feature scaling and mean normalization to converge quickly.
There is also an analytical way of computing the parameters: the normal equation.
In the normal equation method, we minimize \(J\) by explicitly taking its derivatives with respect to each \(\theta_j\) and setting them to zero. This gives the optimal \(\theta\) without iteration. The normal equation is:
\(\theta = (X^TX)^{-1}X^Ty\)
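To see where the formula comes from, here is the "set the derivatives to zero" step in matrix form. This assumes the usual squared-error cost for linear regression, \(J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y)\), where \(X\) is the \(m \times (n+1)\) design matrix whose first column is all ones and \(y\) is the vector of the \(m\) targets:

\[
\begin{aligned}
J(\theta) &= \frac{1}{2m}(X\theta - y)^T(X\theta - y) \\
\nabla_\theta J(\theta) &= \frac{1}{m}X^T(X\theta - y) = 0 \\
X^TX\,\theta &= X^Ty \\
\theta &= (X^TX)^{-1}X^Ty \quad \text{(when } X^TX \text{ is invertible)}
\end{aligned}
\]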
There is no need to do feature scaling with the normal equation.
In Octave, compute it with:
theta = pinv(X'*X)*X'*y
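For readers working outside Octave, here is a minimal sketch of the same one-liner in Python with NumPy, where `np.linalg.pinv` plays the role of Octave's `pinv`. The toy data (generated exactly by \(y = 1 + 2x\)) and the helper name `normal_equation` are hypothetical, chosen only so the expected \(\theta\) is easy to check.

```python
import numpy as np

def normal_equation(X, y):
    """Return theta = pinv(X^T X) X^T y, mirroring the Octave one-liner."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Hypothetical toy data generated exactly by y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix: a column of ones for the intercept theta_0, then the feature.
X = np.column_stack([np.ones_like(x), x])

theta = normal_equation(X, y)
print(theta)  # approximately [1, 2]
```

Note that no learning rate, no iteration count, and no feature scaling appear anywhere: the solution comes from one linear-algebra expression.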
Gradient Descent | Normal Equation |
---|---|
Need to choose alpha | No need to choose alpha |
Need lots of iterations | No need to iterate |
\(O(kmn)\) for \(k\) iterations, since each iteration costs \(O(mn)\) | \(O(mn^2 + n^3)\): forming \(X^TX\) costs \(O(mn^2)\) and inverting it costs \(O(n^3)\) |
Works well even when n is large | Slow if n is large |
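The two columns of the table can be compared on the same problem. The sketch below (Python/NumPy, same hypothetical toy data as before) runs batch gradient descent on the squared-error cost and checks that it lands on the same \(\theta\) as the normal equation. The learning rate `alpha=0.1` and `num_iters=2000` are assumptions that happen to converge for this data; a much larger `alpha` would diverge, which is exactly the tuning the normal equation avoids.

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent on J(theta) = (1/2m) * ||X theta - y||^2."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        # Each iteration costs O(mn): one product with X and one with X^T.
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta

def normal_equation(X, y):
    # One-shot solution: no alpha to choose and no iterations.
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Hypothetical toy data generated exactly by y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])

theta_gd = gradient_descent(X, y, alpha=0.1, num_iters=2000)
theta_ne = normal_equation(X, y)
print(theta_gd, theta_ne)  # both close to [1, 2]
```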
Normal Equation Noninvertibility
When implementing the normal equation in Octave, use the ‘pinv’ function rather than ‘inv’. The ‘pinv’ function gives a value of \(\theta\) even if \(X^TX\) is not invertible. If \(X^TX\) is noninvertible, common causes are:
- Redundant features, where two features are very closely related (i.e. they are linearly dependent)
- for example, size in \(\text{ft}^2\) and size in \(\text{m}^2\)
- Too many features (e.g. m ≤ n, where m is the number of training examples and n is the number of features). In this case, delete some features or use regularization (to be explained in a later lesson).
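The redundant-feature case can be reproduced in a few lines. This is a sketch in Python/NumPy with made-up house sizes and prices; it uses the exact definition 1 ft² = 0.09290304 m² to build a second column that is a constant multiple of the first. It shows that \(X\) has rank 2 instead of 3 (so \(X^TX\) is singular), that `np.linalg.pinv` still returns a usable \(\theta\), and that deleting the redundant column gives the same fitted values.

```python
import numpy as np

# Hypothetical data: house size in ft^2, the same size in m^2,
# and made-up prices (in $1000s).
size_ft2 = np.array([1000.0, 1500.0, 2000.0, 2500.0])
size_m2 = size_ft2 * 0.09290304  # 1 ft^2 = 0.09290304 m^2
price = np.array([200.0, 290.0, 410.0, 500.0])

# Design matrix: intercept column, size in ft^2, size in m^2.
X = np.column_stack([np.ones_like(size_ft2), size_ft2, size_m2])

# The two size columns are linearly dependent, so X has rank 2, not 3,
# and X^T X (which has the same rank as X) is singular.
print(np.linalg.matrix_rank(X))  # 2

# pinv still returns a theta: the minimum-norm least-squares solution.
theta = np.linalg.pinv(X.T @ X) @ X.T @ price

# Deleting the redundant m^2 column gives a full-rank problem with the same fit.
X_reduced = X[:, :2]
theta_reduced = np.linalg.solve(X_reduced.T @ X_reduced, X_reduced.T @ price)
print(np.allclose(X @ theta, X_reduced @ theta_reduced))  # True
```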