[MLE] W2 Computing Parameters Analytically

Computing Parameters Analytically

Normal Equation

We have already introduced one way of computing the parameter \(\theta\): gradient descent.
Gradient descent, however, reaches \(\theta\) iteratively. We have to choose a learning rate \(\alpha\) and run many iterations, and in some cases we also need feature scaling and mean normalization to make it converge quickly.
The normal equation is an analytical alternative that computes \(\theta\) directly.
In the normal equation method, we minimize J by explicitly taking its partial derivatives with respect to each \(\theta_j\) and setting them to zero. This gives the optimal \(\theta\) without iteration. The normal equation formula is:
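\(\theta = (X^TX)^{-1}X^Ty\)
where X is the \(m \times (n+1)\) design matrix (one training example per row, with a leading column of ones for \(x_0\)) and y is the vector of the m target values.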
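Where the formula comes from, as a short sketch assuming the same squared-error cost \(J(\theta)\) used with gradient descent, written in matrix form:

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
           = \frac{1}{2m}(X\theta - y)^T(X\theta - y) \\
\nabla_\theta J(\theta) &= \frac{1}{m}X^T(X\theta - y) = 0 \\
\Rightarrow\quad X^TX\,\theta &= X^Ty \\
\Rightarrow\quad \theta &= (X^TX)^{-1}X^Ty \qquad \text{(if } X^TX \text{ is invertible)}
\end{aligned}
```

The last step needs \(X^TX\) to be invertible, which is exactly what can fail in the noninvertible cases discussed below.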
There is no need to do feature scaling with the normal equation.
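A small NumPy check of this claim, as a sketch with made-up housing numbers (the arrays and the design_matrix helper are hypothetical): fitting once on raw features and once on mean-normalized, scaled features gives different \(\theta\) values, because the features are in different units, but identical predictions.

```python
import numpy as np

def normal_equation(X, y):
    """theta = pinv(X^T X) X^T y, mirroring the Octave one-liner."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

def design_matrix(*features):
    """Hypothetical helper: prepend the x_0 = 1 column to the feature columns."""
    m = len(features[0])
    return np.column_stack([np.ones(m), *features])

# Made-up toy data: size (square feet), bedrooms, price (in $1000s).
size = np.array([2104.0, 1416.0, 1534.0, 852.0, 1940.0])
bedrooms = np.array([5.0, 3.0, 3.0, 2.0, 4.0])
price = np.array([460.0, 232.0, 315.0, 178.0, 390.0])

X_raw = design_matrix(size, bedrooms)
# Same features after mean normalization and scaling by the standard deviation.
X_scaled = design_matrix((size - size.mean()) / size.std(),
                         (bedrooms - bedrooms.mean()) / bedrooms.std())

theta_raw = normal_equation(X_raw, price)
theta_scaled = normal_equation(X_scaled, price)

# The parameters differ, but the fitted predictions agree.
pred_raw = X_raw @ theta_raw
pred_scaled = X_scaled @ theta_scaled
print(np.allclose(pred_raw, pred_scaled))  # True
```

Scaling can still help numerically when feature magnitudes differ wildly, but it is not needed for the normal equation to find the minimum.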
In Octave, compute it with pinv(X'*X)*X'*y.
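For readers working in Python, a minimal NumPy sketch of the same one-liner, assuming X already includes the column of ones. Octave's X' becomes X.T and matrix multiplication becomes @; the toy data is hypothetical and generated without noise from \(y = 1 + 2x_1 - 3x_2\), so the recovered \(\theta\) should match those coefficients.

```python
import numpy as np

def normal_equation(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Closed-form linear regression: theta = pinv(X^T X) X^T y.

    X is the m x (n+1) design matrix whose first column is all ones,
    y is the length-m target vector.
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Hypothetical noise-free data from y = 1 + 2*x1 - 3*x2.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 - 3 * x2
X = np.column_stack([np.ones_like(x1), x1, x2])

theta = normal_equation(X, y)
print(theta)  # close to [1, 2, -3]
```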
Gradient Descent | Normal Equation
---|---
Need to choose \(\alpha\) | No need to choose \(\alpha\)
Needs many iterations | No need to iterate
\(O(kn^2)\) | \(O(n^3)\): inverting \(X^TX\) costs \(O(n^3)\), and forming \(X^TX\) costs \(O(mn^2)\)
Works well even when n is large | Slow if n is large
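To see the first two rows of the table side by side, here is a sketch that solves one toy problem both ways. The learning rate alpha = 0.1 and the 1000 iterations are hand-picked assumptions for this already-scaled data, which is generated from \(y = 4 + 3x\); gradient descent needs both choices, while the normal equation is a single call.

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent on J(theta) = (1/2m) * ||X theta - y||^2."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m  # (1/m) X^T (X theta - y)
        theta -= alpha * gradient
    return theta

def normal_equation(X, y):
    """One-shot solution, no alpha and no iterations."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Hypothetical toy data from y = 4 + 3*x, feature already in [-1, 1].
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = 4 + 3 * x
X = np.column_stack([np.ones_like(x), x])

theta_gd = gradient_descent(X, y, alpha=0.1, num_iters=1000)
theta_ne = normal_equation(X, y)
print(theta_gd, theta_ne)  # both close to [4, 3]
```

With a much larger alpha gradient descent can diverge, and with far fewer iterations it stops short of the minimum; the normal equation has neither knob.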
With the normal equation, inverting \(X^TX\) has complexity \(O(n^3)\), so with a very large number of features the normal equation becomes slow. In practice, when n exceeds about 10,000 it may be time to switch from the normal equation to an iterative method such as gradient descent.
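To make the 10,000 threshold concrete, a back-of-the-envelope estimate, assuming 8-byte double-precision entries and ignoring the constant factor in the \(O(n^3)\) cost:

```latex
\begin{aligned}
n = 10^4 &\;\Rightarrow\; X^TX \text{ has about } n^2 = 10^8 \text{ entries}
          \approx 10^8 \times 8\ \text{bytes} = 800\ \text{MB} \\
         &\;\Rightarrow\; \text{inverting it takes on the order of } n^3 = 10^{12} \text{ floating-point operations}
\end{aligned}
```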

Normal Equation Noninvertibility

When implementing the normal equation in Octave, we want to use the ‘pinv’ function rather than ‘inv’. The ‘pinv’ function (the pseudoinverse) gives a value of \(\theta\) even if \(X^TX\) is not invertible.
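A NumPy sketch of the same point, where np.linalg.pinv and np.linalg.inv stand in for Octave's pinv and inv. The data is made up: x2 is exactly 2 · x1, a stand-in for one size measured in two units (the real ft²/m² factor is about 10.76; a factor of 2 is used so the dependence is exact in floating point). Depending on rounding, inv may raise an error or return unreliable numbers, so the code handles both.

```python
import numpy as np

# Hypothetical redundant features: x2 is exactly 2 * x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2 * x1
y = 1 + 2 * x1                      # targets lie exactly on a line
X = np.column_stack([np.ones_like(x1), x1, x2])
XtX = X.T @ X                       # singular: column 3 = 2 * column 2

try:
    theta_inv = np.linalg.inv(XtX) @ X.T @ y
    print("inv returned", theta_inv, "(numerically unreliable)")
except np.linalg.LinAlgError as err:
    print("inv failed:", err)

# pinv discards the (numerically) zero singular value and still returns a theta;
# it should be the minimum-norm solution, close to [1, 0.4, 0.8] here.
theta = np.linalg.pinv(XtX) @ X.T @ y
print("pinv theta:", theta)
print("predictions:", X @ theta)    # reproduce y
```

Many \(\theta\) vectors fit this data equally well, since only \(\theta_1 + 2\theta_2\) is determined; pinv picks one of them rather than failing.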
If \(X^TX\) is noninvertible, common causes are:
  • Redundant features, where one feature is a linear function of another (i.e. they are linearly dependent)
    • for example, size in \(\text{ft}^2\) and size in \(\text{m}^2\)
  • Too many features (e.g. m ≤ n). In this case, delete some features or use “regularization” (to be explained in a later lesson).
Solutions to the above problems include deleting a feature that is linearly dependent on another, or deleting one or more features when there are too many.
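One way to find such redundant features before applying the normal equation is to check matrix rank. The helper independent_columns below is hypothetical (not from the course): it keeps columns greedily in order and drops any column that does not raise the rank, using np.linalg.matrix_rank, which decides rank with a numerical tolerance. Which of two dependent features gets dropped therefore depends on column order. Because rank(X) can never exceed m, the same loop also keeps at most m columns when m ≤ n, although which features to drop is then a modelling decision.

```python
import numpy as np

def independent_columns(X):
    """Hypothetical helper: indices of columns that each increase the rank of X.

    A column that does not increase the rank is a linear combination of
    columns already kept, i.e. a redundant feature.
    """
    keep = []
    for j in range(X.shape[1]):
        candidate = keep + [j]
        if np.linalg.matrix_rank(X[:, candidate]) == len(candidate):
            keep.append(j)
    return keep

# Made-up data: x2 duplicates x1 in "other units", x3 is a genuine feature.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2 * x1
x3 = np.array([1.0, 0.0, 1.0, 0.0])
y = np.array([3.0, 4.0, 8.0, 9.0])
X = np.column_stack([np.ones_like(x1), x1, x2, x3])

keep = independent_columns(X)
X_reduced = X[:, keep]
# After dropping the redundant column, X^T X is invertible, so plain inv works.
theta = np.linalg.inv(X_reduced.T @ X_reduced) @ X_reduced.T @ y
print("kept columns:", keep)        # [0, 1, 3]
print("theta:", theta)
```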
