Normal Equation or Gradient Descent? 🐱🐉🐱💻
What a costly world! Even to implement Simple Linear Regression we need to think about cost. Yes, the Cost Function! ❤
In this blog, let us understand when to use the Normal Equation and when to use Gradient Descent.
Gradient Descent:
Gradient Descent is an optimization algorithm that can be used in many machine learning problems. It finds the values of the function's parameters (coefficients) that minimize the cost. In simple terms, Gradient Descent reduces the cost by iteratively adjusting theta (the coefficients).
- Choosing α, the learning rate, is very important (if α is too small, convergence is slow; if it is too large, the updates will overshoot the minimum).
- Iteration is required: θ is updated in each and every iteration.
- Feature Scaling is important.
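The points above can be sketched in a few lines of NumPy. This is a minimal illustration, not the post's original code; the names (`alpha`, `n_iters`, `gradient_descent`) are my own:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=5000):
    """Minimize the squared-error cost by repeated parameter updates."""
    m, n = X.shape
    theta = np.zeros(n)                # start with all coefficients at zero
    for _ in range(n_iters):           # iteration is required
        error = X @ theta - y          # h(theta) - y for every instance
        gradient = (X.T @ error) / m   # partial derivatives of the cost
        theta -= alpha * gradient      # simultaneous update, scaled by alpha
    return theta

# Toy data for y = 4 + 3x, with an x0 = 1 column added for the intercept
X = np.c_[np.ones(100), np.linspace(0, 1, 100)]
y = 4 + 3 * X[:, 1]
theta = gradient_descent(X, y)
```

Note how `alpha` and `n_iters` must both be tuned by hand; with a poorly chosen `alpha` the loop either crawls or diverges, which is exactly the tuning burden the Normal Equation avoids.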
Normal Equation:
The Normal Equation is another way to choose θ, without gradient descent. It is an analytical approach that computes θ directly from X (the independent variables) and y (the dependent variable).
- No need to choose α, the learning rate.
- No iteration is required.
- Feature scaling is not required either.
Let us understand the Normal Equation first:
θ : the hypothesis parameters that best fit the data.
X : the input feature values of each instance.
y : the output value of each instance.
We are given data with multiple features,
here x0 = 1 (the extra bias feature that accounts for the intercept term)
To find the hypothesis,
Representing cost function in vector form:
We know from Fig 1 that we can replace h(θ) with θᵀx.
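Since the original figures are not reproduced here, the standard vectorized derivation reads as follows (this is the textbook form of the result, written out for completeness):

```latex
% Vectorized cost for linear regression, with h_\theta(x) = \theta^\top x:
J(\theta) = \frac{1}{2m}\,(X\theta - y)^\top (X\theta - y)

% Setting the gradient to zero,
\nabla_\theta J(\theta) = \frac{1}{m}\, X^\top (X\theta - y) = 0,

% yields the Normal Equation in closed form:
\theta = (X^\top X)^{-1} X^\top y
```

So θ is obtained in a single matrix computation, with no learning rate and no iterations.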
So when to use what?
When the dataset has a relatively small number of features (roughly 1–1,000), we can use the Normal Equation to find the optimal θ. If the number of features grows much larger, computing (XᵀX)⁻¹ becomes expensive (roughly cubic in the number of features), so we need to go for Gradient Descent.
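For the small-feature case, the Normal Equation is a one-liner in NumPy. A minimal sketch on the same toy data as before (using `np.linalg.solve` rather than an explicit matrix inverse, since solving the linear system is faster and more numerically stable):

```python
import numpy as np

# Toy data for y = 4 + 3x, with an x0 = 1 column for the intercept
X = np.c_[np.ones(100), np.linspace(0, 1, 100)]
y = 4 + 3 * X[:, 1]

# Normal Equation: solve (X^T X) theta = X^T y for theta
theta = np.linalg.solve(X.T @ X, X.T @ y)
```

No α, no iterations, no feature scaling: the trade-off is only the cost of the matrix computation as the feature count grows.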