Normal Equation or Gradient Descent? 🐱‍🐉🐱‍💻

Sri Vigneshwar DJ
3 min read · Jan 10, 2021

What a costly world! Even to implement Simple Linear Regression, we need to think about the cost. Yes, the Cost Function! ❤

In this blog, let us understand when to use the Normal Equation and when to use Gradient Descent.

Gradient Descent:

Gradient Descent is an optimization algorithm used in many machine learning problems: it finds the values of the function's parameters (coefficients) that minimize the cost. In simple terms, Gradient Descent reduces the cost by iteratively adjusting θ (the coefficients); a minimal code sketch follows the checklist below. A few things to keep in mind:

  1. Choosing α, the learning rate, is very important (if α is too small, convergence is slow; if it is too large, the updates will overshoot the minimum and may never converge).
  2. Iteration is required: θ is updated in each and every iteration.

Fig: Cost function vs. iterations

  3. Feature Scaling is important.
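
To make the iterations concrete, here is a minimal sketch of batch Gradient Descent for linear regression in NumPy. The learning rate, iteration count, and synthetic data are illustrative choices, not values from this post:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=2000):
    # X is assumed to already contain a bias column of ones (x0 = 1)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = X @ theta                  # hypothesis: h_theta(x) = theta^T x
        grad = (X.T @ (h - y)) / m     # gradient of the cost J(theta)
        theta -= alpha * grad          # update all thetas simultaneously
    return theta

# Illustrative data: y = 4 + 3x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = 4 + 3 * x + rng.normal(0, 0.1, size=100)
X = np.c_[np.ones(len(x)), x]          # prepend x0 = 1
print(gradient_descent(X, y))          # roughly [4, 3]
```

Note how every iteration touches all m training examples, and how the result depends on picking a sensible α.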

Normal Equation:

The Normal Equation is another way to find θ, without Gradient Descent. This approach is analytical: it calculates θ directly from X (the independent variables) and y (the dependent variable).

Normal Equation formula: θ = (XᵀX)⁻¹ Xᵀ y
  1. No need to choose α — learning rate
  2. Iteration is not required
  3. Feature scaling is also not required, to a large extent

Let us understand the Normal Equation first.

θ : hypothesis parameters that best fit the data.
X : Input feature value of each instance.
Y : Output value of each instance.

We are given data with multiple features,

Hypothesis with n features:

hθ(x) = θ₀x₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

here x₀ = 1

To find the hypothesis in vector form:

Fig 1: hθ(x) = θᵀx
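
As a quick sanity check of this vector form (with made-up numbers), prepending x₀ = 1 lets the whole hypothesis collapse into a single dot product:

```python
import numpy as np

theta = np.array([4.0, 3.0, 0.5])   # [theta0, theta1, theta2]
x = np.array([1.0, 2.0, 6.0])       # [x0 = 1, x1, x2]

# h_theta(x) = theta^T x = theta0*x0 + theta1*x1 + theta2*x2
print(theta @ x)                    # 4*1 + 3*2 + 0.5*6 = 13.0
```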

Representing the cost function:

J(θ) = (1/2m) Σᵢ₌₁ᵐ ( hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²

From Fig 1, we know we can replace hθ(x⁽ⁱ⁾) with θᵀx⁽ⁱ⁾, giving the combined equation for multiple features:

J(θ) = (1/2m) Σᵢ₌₁ᵐ ( θᵀx⁽ⁱ⁾ − y⁽ⁱ⁾ )²

In vector form this is J(θ) = (1/2m) (Xθ − y)ᵀ (Xθ − y). Setting its gradient ∂J/∂θ to zero and solving for θ yields the Normal Equation given above: θ = (XᵀX)⁻¹ Xᵀ y.
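
Here is the same closed-form solution as a minimal NumPy sketch, using the same illustrative data as the Gradient Descent example above. np.linalg.pinv is used instead of a plain inverse so the code also survives a singular XᵀX (e.g. redundant features):

```python
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^(-1) X^T y, computed in one shot
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Illustrative data: y = 4 + 3x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = 4 + 3 * x + rng.normal(0, 0.1, size=100)
X = np.c_[np.ones(len(x)), x]       # prepend x0 = 1
print(normal_equation(X, y))        # roughly [4, 3]; no α, no iterations
```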

So when to use what?

When you are given a dataset with a modest number of features (roughly 1 to 1,000), the Normal Equation is a good way to find the optimal θ directly. When the number of features grows much larger, inverting XᵀX (an O(n³) operation) becomes too expensive, and Gradient Descent is the better choice.
