Gradient Descent vs Stochastic Gradient Descent

Introduction:

Gradient Descent and Stochastic Gradient Descent are both optimization techniques that minimize the cost function of a machine learning algorithm. In a nutshell, both are algorithms that find the optimal coefficients and bias for a function (f).

The Difference:


Beyond the word "Stochastic" in the name, there is one key difference between the two techniques. "Stochastic" basically means randomly determined: a stochastic process follows a random probability distribution that may be analysed statistically.

In the Gradient Descent algorithm, the complete data set is processed in one shot to calculate the gradients of the coefficients and bias. Only after the whole data set has been processed are the coefficients and bias updated.
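As a rough sketch of this idea (illustrative linear-regression code with made-up function and parameter names, not from any particular library), note how every sample contributes to every single update:

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, epochs=2000):
    """Full-batch gradient descent for linear regression with MSE loss.

    The gradient is averaged over the ENTIRE data set before the
    coefficients (w) and bias (b) are updated even once per epoch.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y          # predictions minus targets, all n samples
        grad_w = (2.0 / n) * (X.T @ error)  # gradient of MSE w.r.t. w
        grad_b = (2.0 / n) * error.sum()    # gradient of MSE w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On a tiny noiseless data set generated as y = 3x + 1, this recovers coefficients close to w = 3 and b = 1, but each of the 2000 updates costs a full pass over the data.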

In Stochastic Gradient Descent (SGD), a random sample is selected from the data set and processed, after which the coefficients and bias are updated immediately. Updating from small samples speeds up processing considerably, and the resulting noise can even help the algorithm avoid getting stuck in a local minimum of a complex function. When we have a huge amount of data, it makes a lot of sense to optimize for speed and use CPU resources carefully.
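A minimal sketch of the same problem with SGD (again illustrative code, with hypothetical names) shows the contrast: one randomly chosen sample per update instead of the full data set:

```python
import numpy as np

def sgd(X, y, lr=0.05, epochs=500, seed=0):
    """Stochastic gradient descent for linear regression with MSE loss.

    Each update uses the gradient from a SINGLE randomly selected
    sample, so the coefficients and bias change n times per epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):       # visit samples in random order
            error = X[i] @ w + b - y[i]    # error on ONE sample only
            w -= lr * 2.0 * error * X[i]   # cheap, noisy gradient step
            b -= lr * 2.0 * error
    return w, b
```

Each step here touches a single row of X, which is why SGD scales so much better than full-batch gradient descent on large data sets, at the cost of a noisier path toward the minimum.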

The drawback of SGD is that its updates can be noisy, since only one sample is used per iteration. This is where we can opt for mini-batch SGD, where anywhere from 10 to 10,000 samples are chosen at random per update, which significantly reduces the noise in our optimization process.
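The mini-batch variant can be sketched by averaging the gradient over a small random batch (illustrative code with an assumed batch_size parameter; real frameworks expose this the same way, e.g. as a batch-size setting):

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=2, lr=0.05, epochs=500, seed=0):
    """Mini-batch SGD for linear regression with MSE loss.

    Each update averages the gradient over a small random batch,
    trading a little extra computation per step for a much less
    noisy update than single-sample SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        idx = rng.permutation(n)                   # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]  # a few random samples
            Xb, yb = X[batch], y[batch]
            error = Xb @ w + b - yb
            m = len(batch)
            w -= lr * (2.0 / m) * (Xb.T @ error)   # gradient averaged over batch
            b -= lr * (2.0 / m) * error.sum()
    return w, b
```

With batch_size equal to n this degenerates to full-batch gradient descent, and with batch_size of 1 it is plain SGD, which is why the batch size is usually described as a knob between the two extremes.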
