Gradient Descent

What does Gradient Descent do?

It minimizes a cost function by following its derivatives (gradients), proceeding in epochs: an epoch is one complete pass over the training set, during which each parameter is updated.
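As a minimal sketch of the idea, gradient descent on a hypothetical one-parameter cost f(w) = (w − 3)²:

```python
# Minimal gradient descent sketch on a hypothetical cost f(w) = (w - 3)^2,
# whose derivative is f'(w) = 2 * (w - 3). The "dataset" here is trivial,
# so each epoch is a single parameter update.
def gradient_descent(lr=0.1, epochs=100, w=0.0):
    for _ in range(epochs):
        grad = 2 * (w - 3)   # derivative of the cost at the current w
        w -= lr * grad       # step opposite the gradient
    return w

w = gradient_descent()
print(round(w, 4))  # converges to the minimum at w = 3
```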

Adaptive methods

Adaptive methods in gradient descent automatically adjust the learning rate during training, often separately for each parameter (RMSprop and Adam are examples).

Gradient Descent in Neural Networks

Gradient computation
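In neural networks the gradients are computed by backpropagation; a common way to sanity-check any gradient computation is a central-difference numerical check. A minimal sketch, using a hypothetical one-parameter cost:

```python
# Sketch of a numerical gradient check (central differences) against an
# analytic derivative, using a hypothetical one-parameter cost.
def cost(w):
    return (w - 3) ** 2

def analytic_grad(w):
    return 2 * (w - 3)

def numeric_grad(f, w, eps=1e-5):
    # central difference: (f(w + eps) - f(w - eps)) / (2 * eps)
    return (f(w + eps) - f(w - eps)) / (2 * eps)

print(abs(analytic_grad(1.7) - numeric_grad(cost, 1.7)) < 1e-6)  # True
```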

Challenges and Solutions

Vanishing / Exploding gradients

Challenges

Solutions: Initialization (Starting Point for Gradient Descent)
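Careful weight initialization keeps activation and gradient scales roughly constant across layers, which mitigates vanishing/exploding gradients. A sketch of the standard Xavier/Glorot and He schemes (function names and layer sizes here are illustrative):

```python
import math
import random

# Sketch of two standard initialization schemes that keep signal variance
# roughly constant across layers (function names are illustrative).
def xavier_init(fan_in, fan_out, seed=0):
    # Xavier/Glorot uniform: limit = sqrt(6 / (fan_in + fan_out))
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [rng.uniform(-limit, limit) for _ in range(fan_in * fan_out)]

def he_init(fan_in, fan_out, seed=0):
    # He normal: std = sqrt(2 / fan_in), suited to ReLU activations
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    return [rng.gauss(0.0, std) for _ in range(fan_in * fan_out)]

weights = xavier_init(256, 128)
print(max(abs(x) for x in weights) <= math.sqrt(6.0 / (256 + 128)))  # True
```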

Optimization is too slow

Challenges

Solutions

Variants of Gradient Descent

Variants Outline:

Batch gradient descent

Stochastic gradient descent (SGD)

Batch gradient descent vs. Stochastic gradient descent

Mini-batch gradient descent
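Mini-batch gradient descent updates the parameters on small random subsets of the training set, combining the stability of batch GD with the speed of SGD. A minimal sketch on synthetic 1-D linear-regression data (names and hyperparameters are illustrative):

```python
import random

# Mini-batch gradient descent sketch fitting the slope a of y = a * x
# by mean squared error on synthetic data (names and hyperparameters
# are illustrative).
def minibatch_gd(data, lr=0.01, batch_size=4, epochs=200, a=0.0, seed=0):
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)                    # new batch split each epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # gradient of the batch's mean squared error w.r.t. a
            grad = sum(2 * (a * x - y) * x for x, y in batch) / len(batch)
            a -= lr * grad
    return a

data = [(x, 2.0 * x) for x in range(1, 9)]   # true slope is 2
print(round(minibatch_gd(data), 3))          # converges to the true slope 2
```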

Exponentially weighted averages

(Exponentially weighted moving averages)
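The exponentially weighted moving average v_t = β·v_{t−1} + (1 − β)·x_t underlies momentum, RMSprop, and Adam; dividing by (1 − β^t) corrects the bias toward zero in the early steps. A minimal sketch:

```python
# Exponentially weighted moving average sketch:
#   v_t = beta * v_{t-1} + (1 - beta) * x_t
# with the standard bias correction v_t / (1 - beta^t) for early steps.
def ewma(xs, beta=0.9, bias_correct=True):
    v, out = 0.0, []
    for t, x in enumerate(xs, start=1):
        v = beta * v + (1 - beta) * x
        out.append(v / (1 - beta ** t) if bias_correct else v)
    return out

print([round(v, 3) for v in ewma([10, 10, 10])])  # [10.0, 10.0, 10.0]
```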

Momentum

v(t) = γ · v(t−1) + learning rate · gradient
weight(t) = weight(t−1) − v(t)

where γ is the momentum coefficient (commonly ≈ 0.9) and v(t) is an exponentially weighted average of past gradients, often called the velocity.
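A minimal sketch of the momentum update v(t) = γ·v(t−1) + lr·gradient, weight(t) = weight(t−1) − v(t), applied to a hypothetical quadratic cost:

```python
# Momentum update sketch:
#   v(t) = gamma * v(t-1) + lr * grad
#   w(t) = w(t-1) - v(t)
# applied to a hypothetical cost f(w) = (w - 3)^2 with gradient 2 * (w - 3).
def momentum_step(w, v, grad, lr=0.1, gamma=0.9):
    v = gamma * v + lr * grad   # velocity: weighted average of past gradients
    w = w - v
    return w, v

w, v = 0.0, 0.0
for _ in range(300):
    w, v = momentum_step(w, v, 2 * (w - 3))
print(round(w, 4))  # converges to the minimum at w = 3
```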

RMSprop (Root Mean Square Prop)
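RMSprop keeps an exponentially weighted average of squared gradients and divides each step by its square root, so the effective learning rate adapts per parameter. A minimal sketch on a hypothetical quadratic cost:

```python
import math

# RMSprop update sketch: keep an exponentially weighted average s of
# squared gradients and scale each step by 1 / sqrt(s); epsilon guards
# against division by zero. Applied to a hypothetical cost (w - 3)^2.
def rmsprop_step(w, s, grad, lr=0.01, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * grad ** 2
    w = w - lr * grad / (math.sqrt(s) + eps)
    return w, s

w, s = 0.0, 0.0
for _ in range(2000):
    w, s = rmsprop_step(w, s, 2 * (w - 3))
print(round(w, 2))  # settles near the minimum at w = 3
```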

Adam (Adaptive Moment Estimation)
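Adam combines momentum's first-moment average with RMSprop's second-moment average, both bias-corrected. A minimal sketch on a hypothetical quadratic cost:

```python
import math

# Adam update sketch: bias-corrected EWMAs of the gradient (m, first
# moment) and squared gradient (v, second moment), applied to a
# hypothetical cost (w - 3)^2. Hyperparameters are the common defaults.
def adam_step(w, m, v, grad, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)   # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, m, v, 2 * (w - 3), t)
print(round(w, 2))  # settles near the minimum at w = 3
```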

Learning rate decay
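Learning rate decay gradually shrinks the step size so training settles near a minimum instead of oscillating around it. Sketches of three common schedules (hyperparameter values are illustrative):

```python
# Sketches of common learning-rate decay schedules (hyperparameter
# values are illustrative).
def time_decay(lr0, decay_rate, epoch):
    # lr = lr0 / (1 + decay_rate * epoch)
    return lr0 / (1 + decay_rate * epoch)

def exponential_decay(lr0, base, epoch):
    # lr = lr0 * base^epoch with 0 < base < 1
    return lr0 * base ** epoch

def step_decay(lr0, drop, every, epoch):
    # multiply by `drop` once every `every` epochs
    return lr0 * drop ** (epoch // every)

print(round(time_decay(0.1, 1.0, 4), 4))         # 0.02
print(round(exponential_decay(0.1, 0.5, 3), 4))  # 0.0125
print(round(step_decay(0.1, 0.5, 10, 25), 4))    # 0.025
```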