Long Short Term Memory

Resources

Video: Long Short-Term Memory (LSTM), Clearly Explained

What is LSTM

It is a type of Recurrent Neural Networks (RNN) architecture
It is designed to address the vanishing/exploding gradient problem (see Optimization Algorithms#^4a1546) in traditional RNNs, allowing it to effectively capture long-term dependencies in sequential data.

Long vs Short

why "long": It is a unique structure with memory cells that can store and access information over long periods
long-term memories = cell states: no weights, thus to avoid vanishing/exploding gradients
short-term memories = hidden states: with weights, just as vanilla RNNs

LSTM units

illustration figure 1

illustration figure 2

three key components

input gates
- activation function is tanh (-1 to 1)
- even though this part determines how we should update the long-term memory....
forget gates
- activation function is sigmoid (0 to 1)
- decides what percentage of the long-term memory is remembered
output gates