Recurrent Neural Networks (RNNs, LSTMs, GRUs) Cheat Sheet

Recurrent Neural Networks (RNNs) are a class of neural networks designed for sequential data, where order and temporal dependencies matter. Unlike feedforward networks, which treat each input independently, an RNN maintains an internal hidden state (memory) that is updated at every time step, allowing it to capture patterns across sequences of varying length. This makes the architecture well suited to language modeling, machine translation, time series forecasting, and speech recognition. The core challenge that motivated the LSTM and GRU variants is the vanishing gradient problem: during backpropagation through time (BPTT), gradients tend to shrink exponentially as they are propagated backward through many time steps, which makes it very hard for vanilla RNNs to learn dependencies spanning more than roughly 5-10 steps. Gated architectures (LSTM and GRU) mitigate this by introducing learnable gates that regulate information flow and keep gradient paths far more stable, in practice over hundreds of time steps.
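As a rough illustration of the hidden-state update and of why gradients vanish, the sketch below runs a scalar vanilla RNN in pure Python and tracks how strongly each hidden state still depends on the initial state. The function names, the weight values (w_h = 0.5, w_x = 1.0), and the constant input sequence are assumptions chosen for illustration only; real RNNs use weight matrices and learned parameters.

```python
import math


def rnn_forward(xs, w_h=0.5, w_x=1.0, b=0.0, h0=0.0):
    """Run a scalar vanilla RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t + b).

    Weights here are illustrative constants, not learned values.
    """
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w_h * hs[-1] + w_x * x + b))
    return hs  # hs[0] is the initial state, hs[t] is the state after step t


def grad_wrt_initial_state(hs, w_h=0.5):
    """Return |d h_t / d h_0| for t = 1..T using the chain rule.

    Each time step multiplies the gradient by w_h * (1 - h_t**2),
    where (1 - h_t**2) is the derivative of tanh at that step.
    """
    grads = []
    g = 1.0
    for h in hs[1:]:
        g *= w_h * (1.0 - h * h)
        grads.append(abs(g))
    return grads


xs = [0.1] * 50  # hypothetical constant input sequence of 50 steps
hs = rnn_forward(xs)
grads = grad_wrt_initial_state(hs)
print(f"step 1: {grads[0]:.3e}, step 10: {grads[9]:.3e}, step 50: {grads[49]:.3e}")
```

Because every per-step factor has magnitude below 1 in this setup, the product shrinks geometrically with sequence length, which is the scalar analogue of the vanishing gradient seen in BPTT. LSTM and GRU cells add gated, mostly additive state updates so that this product does not have to collapse in the same way.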
