Key Components of Deep Learning

Overview of key components

= quantifies how well the neural network's predictions match the expected outputs.

Common examples (see Cost Functions)
- Cross-entropy loss: for classification problems
- Mean squared error (MSE): for regression tasks
- Contrastive or triplet loss: for representation learning or similarity-based tasks
Often augmented with regularization terms to prevent overfitting (Cost Functions#Cost function with regularization)

= dictates how the model updates its parameters (weights and biases) to minimize the objective function.

The optimizer plays a critical role in the convergence speed and stability of training.
It typically involves:
- Gradient computation: via Artificial Neural Networks#How it works
- Parameter updates: using optimizers such as Adam (Gradient Descent#Variants of Gradient Descent)
The learning rule is sensitive to:
- Learning rate (step size)
- Momentum (Gradient Descent#Momentum)
- Batch size

= layout and connectivity of the network — how many layers, how many neurons per layer, and how the layers are connected.

Architecture	Input Type	Handles Sequence?	Task Type	Supervised?	Typical Use Cases
Feedforward Neural Networks (FNNs) (simple multi-layer perceptrons)	Tabular / vector data	No	Discriminative (classification, regression)	Yes	Credit scoring, house price prediction
Convolutional Neural Networks (CNN)	Spatial (images, videos)	No	Discriminative / Generative	Yes / (some unsupervised)	Image classification, segmentation, image generation
Recurrent Neural Networks (RNN)	Sequential (text, time series)	Yes	Discriminative	Yes	Sentiment analysis, language modeling, stock prediction
Transformer	Sequential (text, audio)	Yes (via attention)	Discriminative & Generative	Yes / No	Translation, summarization, protein folding, chatbots
Autoencoders	Any (images, text, signals)	No / Yes (with RNNs)	Generative / Feature Learning	No	Denoising, anomaly detection, dimensionality reduction
Variational Autoencoders (VAE)	Any (images, text, signals)	No / Yes (with RNNs)	Generative / Feature Learning / Probabilistic Modeling	No	Generative modeling, smooth latent space, data generation
Generative Adversarial Networks (GANs)	Any (often images)	No / Yes	Generative	No	Image generation, data augmentation, synthetic data creation

Design choices include:
- Activation functions (ReLU, Sigmoid, Tanh, see Artificial Neural Networks#Activation functions)
- Depth and width
- Skip connections (e.g., in ResNets)
- Normalization layers (BatchNorm, LayerNorm)

= refers to how the network’s weights are set before training begins.

Good initialization helps avoid issues like:
- Vanishing/exploding gradients (Optimization Algorithms#^4a1546)
- Slow convergence
Common initialization strategies:
- Xavier/Glorot Initialization: for balanced signal flow.
- He Initialization: tailored for ReLU activations.
- Pretrained weights: from transfer learning or self-supervised learning.
Bias terms are often initialized to zero or small constants.

= defines the input data, labels, and context the model is exposed to during learning.

Key aspects:
- Type and quality of data (structured, unstructured, labeled, noisy).
- Supervised, unsupervised, or reinforcement learning setup.
- Batching and shuffling of data.
- Data augmentation to increase diversity and prevent overfitting.
- Train/validation/test splits to assess generalization.
In Reinforcement Learning, the "environment" refers to the interactive system where the agent takes actions and receives feedback (rewards).