Layer Normalization

Wojciech Achtelik
AI Engineer Lead
July 3, 2025
Glossary Category: LLM

Layer normalization is a normalization technique that standardizes activations across the feature dimension within each layer of a neural network, stabilizing training dynamics and speeding up convergence in deep learning models. For every example independently, it computes the mean and variance over all features, normalizes the activations with those statistics, and then scales and shifts the result using learnable gain and bias parameters. By keeping activation distributions consistent throughout the network's depth, layer normalization mitigates internal covariate shift and supports stable gradient flow.

The technique is particularly effective in transformer architectures, where it is applied before (Pre-LN) or after (Post-LN) the self-attention and feed-forward sublayers to improve training stability. Variants such as root mean square normalization (RMSNorm), adaptive normalization, and position-dependent scaling are used to tune performance across diverse architectures. In practice, layer normalization reduces sensitivity to weight initialization, permits higher learning rates, and improves generalization by helping prevent activation saturation and vanishing gradients in deep networks.
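
To make the mechanics concrete, here is a minimal NumPy sketch of the per-example computation described above, along with the RMSNorm variant mentioned as an alternative. Function names (`layer_norm`, `rms_norm`) and parameter names (`gamma`, `beta`, `eps`) are illustrative choices, not a reference implementation.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each example across its feature dimension (last axis)."""
    mean = x.mean(axis=-1, keepdims=True)      # per-example mean over features
    var = x.var(axis=-1, keepdims=True)        # per-example variance over features
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero-mean, unit-variance activations
    return gamma * x_hat + beta                # learnable scale (gain) and shift (bias)

def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm variant: rescale by the root mean square only, without mean subtraction."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return gamma * (x / rms)

# Toy usage: a batch of 2 examples with 4 features each.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
gamma = np.ones(4)   # learnable scale, typically initialized to 1
beta = np.zeros(4)   # learnable shift, typically initialized to 0

print(layer_norm(x, gamma, beta))
print(rms_norm(x, gamma))
```

Note that, unlike batch normalization, the statistics here are taken over each example's own features, so the result does not depend on batch size, which is one reason the approach fits transformer training and inference well.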