Perplexity

Wojciech Achtelik
AI Engineer Lead
July 4, 2025
Glossary Category: LLM

Perplexity is a fundamental evaluation metric for language models. It measures how well a model predicts a sample of text: lower values indicate better predictive performance and higher language-modeling quality. Concretely, perplexity quantifies the average uncertainty, or surprise, the model experiences when predicting each token in a sequence, and it is computed as the exponential of the cross-entropy loss.

Perplexity has an intuitive interpretation: a model with perplexity N is, on average, as uncertain as if it were choosing uniformly among N equally likely tokens at each step. This makes it a standard benchmark for comparing language models across architectures, training procedures, and datasets, though comparisons are only meaningful when the models share the same tokenization, since perplexity is defined per token.

In practice, perplexity evaluation incorporates refinements such as length normalization, out-of-vocabulary handling, and domain-specific test sets to produce more accurate assessments. And while perplexity correlates strongly with overall model quality, it does not fully capture performance on downstream tasks, so comprehensive model assessment pairs it with complementary evaluation methods such as task-specific benchmarks.
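Formally, for a token sequence x₁, …, x_N scored by a model with parameters θ (the subscript is just notation for the model's probability distribution), perplexity is the exponentiated average negative log-likelihood:

$$
\mathrm{PPL}(x) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)
$$

The sum inside the exponential is the per-token cross-entropy loss, which is why perplexity can be read directly off a language model's reported training or validation loss.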
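As a minimal sketch of how this is computed in practice, the snippet below scores a sentence with GPT-2 via Hugging Face transformers; the model choice and example sentence are illustrative assumptions, and any causal language model works the same way:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small causal language model; GPT-2 is an illustrative choice.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Perplexity measures how well a language model predicts text."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy
    # of next-token prediction over the sequence.
    outputs = model(**inputs, labels=inputs["input_ids"])

# Perplexity is the exponential of the average cross-entropy loss.
perplexity = torch.exp(outputs.loss).item()
print(f"Perplexity: {perplexity:.2f}")
```

For documents longer than the model's context window, the same per-token loss is typically computed over a sliding window and averaged before exponentiating.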