Word Embedding

Wojciech Achtelik
AI Engineer Lead
July 4, 2025
Glossary Category: LLM

Word embedding is a technique that represents words or tokens as dense vectors of real numbers in a continuous vector space, where semantically similar words are positioned close together. These representations capture semantic relationships, syntactic patterns, and contextual meaning by converting discrete vocabulary items into numerical arrays, typically with 50 to 1000 dimensions.

Popular word embedding methods include Word2Vec, GloVe, and FastText, which learn these representations with neural networks trained on large text corpora. Modern transformer-based models such as BERT and GPT produce contextual embeddings that vary with the surrounding words, unlike static embeddings, which assign a fixed vector to each word.

Word embeddings serve as the foundational input layer for natural language processing models, enabling them to capture linguistic relationships, compute semantic similarity, and process text numerically for downstream applications such as sentiment analysis, machine translation, and information retrieval.
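To make the idea concrete, here is a minimal sketch of how an embedding lookup and a semantic similarity calculation work. The words and vector values are invented for illustration; real embeddings are learned from large corpora and have far more dimensions.

```python
import numpy as np

# Toy static embedding table: each word maps to a small dense vector.
# Values are illustrative only; learned embeddings (Word2Vec, GloVe, FastText)
# typically have 50 to 1000 dimensions.
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10, 0.05]),
    "queen": np.array([0.78, 0.70, 0.12, 0.10]),
    "apple": np.array([0.05, 0.10, 0.90, 0.20]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; higher means more semantically similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low: unrelated words
```

Running this, "king" and "queen" score much higher than "king" and "apple", which is the geometric structure that downstream models exploit when they use embeddings as input features.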