Semantic Search

Antoni Kozelski
CEO & Co-founder
July 2, 2025

Semantic Search is an information-retrieval technique that ranks results by meaning rather than by exact keyword match. It converts queries and documents into dense vectors (arrays of floating-point numbers) using language-model embeddings such as Sentence-BERT or OpenAI’s text-embedding-3-large. A vector database (e.g., Chroma, Pinecone, Milvus) then compares the query vector with the stored document vectors using cosine or dot-product similarity and surfaces conceptually related items, so a query like “apple health benefits” can retrieve articles on fruit nutrition even if the word “apple” appears only once. Modern pipelines often add a cross-encoder reranker for precision and may combine BM25 keyword scores with vector scores in a hybrid search. In large-language-model applications, Semantic Search is the retrieval layer of Retrieval-Augmented Generation (RAG): it fetches the most relevant passages so the model can ground its answer in them before generation. Benefits include higher recall, cross-lingual queries when a multilingual embedding model is used, and resilience to synonyms; challenges include embedding drift, vector storage cost, and the need for continuous evaluation to catch domain gaps.
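As a rough illustration of the similarity ranking and hybrid scoring described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the three-dimensional vectors are hand-written stand-ins for real embeddings (which typically have hundreds or thousands of dimensions), the document names are invented, and `hybrid_score` is a hypothetical helper that assumes the keyword and vector scores have already been normalized to a comparable range. It is not the API of any particular vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def hybrid_score(keyword_score, vector_score, alpha=0.5):
    """Weighted blend of a keyword score and a vector score.

    Hypothetical helper: assumes both inputs are already normalized
    (raw BM25 scores are unbounded and would need rescaling first).
    """
    return alpha * keyword_score + (1.0 - alpha) * vector_score


# Toy "embeddings", invented for illustration only.
DOC_VECS = {
    "fruit_nutrition": [0.9, 0.1, 0.0],
    "iphone_review": [0.1, 0.9, 0.1],
    "tax_guide": [0.0, 0.1, 0.9],
}
QUERY_VEC = [0.8, 0.2, 0.0]  # stand-in for "apple health benefits"

ranking = rank_by_similarity(QUERY_VEC, DOC_VECS)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors the fruit-nutrition document ranks first because its vector points in nearly the same direction as the query. In a real pipeline the vectors would come from an embedding model, and the top results could then be passed to a reranker or blended with keyword scores via something like `hybrid_score`.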