Every retrieval-augmented chatbot, every “find me something similar” feature, and every recommendation that feels like it read your mind runs on the same quiet engine underneath: vector search.
The hard part was never the idea. It was keeping a separate vector database in sync with the data that actually changes in your warehouse, paying for infrastructure that drifted out of date the moment a row was updated, and bolting governance on after the fact.
Databricks vector search exists to remove that whole layer of glue work by building similarity search directly into the lakehouse where the data already lives.
This guide explains what Databricks vector search is, how an index is built and queried, the difference between Delta Sync and direct-access index types, how it compares to running a standalone vector database, and where it fits in a production retrieval-augmented generation pipeline.
One naming note before we start, because it trips people up in 2026. Databricks recently renamed this product to Databricks AI Search, but the index objects and the SDK still carry the Mosaic AI Vector Search name, and most teams still search for and say “vector search.” We use the terms interchangeably here, and the section on the rename covers what actually changed.
Key Takeaways

- Databricks vector search is a serverless similarity search engine built into the Data Intelligence Platform, so the index lives next to your governed Delta tables instead of in a separate system.
- An index is built in four stages (embed, index, sync, and query), and Databricks can compute the embeddings for you or let you bring your own.
- A Delta Sync index keeps itself current with the source Delta table automatically, while a direct-access index hands you the write path for streaming or custom embedding workflows.
- Hybrid search blends semantic vectors with keyword matching and reranking in one API call, and metadata filtering plus Unity Catalog governance keep results scoped and safe.
- Standard endpoints give tens-of-milliseconds latency in memory, while storage-optimized endpoints serve billions of vectors cheaply at hundreds of milliseconds.
- In 2026 Databricks renamed the product to Databricks AI Search, but the engine, the Mosaic AI Vector Search index objects, and the SDK are the same.
- Kanerika, a Databricks partner, builds governed, cost-controlled vector search and RAG retrieval layers that production AI applications can trust.

What is Databricks vector search?

Databricks vector search is a serverless similarity search engine built into the Databricks Data Intelligence Platform. It stores vector representations of your data, called embeddings, alongside their metadata, and lets an application search by meaning rather than by exact keyword. Ask it for the rows most similar to a query, and it returns the nearest neighbors in vector space in milliseconds, governed by the same Unity Catalog permissions that protect the rest of your tables.
An embedding is a list of numbers that captures the semantic content of a piece of text, an image, or another unstructured input. As the Azure Databricks AI Search reference puts it, vector search is a type of search optimized for retrieving embeddings, the mathematical representations of semantic content.
Two passages that mean roughly the same thing land close together in that numeric space even when they share no words. That is what lets vector search answer “show me documents about cutting cloud spend” with a passage titled “reducing compute costs.”
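To make "close together in that numeric space" concrete, here is a minimal sketch using cosine similarity, one common way to measure closeness between vectors. The three-dimensional vectors are hand-made, hypothetical values chosen for illustration, not output from any real embedding model, and the source does not say which distance metric the product uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d "embeddings"; the dimensions loosely stand for
# cost, cloud/compute, and cooking. Real embeddings have hundreds of dimensions.
embeddings = {
    "reducing compute costs": [0.9, 0.8, 0.0],
    "autumn soup recipes": [0.1, 0.0, 0.9],
}
query = [0.8, 0.9, 0.1]  # stands in for "cutting cloud spend"

# Rank passages by similarity to the query, most similar first.
ranked = sorted(
    embeddings,
    key=lambda title: cosine_similarity(query, embeddings[title]),
    reverse=True,
)
print(ranked[0])
```

The query and the top passage share no words, yet their vectors point in nearly the same direction, which is the property a vector index exploits.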
The whole product is designed so this capability sits next to your governed data instead of in a separate system you have to feed and reconcile.
The reason it matters on Databricks specifically is the lakehouse context. Your source data already lives in Delta tables under the Databricks lakehouse architecture (often described more broadly as a data lakehouse), your access policies already live in Unity Catalog, and your transformation jobs already run in Databricks Workflows.
Putting the vector index in the same platform means the index can track the table automatically, inherit its governance, and avoid the brittle export-and-load step that breaks the moment someone changes a schema.
According to the Databricks AI Search documentation, the index integrates directly with the platform’s governance and productivity tooling rather than sitting beside it.
How vector search works on Databricks, step by step

The lifecycle from raw data to a ranked result has four stages. Understanding them is the difference between treating the product as a black box and tuning it for real workloads.
Each stage maps to a concrete object you create in the platform, and the managed path automates most of it.
1. Embed. A model converts each row of source text or each image into a vector. You can let Databricks compute these embeddings for you with an integrated foundation model, or supply your own precomputed embedding column if you already run an embedding model elsewhere in your generative AI tech stack.
2. Index. Those vectors are written into a vector search index, a specialized structure that supports fast approximate nearest-neighbor lookups instead of scanning every vector one by one.
3. Sync. With the managed option, the index subscribes to changes in the underlying Delta table, so inserts, updates, and deletes flow into the index without a separate job to babysit. This is the piece that removes the most operational pain.
4. Query. An application sends a query, Databricks embeds it with the same model, and the index returns the closest matching rows with their metadata, ready to feed an LLM or a ranking layer.

Teams that operate their own embedding models often manage them through machine learning operations practices and track versions in MLflow.
That managed sync path is the headline feature. It is what most people mean when they ask whether to put the index next to the jobs that load those tables.
The Databricks engine handles the embedding, the indexing, and the continuous refresh as one unit.
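The four stages can be sketched as a toy in-memory index. This is an illustration of the lifecycle only, not the Databricks SDK: `VOCAB`, `embed`, and `ToyIndex` are hypothetical names, the "embedding" is a simple word count over a fixed vocabulary, and the query does an exact scan where a real index would use approximate nearest-neighbor search.

```python
import math

# Hypothetical fixed vocabulary; a real embedding model learns its own features.
VOCAB = ["reset", "password", "login", "invoice", "billing", "schedule"]

def embed(text):
    """Stage 1 (toy): count vocabulary words, then L2-normalise the counts."""
    tokens = text.lower().split()
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class ToyIndex:
    def __init__(self):
        self.rows = {}  # primary key -> (vector, source row)

    def sync(self, source_table):
        """Stages 2 and 3: mirror inserts, updates, and deletes from the source."""
        for key in list(self.rows):
            if key not in source_table:
                del self.rows[key]
        for key, row in source_table.items():
            self.rows[key] = (embed(row["text"]), row)

    def query(self, text, k=1):
        """Stage 4: embed the query with the same function and rank by dot product."""
        q = embed(text)

        def score(item):
            vector, _metadata = item[1]
            return sum(a * b for a, b in zip(q, vector))

        best = sorted(self.rows.items(), key=score, reverse=True)[:k]
        return [(key, metadata) for key, (_vector, metadata) in best]

table = {
    1: {"text": "reset your password from the login page"},
    2: {"text": "quarterly invoice and billing schedule"},
}
index = ToyIndex()
index.sync(table)
top = index.query("how do I reset my password")

# A delete in the source disappears from the index on the next sync.
del table[1]
index.sync(table)
```

The point of the sketch is the shape of the loop: the same `embed` function serves both indexing and querying, and `sync` is the only place the index contents change.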
Case Study
Contextual Query Resolution With a Production AI Support Agent
Kanerika built an AI support agent that resolves user queries against governed enterprise data, the same grounded-retrieval pattern a vector search index feeds.
Delta Sync vs direct-access index: choosing an index type

The single most important design decision is which index type to create, because it determines who owns the write path and how much Databricks automates for you. There are two, and a useful third comparison point is the standalone vector database you would otherwise run yourself.
A Delta Sync index is the managed option. You point it at a source Delta table, and Databricks keeps the index synchronized automatically as the table changes, optionally computing the embeddings on your behalf.
It is the lowest-effort path and the right default for most teams, because it eliminates the sync pipeline entirely and fits the kind of governed, self-maintaining estate that Unity Catalog is built to support.
A direct-access index, by contrast, hands you the write path. You insert, update, and delete vectors yourself through the API, which gives you precise control for streaming or custom embedding workflows but means you own the freshness logic that Delta Sync would otherwise handle.
The infographic above compares the two index types feature by feature against a self-managed database. The table below takes a different cut, mapping concrete workloads to the index type that fits, so you can match the choice to how your data actually moves.
| Your workload | Recommended index | Why it fits |
| --- | --- | --- |
| RAG over docs in a Delta table | Delta Sync | Auto-syncs and can embed for you, so there is no pipeline to run |
| Semantic search on governed tables | Delta Sync | Inherits Unity Catalog permissions with zero extra wiring |
| Streaming ingest with custom embeddings | Direct-access | You own the write path and control exactly what lands when |
| Embeddings produced outside Databricks | Direct-access | Lets you push precomputed vectors straight through the API |
| Source data not yet in a Delta table | Direct-access | Works without a source table to sync from, until you land one |
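The difference in write-path ownership can be sketched with two toy classes. Everything here is hypothetical and for illustration only: `ManagedSyncIndex`, `DirectAccessIndex`, `toy_embed`, and the change-record format are invented names and shapes, not the Mosaic AI Vector Search SDK.

```python
def toy_embed(text):
    """Hypothetical stand-in for an embedding model: [length, vowel count]."""
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]

class ManagedSyncIndex:
    """Delta Sync style: contents are derived from table changes."""

    def __init__(self, embed):
        self._embed = embed
        self._vectors = {}

    def apply_changes(self, changes):
        # In the managed product the platform runs this step, not your code.
        for change in changes:
            if change["op"] == "delete":
                self._vectors.pop(change["id"], None)
            else:  # "insert" or "update"
                self._vectors[change["id"]] = self._embed(change["text"])

    def ids(self):
        return sorted(self._vectors)

class DirectAccessIndex:
    """Direct-access style: the caller supplies vectors and owns freshness."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, row_id, vector):
        self._vectors[row_id] = vector

    def delete(self, row_id):
        self._vectors.pop(row_id, None)

    def ids(self):
        return sorted(self._vectors)

managed = ManagedSyncIndex(toy_embed)
managed.apply_changes([
    {"op": "insert", "id": "a", "text": "refund policy"},
    {"op": "insert", "id": "b", "text": "shipping times"},
    {"op": "delete", "id": "a"},
])

direct = DirectAccessIndex()
direct.upsert("a", toy_embed("refund policy"))
direct.upsert("b", toy_embed("shipping times"))
direct.delete("a")  # forgetting this call would leave a stale vector behind
```

Both indexes end in the same state, but only the direct-access version depends on the caller remembering every write, which is the freshness logic the managed path takes off your hands.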
For embedding options, the official Databricks embedding options reference details when Databricks computes vectors for you versus when you bring your own, which is the second decision that follows naturally once you have picked an index type.
Embeddings, hybrid search, and metadata filtering

Similarity alone is rarely enough for a production query. Real search blends three things:
- Semantic vectors that capture meaning, so a query matches on intent rather than exact wording.
- Keyword matching that nails exact terms like product codes and error numbers.
- Structured filters that scope results to the right tenant, date range, or category.

Databricks vector search supports all three in a single call, which is why teams stop stitching together separate systems for each.
Hybrid search is the most useful of these. Pure vector search can miss an exact identifier because embeddings smear precise tokens into approximate neighbors, while pure keyword search misses meaning.
Hybrid search runs both, combining semantic similarity with a traditional keyword score and reranking the merged results. A query for “error 0x80070005 on login” then finds both the document that names the code and the one that describes the same failure in plain language.
According to the Databricks AI Search product page, hybrid semantic and keyword search with built-in reranking is exposed through a single API.
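One way to picture the merge step is reciprocal rank fusion (RRF), a widely used method for combining ranked lists. The source does not say which fusion or reranking method Databricks uses, so treat RRF here as a swapped-in illustration. The document ids, the `keyword_ranking` helper, and the hard-coded semantic ranking are all hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids: each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

def keyword_ranking(query, docs):
    """Rank docs by how many exact query tokens they contain; drop non-matches."""
    terms = set(query.lower().split())
    hits = {
        doc_id: len(terms & set(text.lower().split()))
        for doc_id, text in docs.items()
    }
    return [d for d in sorted(hits, key=lambda d: (-hits[d], d)) if hits[d] > 0]

docs = {
    "kb-1": "Fix for error 0x80070005 on login",
    "kb-2": "Access denied when signing in to the portal",
    "kb-3": "Holiday schedule for the support desk",
}

query = "error 0x80070005 on login"
keyword_hits = keyword_ranking(query, docs)

# Hypothetical output of a semantic ranker: it surfaces the plain-language
# description first and the exact-code article second.
semantic_hits = ["kb-2", "kb-1"]

merged = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(merged)
```

The article that names the exact code appears in both lists, so it rises to the top, while the plain-language description that keyword matching missed still makes the merged result.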
Metadata filtering is the quiet workhorse. Because each vector carries its source columns, you can constrain a similarity search with structured predicates, returning only rows where the region is EMEA or the document is not archived.
This is what makes vector search safe in a multi-tenant application. It is governed by the same Unity Catalog policies that protect every other query, so a user only ever retrieves vectors they are already permitted to see.
That same governance posture extends to Databricks security controls and audit, which matters once retrieval touches regulated data. That governance link is one reason vector search pairs naturally with a broader Databricks Data Intelligence Platform rollout rather than a side deployment.
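A filtered similarity search can be sketched as "keep the rows that satisfy every predicate, then rank what is left." The row layout and the `filtered_search` helper below are hypothetical, and the sketch filters before ranking purely for simplicity; the source does not describe how the real engine orders those steps.

```python
def filtered_search(rows, query_vector, filters, k=3):
    """Keep rows whose metadata matches every filter, then rank by dot product."""

    def matches(row):
        return all(row["metadata"].get(col) == val for col, val in filters.items())

    def score(row):
        return sum(q * v for q, v in zip(query_vector, row["vector"]))

    candidates = [row for row in rows if matches(row)]
    candidates.sort(key=score, reverse=True)
    return [row["id"] for row in candidates[:k]]

# Hypothetical rows: each vector travels with its source columns.
rows = [
    {"id": 1, "vector": [0.90, 0.10], "metadata": {"region": "EMEA", "archived": False}},
    {"id": 2, "vector": [0.95, 0.05], "metadata": {"region": "APAC", "archived": False}},
    {"id": 3, "vector": [0.80, 0.20], "metadata": {"region": "EMEA", "archived": True}},
]

result = filtered_search(rows, [1.0, 0.0], {"region": "EMEA", "archived": False})
print(result)
```

Row 2 is the closest vector overall, but it never reaches the caller because it fails the region predicate, which is the behavior a multi-tenant application relies on.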
Endpoint types: standard vs storage-optimized

An index has to run somewhere, and that somewhere is a vector search endpoint. Databricks offers two endpoint profiles, and the choice is a straightforward trade between latency and cost at scale. Picking the wrong one is one of the more expensive mistakes teams make, so it is worth a moment of deliberate thought rather than accepting the default.
Standard endpoints keep full-precision vectors in memory to deliver query latencies in the tens of milliseconds, which is what user-facing applications need when a person is waiting for an answer.
Storage-optimized endpoints separate storage from compute so you can serve billions of vectors at a fraction of the cost, with latencies in the hundreds of milliseconds. They suit very large corpora where cost matters more than instantaneous response.
The Databricks engineering blog on billion-scale search explains the decoupled storage design that makes the storage-optimized tier economical at that volume.
As a rule of thumb, start with a standard endpoint for any interactive RAG assistant or search box, and move to storage-optimized only when the corpus grows into the billions and your latency budget can absorb the extra hundreds of milliseconds. Mixing the two across workloads is common, the same way teams run separate compute for different jobs, and the layout decision should be backed by numbers, not guesswork.
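The rule of thumb above can be written down as a small decision helper. The function name and both cutoffs are assumptions made for illustration: the guide only says "billions" of vectors and "hundreds of milliseconds," so the exact thresholds of one billion vectors and 100 ms are placeholders to replace with your own measurements.

```python
BILLION = 1_000_000_000

def suggest_endpoint(num_vectors, latency_budget_ms):
    """Suggest an endpoint tier from corpus size and the latency the app can tolerate.

    Assumed cutoffs (not from Databricks): storage-optimized only when the
    corpus is at least one billion vectors AND the latency budget is 100 ms
    or more; otherwise stay on standard.
    """
    if num_vectors >= BILLION and latency_budget_ms >= 100:
        return "storage-optimized"
    return "standard"

print(suggest_endpoint(5_000_000, 50))       # interactive search box
print(suggest_endpoint(3 * BILLION, 500))    # large archive, relaxed latency
print(suggest_endpoint(3 * BILLION, 50))     # large corpus, tight latency
```

The third case is the uncomfortable one: a billion-scale corpus with a tight latency budget does not fit the cheap tier, so the helper keeps it on standard and leaves the cost conversation to you.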
Databricks vector search vs standalone vector databases

The honest question every architect asks is whether to use the built-in option at all, or to reach for a separate vector store such as Pinecone, Weaviate, or PostgreSQL with the pgvector extension. The answer turns less on raw benchmark speed and more on where your data and governance already live, which is the same logic that decides most platform questions on the lakehouse.
A standalone vector database is a specialized system you operate separately. It can be excellent and highly tunable, but it sits outside your lakehouse.
That separation has a cost. You build and own a pipeline to export embeddings into it, you reconcile its contents with the source table every time data changes, and you reimplement access control because it does not know about Unity Catalog.
Databricks vector search trades a sliver of low-level tunability for the elimination of all of that integration and governance work, because the index already lives where the data and the policies do.
The comparison below frames the decision the way it actually gets made in practice.
Kanerika Service
Databricks Consulting and Implementation
Kanerika is a Databricks partner that designs, builds, and operates lakehouse platforms end to end, from embedding strategy and index types to governed, observable retrieval pipelines.
| Consideration | Databricks vector search | Standalone vector DB |
| --- | --- | --- |
| Sync with source data | Built in via Delta Sync | You build and own the pipeline |
| Governance and access control | Inherited from Unity Catalog | Reimplemented separately |
| Infrastructure to operate | Serverless, none of your own | A separate system to run |
| Low-level index tuning | Abstracted and managed | Fine-grained and exposed |
| Best when | Your data lives on Databricks | You need a portable, standalone store |
For teams already standardizing on the platform, the calculus usually favors the built-in option, in the same way the wider Databricks vs Snowflake decision tends to follow where the rest of the estate sits rather than a single feature. The same is true when comparing it against a managed alternative such as Microsoft Fabric vs Databricks, where the deciding factor is usually the existing platform rather than the search engine alone.
Building a RAG pipeline with Databricks vector search

The dominant use case is retrieval-augmented generation, where vector search supplies the grounded context that keeps a language model honest. The pattern is consistent across the production systems we see, and it maps cleanly onto the four lifecycle stages above.
You chunk your source documents into passages, embed each chunk, and write them to a Delta Sync index so the retrieval corpus stays current as documents are added or revised.
At query time, the user’s question is embedded, the index returns the most relevant chunks, and those chunks are injected into the prompt so the model answers from your actual content instead of its training data.
This is what separates a grounded assistant from a confident guess. It is the backbone of most enterprise advanced RAG systems and the newer agentic RAG designs where an agent decides what to retrieve.
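The injection step at query time can be sketched as a small prompt builder. The function name, the prompt wording, and the character budget are all hypothetical choices for illustration, and the retrieved chunks are hard-coded where a real pipeline would take them from the index.

```python
def build_prompt(question, chunks, max_chars=1000):
    """Inject retrieved chunks into a prompt, most relevant first, within a size budget."""
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk}"
        if used + len(entry) > max_chars:
            break  # stop before the context exceeds the budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Stand-in for the chunks a vector index would return, best match first.
retrieved = [
    "Refunds are issued within 14 days of a return.",
    "Returns require the original receipt.",
]
prompt = build_prompt("How long do refunds take?", retrieved)
print(prompt)
```

Numbering the chunks makes it easy to ask the model to cite which passage supported its answer, and the budget keeps a long retrieval result from crowding out the question.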
The choice of retrieval tooling here matters, which is why teams compare RAG tools before committing, and why grounded retrieval increasingly powers an AI chatbot for businesses rather than a brittle scripted bot.
Vector search is also the retrieval layer underneath Databricks’ own agent tooling, including Mosaic AI and Agent Bricks, which assemble retrieval, models, and evaluation into deployable agents.
If you are weighing retrieval against other grounding strategies, our breakdown of RAG vs fine-tuning covers when to retrieve and when to retrain, the MCP vs RAG comparison covers how retrieval coexists with tool-calling protocols, and RAG vs LLM explains why retrieval beats a model’s frozen memory for fresh enterprise facts.
The 2026 rename: Vector Search to Databricks AI Search

If you are reading current documentation and getting confused, here is the cause. In 2026 Databricks rebranded the product from Databricks Vector Search to Databricks AI Search, positioning it as a broader search capability that spans semantic, keyword, and hybrid retrieval rather than vectors alone, in step with Databricks’ wider push into generative AI. The underlying engine, the index objects, and the Python SDK still use the Mosaic AI Vector Search naming, so you will see both names across the UI, the docs, and Terraform resources at the same time.
What actually changed is mostly framing plus the formalization of hybrid search and the two endpoint tiers as first-class features. The core mechanics, Delta Sync indexes, direct-access indexes, embedding options, and Unity Catalog governance, are the same ones described throughout this guide. For practical purposes, treat “Databricks vector search,” “Mosaic AI Vector Search,” and “Databricks AI Search” as the same product at different points on a naming timeline, and do not let a tutorial written under the old name send you looking for a feature that simply got renamed.
Production pitfalls to plan for

Vector search looks simple in a demo and exposes its sharp edges in production, so it pays to plan for the failure modes before they find you. None of these are reasons to avoid the product; they are the operational realities of running semantic retrieval at scale, and getting ahead of them is mostly a matter of design discipline.
- Chunking. How you split documents into passages quietly determines retrieval quality more than the index settings do. Chunks that are too large dilute the signal, and chunks that are too small lose context.
- Embedding model drift. If you ever change the embedding model, every vector in the index was produced by the old model and must be regenerated. Version your embedding model deliberately and treat a model swap as a full re-index.
- Sync lag and cost. Delta Sync is near real-time, not instant, so plan for a short window where the index trails the source. Right-size your endpoint so you are not paying for an in-memory standard endpoint to serve an archive a storage-optimized endpoint would serve far cheaper.

These are the same discipline questions that show up in any Databricks performance optimization effort, and they reward measurement over guesswork.
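For the chunking pitfall, one common starting point is fixed-size windows with some overlap, so a sentence that straddles a boundary still appears whole in at least one chunk. This is a minimal sketch of that approach; the function name and the default sizes are assumptions, and the right values depend on your documents and embedding model.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into windows of chunk_size words, each sharing `overlap`
    words with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Twelve placeholder words, w0 through w11, split into windows of five
# with two words of overlap.
sample = " ".join(f"w{i}" for i in range(12))
pieces = chunk_words(sample, chunk_size=5, overlap=2)
for piece in pieces:
    print(piece)
```

Word windows are the crudest option; splitting on headings or paragraphs usually preserves meaning better, so treat this as a baseline to measure alternatives against.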
How Kanerika helps you build production vector search

Standing up a vector search index in a notebook takes an afternoon. Running it as a governed, cost-controlled, production retrieval layer that an enterprise can trust is a different exercise, and that is where most teams want a partner who has done it before. Kanerika is a Databricks partner that designs and operates lakehouse platforms end to end, from Delta table layout and embedding strategy to the governed, observable retrieval pipelines that grounded AI applications depend on.
Kanerika Service
Generative AI and RAG Engineering
Kanerika designs and ships production generative AI, from retrieval-augmented pipelines and vector search to evaluated, governed enterprise assistants.
We run vector search engagements as a staged path, not a single notebook task, so the retrieval layer holds up once real users and regulated data touch it.
- Assess. We profile your real query logs and source tables to decide what actually belongs in the index, where retrieval will earn its keep, and which governance constraints apply before a single embedding is written.
- Design. We choose index types and endpoint tiers against real query patterns, defaulting new corpora to a Delta Sync index on a standard endpoint and only graduating to storage-optimized once the corpus crosses into the billions.
- Build. We stand up the chunking strategy, the embedding pipeline, and the index, pinning the embedding model version so a later swap is treated as a deliberate full re-index rather than a silent break.
- Govern. We wire Unity Catalog permissions and lineage through the retrieval layer, extending the same controls with our KAN governance suite (KANGovern, KANComply, KANGuard) so retrieval never returns a row a user should not see.
- Enable. We measure relevance, hand over runbooks, and set up the evaluation loop so your team can tune chunking and reranking without us in the room.

This is the same grounded-retrieval pattern behind our document intelligence agent, DokGPT (now KlarityIQ), which is built on RAG over governed enterprise content.
For one investment-bank deployment, that retrieval layer delivered 43% faster information retrieval, a 35% reduction in manual review hours, and 100% role-based compliance, with access scoped by the same governance the index inherits.
Our broader RAG development and generative AI practice carries that pattern from a single search box to a fleet of governed agents.
The practitioner watch-outs are consistent. Right-size the endpoint early, version the embedding model deliberately, and treat chunking as a first-class design decision rather than an afterthought, because each one quietly decides whether retrieval is trusted in production.
If you want vector search built the right way the first time, we scope which data belongs in the index, set the index and endpoint configuration, and stand up the pipeline that keeps it fresh.
Frequently Asked Questions

What is Databricks vector search? Databricks vector search is a serverless similarity search engine built into the Databricks Data Intelligence Platform. It stores vector embeddings of your data alongside their metadata and lets applications search by meaning rather than exact keywords, returning the nearest matching rows in milliseconds. Because it lives inside the lakehouse, the index can track your Delta tables automatically and inherit Unity Catalog governance, which removes the separate vector database and sync pipeline that bolt-on approaches require. In 2026 Databricks renamed the product to Databricks AI Search, but the underlying engine and the Mosaic AI Vector Search index objects are the same.
What is the difference between a Delta Sync index and a direct-access index? A Delta Sync index is the managed option. You point it at a source Delta table and Databricks keeps the index synchronized automatically as the table changes, optionally computing the embeddings for you. A direct-access index hands you the write path, so you insert, update, and delete vectors yourself through the API and own the freshness logic. Delta Sync is the right default for most retrieval and RAG workloads because it eliminates the sync pipeline, while a direct-access index suits streaming ingestion or custom embedding workflows where you need precise control over what gets written and when.
Is Databricks vector search the same as Databricks AI Search? Yes. In 2026 Databricks rebranded Databricks Vector Search to Databricks AI Search, positioning it as a broader search capability that spans semantic, keyword, and hybrid retrieval rather than vectors alone. The underlying engine, the index objects, and the Python SDK still carry the Mosaic AI Vector Search name, so you will see both names across the UI, the documentation, and Terraform resources at the same time. For practical purposes, treat Databricks vector search, Mosaic AI Vector Search, and Databricks AI Search as the same product at different points on a naming timeline.
How does Databricks vector search work for RAG? For retrieval-augmented generation, you chunk your source documents into passages, embed each chunk, and write them to a Delta Sync index so the retrieval corpus stays current as documents change. At query time the user’s question is embedded with the same model, the index returns the most relevant chunks, and those chunks are injected into the prompt so the language model answers from your actual content instead of its training data. This grounding is what separates a reliable assistant from a confident guess, and vector search is the retrieval layer underneath Databricks agent tooling such as Mosaic AI and Agent Bricks.
Does Databricks vector search support hybrid search? Yes. Hybrid search runs semantic vector similarity and traditional keyword matching together, then reranks the merged results, all exposed through a single API call. This matters because pure vector search can miss exact identifiers like product codes or error numbers, while pure keyword search misses meaning. Running both catches a query whether it names the precise token or describes the same idea in different words. You can also combine similarity with metadata filtering, scoping results by structured predicates such as region or status, all governed by the same Unity Catalog permissions as the rest of your data.
What is the difference between standard and storage-optimized endpoints? Standard endpoints keep full-precision vectors in memory to deliver query latencies in the tens of milliseconds, which is what interactive, user-facing applications need. Storage-optimized endpoints separate storage from compute so you can serve billions of vectors at a fraction of the cost, with latencies in the hundreds of milliseconds, which suits very large corpora where cost matters more than instantaneous response. As a rule of thumb, start with a standard endpoint for any interactive RAG assistant or search box, and move to storage-optimized only when the corpus grows into the billions and your latency budget can absorb the extra delay.
Should I use Databricks vector search or a standalone vector database? It depends on where your data and governance already live. A separate vector store such as Pinecone, Weaviate, or PostgreSQL with the pgvector extension can be excellent and highly tunable, but it sits outside your lakehouse, so you build a pipeline to export embeddings into it, reconcile it with the source on every change, and reimplement access control. Databricks vector search trades a sliver of low-level tunability for eliminating all of that integration and governance work, because the index already lives where the data and the Unity Catalog policies do. For teams already standardizing on Databricks, the built-in option usually wins.
Can Databricks compute embeddings for me? Yes. With a Delta Sync index you can let Databricks generate embeddings using an integrated foundation model, so you only point the index at a text column and the platform handles vectorization and refresh. Alternatively, you can supply your own precomputed embedding column if you already run an embedding model elsewhere in your stack. If you ever change the embedding model, remember that every vector in the index was produced by the old model and must be regenerated, so version your embedding model deliberately and treat a model swap as a full re-index.