Databricks Vector Search: Index Types, RAG & How It Works

TL;DR

Databricks Vector Search builds similarity search directly into the lakehouse, so a Delta Sync index stays in sync with its source data instead of living in a separate vector database that drifts out of date the moment a row changes. Delta Sync indexes update from a Delta table automatically, while direct-access indexes hand you the write path for streaming or custom embedding workflows. Either one plugs into a production RAG pipeline with Unity Catalog governance inherited rather than bolted on afterward.

Every retrieval-augmented chatbot, every “find me something similar” feature, and every recommendation that feels like it read your mind runs on the same quiet engine underneath: vector search.

The hard part was never the idea. It was keeping a separate vector database in sync with the data that actually changes in your warehouse, paying for infrastructure that drifted out of date the moment a row was updated, and bolting governance on after the fact.

Databricks vector search exists to remove that whole layer of glue work by building similarity search directly into the lakehouse where the data already lives.

This guide explains what Databricks vector search is, how an index is built and queried, the difference between Delta Sync and direct-access index types, how it compares to running a standalone vector database, and where it fits in a production retrieval-augmented generation pipeline.

One naming note before we start, because it trips people up in 2026. Databricks recently renamed this product to Databricks AI Search, but the index objects and the SDK still carry the Mosaic AI Vector Search name, and most teams still search for and say “vector search.” We use the terms interchangeably here, and the section on the rename covers what actually changed.

Key Takeaways

Databricks vector search is a serverless similarity search engine built into the Data Intelligence Platform, so the index lives next to your governed Delta tables instead of in a separate system.
An index is built in four stages (embed, index, sync, and query), and Databricks can compute the embeddings for you or let you bring your own.
A Delta Sync index keeps itself current with the source Delta table automatically, while a direct-access index hands you the write path for streaming or custom embedding workflows.
Hybrid search blends semantic vectors with keyword matching and reranking in one API call, and metadata filtering plus Unity Catalog governance keep results scoped and safe.
Standard endpoints give tens-of-milliseconds latency in memory, while storage-optimized endpoints serve billions of vectors cheaply at hundreds of milliseconds.
In 2026 Databricks renamed the product to Databricks AI Search, but the engine, the Mosaic AI Vector Search index objects, and the SDK are the same.
Kanerika, a Databricks partner, builds governed, cost-controlled vector search and RAG retrieval layers that production AI applications can trust.

What is Databricks vector search?

Databricks vector search is a serverless similarity search engine built into the Databricks Data Intelligence Platform. It stores vector representations of your data, called embeddings, alongside their metadata, and lets an application search by meaning rather than by exact keyword. Ask it for the rows most similar to a query, and it returns the nearest neighbors in vector space in milliseconds, governed by the same Unity Catalog permissions that protect the rest of your tables.

An embedding is a list of numbers that captures the semantic content of a piece of text, an image, or another unstructured input. As the Azure Databricks AI Search reference puts it, vector search is a type of search optimized for retrieving embeddings, the mathematical representations of semantic content.

Two passages that mean roughly the same thing land close together in that numeric space even when they share no words. That is what lets vector search answer “show me documents about cutting cloud spend” with a passage titled “reducing compute costs.”
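To make "close together in that numeric space" concrete, here is a minimal sketch that ranks passages by cosine similarity. The three-dimensional vectors are hand-written illustrative values, not output from a real embedding model, and the function is plain Python rather than anything from the Databricks SDK.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written toy "embeddings" (assumed values, for illustration only).
passages = {
    "reducing compute costs": [0.9, 0.1, 0.0],
    "hiring a data engineer": [0.1, 0.9, 0.2],
    "quarterly sales recap": [0.0, 0.2, 0.9],
}

# Stands in for the embedding of "show me documents about cutting cloud spend".
query_vector = [0.8, 0.2, 0.1]

# Rank passage titles from most to least similar to the query.
ranked = sorted(
    passages,
    key=lambda title: cosine_similarity(query_vector, passages[title]),
    reverse=True,
)
best_match = ranked[0]
print(best_match)
```

The query and the top passage share no words, yet their vectors point in nearly the same direction, which is the property a real embedding model is trained to produce.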

The whole product is designed so this capability sits next to your governed data instead of in a separate system you have to feed and reconcile.

The reason it matters on Databricks specifically is the lakehouse context. Your source data already lives in Delta tables under the Databricks lakehouse architecture (often described more broadly as a data lakehouse), your access policies already live in Unity Catalog, and your transformation jobs already run in Databricks Workflows.

Putting the vector index in the same platform means the index can track the table automatically, inherit its governance, and avoid the brittle export-and-load step that breaks the moment someone changes a schema.

According to the Databricks AI Search documentation, the index integrates directly with the platform’s governance and productivity tooling rather than sitting beside it.

How vector search works on Databricks, step by step

The lifecycle from raw data to a ranked result has four stages. Understanding them is the difference between treating the product as a black box and tuning it for real workloads.

Each stage maps to a concrete object you create in the platform, and the managed path automates most of it.

Embed. A model converts each row of source text or each image into a vector. You can let Databricks compute these embeddings for you with an integrated foundation model, or supply your own precomputed embedding column if you already run an embedding model elsewhere in your generative AI tech stack.
Index. Those vectors are written into a vector search index, a specialized structure that supports fast approximate nearest-neighbor lookups instead of scanning every vector one by one.
Sync. With the managed option, the index subscribes to changes in the underlying Delta table, so inserts, updates, and deletes flow into the index without a separate job to babysit. This is the piece that removes the most operational pain.
Query. An application sends a query, Databricks embeds it with the same model, and the index returns the closest matching rows with their metadata, ready to feed an LLM or a ranking layer.
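The four stages above can be sketched as a toy in-memory index. This is a conceptual model only: `embed`, `ToyIndex`, and the change-feed tuples are hypothetical names invented for this sketch, the "embedding" is a simple word count, and the query is a brute-force scan rather than the approximate nearest-neighbor structure a real index uses.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a sparse word-count vector (a real model returns dense floats)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyIndex:
    def __init__(self):
        self.vectors = {}  # primary key -> embedding
        self.rows = {}     # primary key -> source row

    def sync(self, changes):
        """Apply a change feed of (operation, key, row) tuples from the source table."""
        for op, key, row in changes:
            if op == "delete":
                self.vectors.pop(key, None)
                self.rows.pop(key, None)
            else:  # "insert" or "update": re-embed and overwrite
                self.vectors[key] = embed(row["text"])
                self.rows[key] = row

    def query(self, text, k=2):
        """Embed the query with the same function and return the k closest rows."""
        q = embed(text)
        ranked = sorted(self.vectors, key=lambda key: cosine(q, self.vectors[key]), reverse=True)
        return [self.rows[key] for key in ranked[:k]]

index = ToyIndex()
# Initial load: embed and index two rows.
index.sync([
    ("insert", 1, {"text": "reset your password from the login page"}),
    ("insert", 2, {"text": "invoice approval workflow for finance"}),
])
# Later changes in the source table flow into the index.
index.sync([
    ("update", 2, {"text": "how to change your billing address"}),
    ("delete", 1, None),
])
results = index.query("change billing address", k=1)
print(results)
```

The point of the sketch is the shape of the sync step: the index never needs a separate export job, because inserts, updates, and deletes arrive as changes and are applied in place.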

Teams that operate their own embedding models often manage them through machine learning operations practices and track versions in MLflow.

That managed sync path is the headline feature. It is the main reason to keep the index on the same platform as the jobs that load the source tables, rather than in a separate system.

The Databricks engine handles the embedding, the indexing, and the continuous refresh as one unit.

Delta Sync vs direct-access index: choosing an index type

The single most important design decision is which index type to create, because it determines who owns the write path and how much Databricks automates for you. There are two, and a useful third comparison point is the standalone vector database you would otherwise run yourself.

A Delta Sync index is the managed option. You point it at a source Delta table, and Databricks keeps the index synchronized automatically as the table changes, optionally computing the embeddings on your behalf.

It is the lowest-effort path and the right default for most teams, because it eliminates the sync pipeline entirely and fits the kind of governed, self-maintaining estate that Unity Catalog is built to support.

A direct-access index, by contrast, hands you the write path. You insert, update, and delete vectors yourself through the API, which gives you precise control for streaming or custom embedding workflows but means you own the freshness logic that Delta Sync would otherwise handle.

The infographic above compares the two index types feature by feature against a self-managed database. The table below takes a different cut, mapping concrete workloads to the index type that fits, so you can match the choice to how your data actually moves.

Your workload	Recommended index	Why it fits
RAG over docs in a Delta table	Delta Sync	Auto-syncs and can embed for you, so there is no pipeline to run
Semantic search on governed tables	Delta Sync	Inherits Unity Catalog permissions with zero extra wiring
Streaming ingest with custom embeddings	Direct-access	You own the write path and control exactly what lands when
Embeddings produced outside Databricks	Direct-access	Lets you push precomputed vectors straight through the API
Source data not yet in a Delta table	Direct-access	Works without a source table to sync from, until you land one

For embedding options, the official Databricks embedding options reference details when Databricks computes vectors for you versus when you bring your own, which is the second decision that follows naturally once you have picked an index type.

Embeddings, hybrid search, and metadata filtering

Similarity alone is rarely enough for a production query. Real search blends three things:

Semantic vectors that capture meaning, so a query matches on intent rather than exact wording.
Keyword matching that nails exact terms like product codes and error numbers.
Structured filters that scope results to the right tenant, date range, or category.

Databricks vector search supports all three in a single call, which is why teams stop stitching together separate systems for each.

Hybrid search is the most useful of these. Pure vector search can miss an exact identifier because embeddings smear precise tokens into approximate neighbors, while pure keyword search misses meaning.

Hybrid search runs both, combining semantic similarity with a traditional keyword score and reranking the merged results. A query for “error 0x80070005 on login” then finds both the document that names the code and the one that describes the same failure in plain language.

According to the Databricks AI Search product page, hybrid semantic and keyword search with built-in reranking is exposed through a single API.
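One common way to merge a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF). The sketch below shows that technique in plain Python as an illustration of the idea; this guide does not state which fusion or reranking method the product uses internally, so treat RRF, the constant `k=60`, and the document ids here as assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for the query "error 0x80070005 on login".
semantic_ranking = ["doc_plain_language", "doc_names_code", "doc_unrelated"]
keyword_ranking = ["doc_names_code", "doc_error_table", "doc_plain_language"]

fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, so the passage that names the exact code and the one that describes the failure in plain language both beat results that only one retriever liked.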

Metadata filtering is the quiet workhorse. Because each vector carries its source columns, you can constrain a similarity search with structured predicates, returning only rows where the region is EMEA or the document is not archived.

This is what makes vector search safe in a multi-tenant application. It is governed by the same Unity Catalog policies that protect every other query, so a user only ever retrieves vectors they are already permitted to see.
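As a rough illustration of filtering combined with similarity, the sketch below applies a structured predicate first and ranks only the surviving rows. The rows, vectors, and the `filtered_search` helper are hypothetical, and a real index evaluates filters inside the engine rather than in application code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Each row carries its source columns next to the (toy) vector.
rows = [
    {"id": 1, "region": "EMEA", "archived": False, "vector": [0.90, 0.10]},
    {"id": 2, "region": "EMEA", "archived": True, "vector": [0.95, 0.05]},
    {"id": 3, "region": "APAC", "archived": False, "vector": [0.92, 0.08]},
    {"id": 4, "region": "EMEA", "archived": False, "vector": [0.10, 0.90]},
]

def filtered_search(query_vector, rows, predicate, k=2):
    """Keep rows that satisfy the predicate, then rank them by similarity."""
    candidates = [row for row in rows if predicate(row)]
    candidates.sort(key=lambda row: cosine(query_vector, row["vector"]), reverse=True)
    return [row["id"] for row in candidates[:k]]

hits = filtered_search(
    [1.0, 0.0],
    rows,
    lambda row: row["region"] == "EMEA" and not row["archived"],
)
print(hits)
```

Rows 2 and 3 are closer to the query than row 4, but they never appear in the result because the predicate excludes them before ranking. That ordering is what keeps one tenant's data out of another tenant's results.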

That same governance posture extends to Databricks security controls and audit, which matters once retrieval touches regulated data. That governance link is one reason vector search pairs naturally with a broader Databricks Data Intelligence Platform rollout rather than a side deployment.

Endpoint types: standard vs storage-optimized

An index has to run somewhere, and that somewhere is a vector search endpoint. Databricks offers two endpoint profiles, and the choice is a straightforward trade between latency and cost at scale. Picking the wrong one is one of the more expensive mistakes teams make, so it is worth a moment of deliberate thought rather than accepting the default.

Standard endpoints keep full-precision vectors in memory to deliver query latencies in the tens of milliseconds, which is what user-facing applications need when a person is waiting for an answer.

Storage-optimized endpoints separate storage from compute so you can serve billions of vectors at a fraction of the cost, with latencies in the hundreds of milliseconds. They suit very large corpora where cost matters more than instantaneous response.

The Databricks engineering blog on billion-scale search explains the decoupled storage design that makes the storage-optimized tier economical at that volume.

As a rule of thumb, start with a standard endpoint for any interactive RAG assistant or search box, and move to storage-optimized only when the corpus grows into the billions and your latency budget can absorb the extra hundreds of milliseconds. Mixing the two across workloads is common, the same way teams run separate compute for different jobs, and the layout decision should be backed by numbers, not guesswork.
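The rule of thumb can be written down as a small helper, which is handy when the decision has to be repeated across many workloads. The thresholds are assumptions chosen to mirror the wording above (one billion vectors, and 100 ms as a stand-in for "hundreds of milliseconds"), and the returned strings are labels for this sketch, not API constants.

```python
BILLION = 1_000_000_000

def choose_endpoint(vector_count, latency_budget_ms):
    """Pick an endpoint profile from corpus size and latency budget.

    Assumed thresholds: storage-optimized only when the corpus reaches
    a billion vectors AND the caller can tolerate at least 100 ms.
    """
    if vector_count >= BILLION and latency_budget_ms >= 100:
        return "storage-optimized"
    return "standard"

print(choose_endpoint(5_000_000, 50))      # interactive RAG assistant
print(choose_endpoint(2 * BILLION, 500))   # very large archive search
```

A corpus in the billions with a tight latency budget falls through to "standard" here, which flags a workload whose size and latency goals conflict and deserves measurement before committing to either tier.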

Databricks vector search vs standalone vector databases

The honest question every architect asks is whether to use the built-in option at all, or to reach for a dedicated vector database like Pinecone, Weaviate, or pgvector. The answer turns less on raw benchmark speed and more on where your data and governance already live, which is the same logic that decides most platform questions on the lakehouse.

A standalone vector database is a specialized system you operate separately. It can be excellent and highly tunable, but it sits outside your lakehouse.

That separation has a cost. You build and own a pipeline to export embeddings into it, you reconcile its contents with the source table every time data changes, and you reimplement access control because it does not know about Unity Catalog.

Databricks vector search trades a sliver of low-level tunability for the elimination of all of that integration and governance work, because the index already lives where the data and the policies do.

The comparison below frames the decision the way it actually gets made in practice.

Consideration	Databricks vector search	Standalone vector DB
Sync with source data	Built in via Delta Sync	You build and own the pipeline
Governance and access control	Inherited from Unity Catalog	Reimplemented separately
Infrastructure to operate	Serverless, none of your own	A separate system to run
Low-level index tuning	Abstracted and managed	Fine-grained and exposed
Best when	Your data lives on Databricks	You need a portable, standalone store

For teams already standardizing on the platform, the calculus usually favors the built-in option, in the same way the wider Databricks vs Snowflake decision tends to follow where the rest of the estate sits rather than a single feature. The same holds for the Microsoft Fabric vs Databricks comparison, where the deciding factor is usually the existing platform rather than the search engine alone.

Building a RAG pipeline with Databricks vector search

The dominant use case is retrieval-augmented generation, where vector search supplies the grounded context that keeps a language model honest. The pattern is consistent across the production systems we see, and it maps cleanly onto the four lifecycle stages above.

You chunk your source documents into passages, embed each chunk, and write them to a Delta Sync index so the retrieval corpus stays current as documents are added or revised.

At query time, the user’s question is embedded, the index returns the most relevant chunks, and those chunks are injected into the prompt so the model answers from your actual content instead of its training data.
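The chunk, retrieve, and inject steps can be sketched end to end in a few lines. Everything here is a simplified stand-in: `chunk_words` splits on whitespace with a fixed overlap, `retrieve` ranks by shared words instead of querying a vector index, and the prompt template is hypothetical.

```python
def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop before a window that would contain only already-covered overlap words.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def retrieve(question, chunks, k=1):
    """Stand-in for the index query: rank chunks by words shared with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Inject the retrieved chunks into the prompt sent to the language model."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

document = (
    "Refunds are issued within five business days of approval. "
    "Shipping is free for orders above fifty dollars. "
    "Support is available by chat on weekdays."
)
chunks = chunk_words(document, size=8, overlap=2)
question = "how fast are refunds issued"
context = retrieve(question, chunks, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

Note that the first chunk ends mid-sentence, just before "approval." That is a small preview of why chunk boundaries influence answer quality as much as the index itself.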

This is what separates a grounded assistant from a confident guess. It is the backbone of most enterprise advanced RAG systems and the newer agentic RAG designs where an agent decides what to retrieve.

The choice of retrieval tooling here matters, which is why teams compare RAG tools before committing, and why grounded retrieval increasingly powers an AI chatbot for businesses rather than a brittle scripted bot.

Vector search is also the retrieval layer underneath Databricks’ own agent tooling, including Mosaic AI and Agent Bricks, which assemble retrieval, models, and evaluation into deployable agents.

If you are weighing retrieval against other grounding strategies, our breakdown of RAG vs fine-tuning covers when to retrieve and when to retrain, the MCP vs RAG comparison covers how retrieval coexists with tool-calling protocols, and RAG vs LLM explains why retrieval beats a model’s frozen memory for fresh enterprise facts.

The 2026 rename: Vector Search to Databricks AI Search

If you are reading current documentation and getting confused, here is the cause. In 2026 Databricks rebranded the product from Databricks Vector Search to Databricks AI Search, positioning it as a broader search capability that spans semantic, keyword, and hybrid retrieval rather than vectors alone, in step with Databricks’ wider push into generative AI. The underlying engine, the index objects, and the Python SDK still use the Mosaic AI Vector Search naming, so you will see both names across the UI, the docs, and Terraform resources at the same time.

What actually changed is mostly framing, plus the formalization of hybrid search and the two endpoint tiers as first-class features. The core mechanics (Delta Sync indexes, direct-access indexes, embedding options, and Unity Catalog governance) are the same ones described throughout this guide. For practical purposes, treat “Databricks vector search,” “Mosaic AI Vector Search,” and “Databricks AI Search” as the same product at different points on a naming timeline, and do not let a tutorial written under the old name send you looking for a feature that simply got renamed.

Production pitfalls to plan for

Vector search looks simple in a demo and exposes its sharp edges in production, so it pays to plan for the failure modes before they find you. None of these are reasons to avoid the product; they are the operational realities of running semantic retrieval at scale, and getting ahead of them is mostly a matter of design discipline.

Chunking. How you split documents into passages quietly determines retrieval quality more than the index settings do. Chunks that are too large dilute the signal, and chunks that are too small lose context.
Embedding model drift. If you ever change the embedding model, every vector in the index was produced by the old model and must be regenerated. Version your embedding model deliberately and treat a model swap as a full re-index.
Sync lag and cost. Delta Sync is near real-time, not instant, so plan for a short window where the index trails the source. Right-size your endpoint so you are not paying for an in-memory standard endpoint to serve an archive a storage-optimized endpoint would serve far cheaper.
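One lightweight guard against embedding model drift is to record which model built the index and compare it before every deployment. The sketch below assumes you keep that metadata yourself; `IndexMetadata`, `needs_full_reindex`, and the model names are hypothetical and not part of any Databricks API.

```python
from dataclasses import dataclass

@dataclass
class IndexMetadata:
    """Bookkeeping you maintain alongside the index (assumed, not a platform object)."""
    name: str
    embedding_model: str
    embedding_model_version: str

def needs_full_reindex(index_meta, model_name, model_version):
    """True when the query-time model differs from the one that built the index."""
    built_with = (index_meta.embedding_model, index_meta.embedding_model_version)
    return built_with != (model_name, model_version)

meta = IndexMetadata("docs_index", "example-embedder", "1")

same = needs_full_reindex(meta, "example-embedder", "1")
swapped = needs_full_reindex(meta, "example-embedder", "2")
print(same, swapped)
```

Vectors from two different models are not comparable, so a check like this is better run as a hard gate in the deployment pipeline than as a warning someone can ignore.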

These are the same discipline questions that show up in any Databricks performance optimization effort, and they reward measurement over guesswork.

How Kanerika helps you build production vector search

Standing up a vector search index in a notebook takes an afternoon. Running it as a governed, cost-controlled, production retrieval layer that an enterprise can trust is a different exercise, and that is where most teams want a partner who has done it before. Kanerika is a Databricks partner that designs and operates lakehouse platforms end to end, from Delta table layout and embedding strategy to the governed, observable retrieval pipelines that grounded AI applications depend on.

We run vector search engagements as a staged path, not a single notebook task, so the retrieval layer holds up once real users and regulated data touch it.

Assess. We profile your real query logs and source tables to decide what actually belongs in the index, where retrieval will earn its keep, and which governance constraints apply before a single embedding is written.
Design. We choose index types and endpoint tiers against real query patterns, defaulting new corpora to a Delta Sync index on a standard endpoint and only graduating to storage-optimized once the corpus crosses into the billions.
Build. We stand up the chunking strategy, the embedding pipeline, and the index, pinning the embedding model version so a later swap is treated as a deliberate full re-index rather than a silent break.
Govern. We wire Unity Catalog permissions and lineage through the retrieval layer, extending the same controls with our KAN governance suite (KANGovern, KANComply, KANGuard) so retrieval never returns a row a user should not see.
Enable. We measure relevance, hand over runbooks, and set up the evaluation loop so your team can tune chunking and reranking without us in the room.

This is the same grounded-retrieval pattern behind our document intelligence agent, KlarityIQ, which is built on RAG over governed enterprise content.

For one investment-bank deployment, that retrieval layer delivered 43% faster information retrieval, a 35% reduction in manual review hours, and 100% role-based compliance, with access scoped by the same governance the index inherits.

Our broader RAG development and generative AI practice carries that pattern from a single search box to a fleet of governed agents.

The practitioner watch-outs are consistent. Right-size the endpoint early, version the embedding model deliberately, and treat chunking as a first-class design decision rather than an afterthought, because each one quietly decides whether retrieval is trusted in production.

If you want vector search built the right way the first time, we scope which data belongs in the index, set the index and endpoint configuration, and stand up the pipeline that keeps it fresh.

Frequently Asked Questions

What is Databricks vector search?

Databricks vector search is a serverless similarity search engine built into the Databricks Data Intelligence Platform. It stores vector embeddings of your data alongside their metadata and lets applications search by meaning rather than exact keywords, returning the nearest matching rows in milliseconds. Because it lives inside the lakehouse, the index can track your Delta tables automatically and inherit Unity Catalog governance, which removes the separate vector database and sync pipeline that bolt-on approaches require. In 2026 Databricks renamed the product to Databricks AI Search, but the underlying engine and the Mosaic AI Vector Search index objects are the same.

What is the difference between a Delta Sync index and a direct-access index?

A Delta Sync index is the managed option. You point it at a source Delta table and Databricks keeps the index synchronized automatically as the table changes, optionally computing the embeddings for you. A direct-access index hands you the write path, so you insert, update, and delete vectors yourself through the API and own the freshness logic. Delta Sync is the right default for most retrieval and RAG workloads because it eliminates the sync pipeline, while a direct-access index suits streaming ingestion or custom embedding workflows where you need precise control over what gets written and when.

Is Databricks vector search the same as Databricks AI Search?

Yes. In 2026 Databricks rebranded Databricks Vector Search to Databricks AI Search, positioning it as a broader search capability that spans semantic, keyword, and hybrid retrieval rather than vectors alone. The underlying engine, the index objects, and the Python SDK still carry the Mosaic AI Vector Search name, so you will see both names across the UI, the documentation, and Terraform resources at the same time. For practical purposes, treat Databricks vector search, Mosaic AI Vector Search, and Databricks AI Search as the same product at different points on a naming timeline.

How does Databricks vector search work for RAG?

For retrieval-augmented generation, you chunk your source documents into passages, embed each chunk, and write the chunks to a Delta table that backs a Delta Sync index, so the retrieval corpus stays current as documents change. At query time the user’s question is embedded with the same model, the index returns the most relevant chunks, and those chunks are injected into the prompt so the language model answers from your actual content instead of its training data. This grounding is what separates a reliable assistant from a confident guess, and vector search is the retrieval layer underneath Databricks agent tooling such as Mosaic AI and Agent Bricks.
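The chunk, embed, retrieve, and inject steps can be sketched in plain Python. This is a minimal illustration under loud assumptions: `toy_embed` is a hypothetical word-count embedding standing in for a real embedding model, the in-memory list stands in for the index, and none of the function names come from Databricks.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

SparseVector = Dict[str, float]


def chunk_text(text: str, max_words: int = 12) -> List[str]:
    """Split text into passages of at most max_words words (no overlap, for brevity)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> SparseVector:
    """Hypothetical embedding: L2-normalized word counts, not a real model."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def similarity(a: SparseVector, b: SparseVector) -> float:
    """Dot product of two normalized vectors, which equals cosine similarity."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def retrieve(question: str, index: List[Tuple[str, SparseVector]], k: int = 1) -> List[str]:
    # The question is embedded with the same function used for the chunks.
    query_vec = toy_embed(question)
    ranked = sorted(index, key=lambda row: similarity(query_vec, row[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: List[str]) -> str:
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


documents = [
    "Invoices above ten thousand dollars require approval from the finance "
    "director before payment is released.",
    "Storage buckets are rotated every ninety days and access keys are "
    "reviewed by the security team.",
]

# Ingestion: chunk every document, then embed every chunk.
index = [(chunk, toy_embed(chunk)) for doc in documents for chunk in chunk_text(doc)]

# Query time: retrieve the best chunk and inject it into the prompt.
question = "Who must give approval for large invoices?"
retrieved = retrieve(question, index, k=1)
prompt = build_prompt(question, retrieved)
```

The resulting `prompt` is what would be sent to the language model. Note that the 12-word chunker splits the first document mid-sentence, which is a small demonstration of why chunking strategy affects what retrieval can return.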

Does Databricks vector search support hybrid search?

Yes. Hybrid search runs semantic vector similarity and traditional keyword matching together, reranks the merged results, and exposes the whole operation through a single API call. This matters because pure vector search can miss exact identifiers like product codes or error numbers, while pure keyword search misses meaning. Running both catches a query whether it names the precise token or describes the same idea in different words. You can also combine similarity with metadata filtering, scoping results by structured predicates such as region or status, with every result governed by the same Unity Catalog permissions as the rest of your data.
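One common way to merge a semantic ranking with a keyword ranking is reciprocal rank fusion, sketched below. Treat the fusion method as an assumption chosen for illustration, since the text above only says the merged results are reranked. The document ids, the rankings, and the `region` field are made up, and the filter is applied after fusion purely to keep the example short.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def filter_by_metadata(doc_ids: List[str], metadata: Dict[str, Dict[str, str]],
                       field: str, value: str) -> List[str]:
    """Keep only ids whose metadata matches a structured predicate."""
    return [doc_id for doc_id in doc_ids if metadata[doc_id].get(field) == value]


# Hypothetical rankings for one query from two retrievers.
semantic_ranking = ["a", "b", "c"]  # similar in meaning
keyword_ranking = ["d", "a", "b"]   # "d" contains the exact error code

metadata = {
    "a": {"region": "EU"},
    "b": {"region": "US"},
    "c": {"region": "EU"},
    "d": {"region": "EU"},
}

fused = reciprocal_rank_fusion([semantic_ranking, keyword_ranking])
eu_only = filter_by_metadata(fused, metadata, "region", "EU")
```

Document `d` never appears in the semantic ranking, yet it survives fusion because the keyword retriever ranked it first. That is the exact-identifier case hybrid search is meant to catch.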

What is the difference between standard and storage-optimized endpoints?

Standard endpoints keep full-precision vectors in memory to deliver query latencies in the tens of milliseconds, which is what interactive, user-facing applications need. Storage-optimized endpoints separate storage from compute so you can serve billions of vectors at a fraction of the cost, with latencies in the hundreds of milliseconds, which suits very large corpora where cost matters more than instantaneous response. As a rule of thumb, start with a standard endpoint for any interactive RAG assistant or search box, and move to storage-optimized only when the corpus grows into the billions and your latency budget can absorb the extra delay.
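The rule of thumb can be written down as a small decision helper. The function name and both thresholds are assumptions made for illustration: one billion vectors stands in for "into the billions", and 100 ms stands in for a latency budget that can absorb "hundreds of milliseconds". Neither number is a documented limit.

```python
# Assumed thresholds, chosen only to make the rule of thumb concrete.
LARGE_CORPUS_VECTORS = 1_000_000_000
MIN_BUDGET_FOR_STORAGE_OPTIMIZED_MS = 100


def choose_endpoint_type(num_vectors: int, latency_budget_ms: int) -> str:
    """Pick an endpoint type from corpus size and the latency the app can tolerate."""
    corpus_is_huge = num_vectors >= LARGE_CORPUS_VECTORS
    can_absorb_delay = latency_budget_ms >= MIN_BUDGET_FOR_STORAGE_OPTIMIZED_MS
    if corpus_is_huge and can_absorb_delay:
        return "storage-optimized"
    # Default: interactive workloads and anything below billions of vectors.
    return "standard"


interactive_assistant = choose_endpoint_type(num_vectors=5_000_000, latency_budget_ms=50)
archive_search = choose_endpoint_type(num_vectors=2_000_000_000, latency_budget_ms=500)
huge_but_interactive = choose_endpoint_type(num_vectors=2_000_000_000, latency_budget_ms=50)
```

The third case is the one worth discussing with stakeholders: a corpus in the billions with a tight latency budget does not fit either option cleanly, so the helper falls back to the standard default rather than silently accepting slower responses.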

Should I use Databricks vector search or a standalone vector database?

It depends on where your data and governance already live. A standalone vector store such as Pinecone or Weaviate, or a Postgres extension such as pgvector, can be excellent and highly tunable, but it sits outside your lakehouse, so you build a pipeline to export embeddings into it, reconcile it with the source on every change, and reimplement access control. Databricks vector search trades a sliver of low-level tunability for eliminating all of that integration and governance work, because the index already lives where the data and the Unity Catalog policies do. For teams already standardizing on Databricks, the built-in option usually wins.

Can Databricks compute embeddings for me?

Yes. With a Delta Sync index you can let Databricks generate embeddings using an integrated foundation model, so you only need to point the index at a text column and the platform handles vectorization and refresh. Alternatively, you can supply your own precomputed embedding column if you already run an embedding model elsewhere in your stack. If you ever change the embedding model, remember that every vector in the index was produced by the old model and must be regenerated, so version your embedding model deliberately and treat a model swap as a full re-index.
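One way to make a model swap explicit is to store the model version next to every vector and refuse to mix versions. The sketch below assumes a hypothetical `IndexedRow` record and a stand-in `toy_embed_v2` function. It illustrates the bookkeeping only and is not a Databricks feature.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IndexedRow:
    key: str
    text: str
    vector: List[float]
    model_version: str  # which embedding model produced this vector


def stale_rows(rows: List[IndexedRow], current_version: str) -> List[IndexedRow]:
    """Rows whose vectors were produced by a different model version."""
    return [row for row in rows if row.model_version != current_version]


def reindex(rows: List[IndexedRow], embed: Callable[[str], List[float]],
            current_version: str) -> List[IndexedRow]:
    """Regenerate every vector from the source text with the current model."""
    return [IndexedRow(row.key, row.text, embed(row.text), current_version) for row in rows]


def toy_embed_v2(text: str) -> List[float]:
    # Hypothetical stand-in for the new embedding model.
    return [float(len(text)), float(text.count(" "))]


rows = [
    IndexedRow("1", "refund policy", [0.1, 0.2], "v1"),
    IndexedRow("2", "shipping times", [0.3, 0.4], "v1"),
]

needs_rebuild = stale_rows(rows, current_version="v2")
rebuilt = reindex(rows, toy_embed_v2, current_version="v2")
remaining_stale = stale_rows(rebuilt, current_version="v2")
```

Because vectors from two different models are not comparable, the check treats any version mismatch as grounds for regenerating the whole index rather than patching individual rows.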

Authored by

Gaurav Verma | Chief Marketing Officer

Gaurav Verma brings 25+ years of B2B SaaS marketing expertise, helping brands sharpen positioning, build demand, and drive measurable growth in competitive markets.


Reviewed by

Shaurya Chauhan | Lead Software Engineer

Shaurya Chauhan is a Databricks Certified Data Engineer Professional and Lead Software Engineer at Kanerika, specializing in data engineering and analytics across Azure, Microsoft Fabric, Databricks, and Snowflake.
