Most enterprise AI agent projects take six to nine months to reach production, and a third never make it. The bottleneck is rarely the model. It’s the evaluation frameworks, the optimization loops, and the governance wiring that get retrofitted after the fact. By the time an agent is ready, the business case has shifted.
Databricks Agent Bricks was built for exactly this problem. It automates the parts of agent development that eat the most time, including evaluation, optimization, and governed deployment, entirely within the Databricks environment and on data that’s already there. AstraZeneca used it to process over 400,000 clinical trial documents and had a working agent in under 60 minutes, without writing a single line of code.
In this article, we’ll cover what Agent Bricks is, how it works, which use cases it fits, how it compares to custom frameworks, its current limitations, and how enterprises are using it today.
Key Takeaways
- Agent Bricks is Databricks’ managed enterprise agent platform, now generally available for Document Intelligence and Custom Agents
- It auto-generates task-specific evaluations and optimizes agent quality using Mosaic AI research, including a technique called Agent Learning from Human Feedback (ALHF)
- Supported agent types include Information Extraction, Knowledge Assistant, AI/BI Genie, Supervisor Agent, and Custom Agents
- Governance runs through Unity Catalog, which means data access controls, lineage, and auditing apply to agent interactions too
- It trades deep customization for speed to production, which makes it a strong fit for teams that want working agents in days, not months
What Databricks Agent Bricks Does for Enterprise AI Teams
Agent Bricks is Databricks’ enterprise platform for building, deploying, and governing AI agents that operate on business data. It was introduced in Beta at the Data + AI Summit in June 2025, with Custom Agents reaching general availability in February 2026 and Document Intelligence going GA in April 2026.
The core idea is to remove the manual complexity from AI agent development. The workflow follows four named stages. The team supplies the first by describing the task and pointing to a data source; the platform handles the other three automatically.
- Declare: Define the task in natural language and point to a data source
- Evaluate: Agent Bricks builds domain-specific benchmarks and generates synthetic data automatically
- Optimize: Agent Bricks tests combinations of techniques to find the best-performing configuration
- Deploy: The selected agent ships as a secure serverless endpoint, queryable directly from SQL using ai_query()
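As a rough illustration of the Deploy stage, the snippet below assembles the kind of SQL statement a team might run against a deployed endpoint. Only `ai_query()` comes from the text above; the endpoint name `support-ticket-extractor`, the table `main.support.tickets`, and the column `body` are hypothetical placeholders. The helper only builds a string, so it runs anywhere; in a Databricks notebook the result would typically be passed to `spark.sql(...)`.

```python
def build_ai_query_sql(endpoint: str, table: str, text_column: str, limit: int = 10) -> str:
    """Return a SQL statement that sends each row's text to a serving endpoint."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Escape single quotes so the endpoint name stays a valid SQL string literal.
    safe_endpoint = endpoint.replace("'", "''")
    return (
        f"SELECT {text_column}, "
        f"ai_query('{safe_endpoint}', {text_column}) AS agent_response "
        f"FROM {table} LIMIT {limit}"
    )


# Hypothetical endpoint, table, and column names, used for illustration only.
sql = build_ai_query_sql("support-ticket-extractor", "main.support.tickets", "body", limit=5)
print(sql)
```

The two-argument form of `ai_query()` (endpoint, request) is the simplest one; additional options should be checked against the current Databricks SQL reference before use.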
This is a meaningful departure from how most teams have been building agents. Instead of manually tuning prompts, selecting models, and hoping a gut check is enough to judge quality, teams get an automated optimization loop backed by Databricks Mosaic AI research.
The Five Agent Types in Databricks Agent Bricks
Agent Bricks groups agents into five types, with each type built for a different enterprise task. Four work as managed agents out of the box. The fifth, Custom Agents, gives teams room to handle use cases that the managed types do not cover. Choosing the right type up front gets teams to production faster, so it pays to understand what each one does before configuring anything.
The table below maps each type to its function and the workflows it fits best.
| Agent Type | What It Does | Best Fit For |
|---|---|---|
| Information Extraction | Converts unstructured documents into structured JSON fields | Contracts, invoices, clinical trial docs, emails |
| Knowledge Assistant | Turns document collections into a queryable Q&A system using RAG | Internal policy docs, technical manuals, HR knowledge bases |
| AI/BI Genie | Answers structured data questions in natural language using a semantic layer | Business analysts querying Delta tables |
| Supervisor Agent | Orchestrates multiple sub-agents to complete multi-step tasks | Supply chain workflows, cross-domain automation |
| Custom Agents | Build and deploy agents using LangChain, LangGraph, or custom Python | Specialized workflows requiring deeper control |
The Supervisor Agent has become the most widely used type on the platform. According to Databricks’ own usage analysis, it accounts for 37% of enterprise Agent Bricks deployments, reflecting how many real-world workflows span multiple systems and domains. Teams building multi-agent workflows will find the Supervisor Agent the most relevant starting point.
How the Evaluation and Optimization Loop Works
The part that separates Agent Bricks from most DIY approaches is its evaluation pipeline. Getting an agent into production usually stalls at the evaluation stage, because measuring whether a reasoning system is actually good is harder than measuring whether a classification model is accurate.
Databricks addresses this with several components working together. Agent-as-a-Judge and Tunable Judges let teams assess agent output quality programmatically, while Judge Builder allows teams to create custom evaluation criteria for tasks that don’t map neatly to standard metrics. Databricks has now open-sourced these in MLflow 3, so teams can apply the evaluation framework to agents running outside Databricks too.
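To make the idea of a tunable judge more concrete, here is a minimal, framework-free sketch. It is not the MLflow or Agent Bricks API: the `Criterion` and `judge` names are invented for illustration, and simple string checks stand in for the LLM call a real judge would make. The point is the shape: criteria are plain data, so a team can adjust or reweight them without touching the scoring loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[str], bool]  # stand-in for an LLM judgment (assumption)
    weight: float = 1.0


def judge(answer: str, criteria: List[Criterion]) -> float:
    """Return the weighted fraction of criteria the answer satisfies (0.0 to 1.0)."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        raise ValueError("criteria weights must sum to a positive value")
    passed = sum(c.weight for c in criteria if c.check(answer))
    return passed / total


# Hypothetical criteria for an internal policy assistant.
criteria = [
    Criterion("cites_policy", lambda a: "policy" in a.lower(), weight=2.0),
    Criterion("is_concise", lambda a: len(a.split()) <= 30),
]

score = judge("Per the travel policy, economy class applies to flights under six hours.", criteria)
print(score)
```

Tuning the judge here means editing the `criteria` list, for example raising the weight on `cites_policy` when reviewers care more about grounding than brevity.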
How Human Feedback Feeds Back Into the Loop
One of the more technically interesting innovations inside the optimization layer is Agent Learning from Human Feedback (ALHF). Most systems collect thumbs-up or thumbs-down feedback without a clear mechanism for translating that into agent improvements. ALHF links feedback signals to specific parts of the agent system, so teams can make focused adjustments instead of random changes. For tasks where automated scoring isn’t enough, subject matter experts can review outputs and provide corrective feedback in plain language through the MLflow Review App, which feeds directly back into the optimization loop.
| Evaluation Component | Role | Where It Fits |
|---|---|---|
| Agent-as-a-Judge | Automated output quality scoring | Automatic evaluation phase |
| Tunable Judges | Adjustable scoring criteria for custom tasks | When standard metrics don’t fit the task |
| Judge Builder | Create task-specific evaluation logic | Specialized or domain-specific agent tasks |
| ALHF | Connects human feedback to targeted component fixes | Continuous quality improvement post-deployment |
| MLflow 3 Integration | Monitoring and observability for all agents | Production tracking, even outside Databricks |
Each component addresses a different point in the quality problem. The combination means teams can assess agent output before launch and keep improving it after deployment based on real usage signals.
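The internals of ALHF are not described here, so the following is only a toy illustration of the underlying idea: attribute each piece of reviewer feedback to a component of the agent, then see which component attracts the most negative feedback. The `Feedback` record and the component labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feedback:
    component: str  # hypothetical labels such as "retriever" or "prompt"
    positive: bool
    note: str = ""


def worst_component(feedback: List[Feedback]) -> Optional[str]:
    """Return the component with the most negative feedback, or None if there is none."""
    negatives = Counter(f.component for f in feedback if not f.positive)
    if not negatives:
        return None
    return negatives.most_common(1)[0][0]


reviews = [
    Feedback("retriever", False, "missed the most recent policy"),
    Feedback("prompt", True),
    Feedback("retriever", False, "pulled the wrong document"),
    Feedback("prompt", False, "answer too verbose"),
]

print(worst_component(reviews))
```

Even this crude tally shows why attribution matters: the same four reviews without a component label would only say that three answers were bad, not where to look.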
Early results from customers reflect this. AstraZeneca’s team processed over 400,000 clinical trial documents and extracted structured data points in under 60 minutes without writing code. Hawaiian Electric reported that Agent Bricks outperformed their previous open-source implementation across both automated and human evaluation metrics.
How Agent Bricks Handles Document Intelligence
One of the more practical additions to Agent Bricks is how it handles unstructured document data. Databricks’ own estimate is that 80% of enterprise knowledge sits in PDFs, slide decks, contracts, and scanned documents that traditional analytics systems can’t use.
The new ai_parse_document() function, governed through Unity Catalog, allows teams to extract structured data from layout-heavy documents using plain SQL. No external libraries, no third-party OCR APIs, no Python glue code. The system extracts tables, merged cells, diagrams, and captions, then stores them directly in Delta tables so teams can query and reuse formerly inaccessible content within the same lakehouse environment.
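A sketch of what this can look like in practice follows. It assumes the commonly documented pattern of reading raw files with `READ_FILES(..., format => 'binaryFile')` and passing the `content` column to `ai_parse_document()`; the volume path and target table name are hypothetical, and the exact output schema should be checked against current Databricks documentation. The helper only builds the SQL text, so it can be run and inspected outside Databricks.

```python
def build_parse_sql(volume_path: str, target_table: str) -> str:
    """Return SQL that parses every file under a volume path into a Delta table."""
    # Escape single quotes so the path stays a valid SQL string literal.
    safe_path = volume_path.replace("'", "''")
    return (
        f"CREATE OR REPLACE TABLE {target_table} AS "
        f"SELECT path, ai_parse_document(content) AS parsed "
        f"FROM READ_FILES('{safe_path}', format => 'binaryFile')"
    )


# Hypothetical Unity Catalog volume and table names.
parse_sql = build_parse_sql("/Volumes/main/docs/contracts/", "main.docs.parsed_contracts")
print(parse_sql)
```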
This matters most for industries where document volumes are high and extraction accuracy has direct compliance or financial consequences, including financial services, healthcare, insurance, and logistics.
Agent Bricks vs. Custom Agent Frameworks
The honest question for any enterprise team already using Databricks is whether Agent Bricks replaces LangChain, LangGraph, LlamaIndex, or custom Python implementations. The short answer is that it depends on what the team is optimizing for.
Agent Bricks operates at a higher level of abstraction. It is a managed, factory-tuned system that automates evaluation, optimization, and governed deployment. Custom frameworks give teams full control over every component, including the orchestration logic, tool selection, memory design, and model routing, but that control comes with the full cost of building, testing, and maintaining each piece.
The table below summarizes the trade-off.
| Dimension | Agent Bricks | Custom Frameworks (LangChain, LangGraph, etc.) |
|---|---|---|
| Speed to first working agent | Hours to days | Days to weeks |
| Customization depth | Constrained to managed agent types | Full flexibility |
| Evaluation | Automated, built-in | Manual, team-owned |
| Governance | Native Unity Catalog integration | Requires separate setup |
| Maintenance overhead | Managed by Databricks | Team-owned |
| Cost optimization | Auto-optimized | Manual model selection |
| Transparency | Less visibility into optimization decisions | Full visibility |
Where Agent Bricks Fits in the Broader AI Stack
Databricks recommends a “start simple, then specialize” path. AI Functions handle quick SQL-based tasks. For production-scale agents, Agent Bricks takes the lead. Agent Framework fits workflows that go beyond what managed agents can handle.
Agent Bricks also addresses agent sprawl, a problem most content skips over. As enterprises build more agents, users end up with dozens of disconnected bots and no single place to find them. Supervisor Agent fixes this by giving users one entry point that reasons about intent and routes to the right specialized agent automatically.
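The single-entry-point pattern can be sketched in a few lines. This is a conceptual toy, not how Supervisor Agent is implemented: keyword matching stands in for the intent reasoning a real supervisor would do with a model, and the sub-agent names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]


def make_supervisor(routes: Dict[str, Tuple[List[str], Agent]], fallback: Agent) -> Agent:
    """Build one entry point that forwards each request to a single sub-agent."""

    def supervisor(request: str) -> str:
        text = request.lower()
        # Keyword matching is a placeholder for model-based intent detection.
        for keywords, agent in routes.values():
            if any(keyword in text for keyword in keywords):
                return agent(request)
        return fallback(request)

    return supervisor


# Hypothetical sub-agents; each one just labels the request it received.
routes = {
    "inventory": (["stock", "inventory"], lambda r: "inventory-agent handled: " + r),
    "procurement": (["supplier", "purchase order"], lambda r: "procurement-agent handled: " + r),
}
supervisor = make_supervisor(routes, fallback=lambda r: "general-agent handled: " + r)

print(supervisor("How much stock do we have in Denver?"))
print(supervisor("Which supplier is late this week?"))
```

Users only ever call `supervisor`; adding a new specialized agent means adding one entry to `routes` rather than publishing another bot.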
For Azure-centric teams, the relevant comparison is Agent Bricks versus Azure AI Foundry. Foundry suits teams embedded in the Microsoft ecosystem needing broad Azure service integration. Agent Bricks suits teams whose data already lives in Databricks and who prioritize governance depth and evaluation quality on that data. In practice, some organizations run both.
How Databricks Unity Catalog Governance Works with Agent Bricks
Governance is where Agent Bricks’ Databricks-native positioning becomes most relevant for enterprise buyers. Most agent platforms govern the agent itself, defining what tools it can call and what permissions it holds. Agent Bricks extends governance to everything the agent interacts with, through Unity Catalog.
Identity is enforced end to end. Agents inherit user identity through on-behalf-of token passing, so they can only access what the requesting user is authorized to see. Model access runs through AI Gateway, which handles:
- Multi-model routing across providers
- Rate limits per user or team
- Automatic fallbacks when a provider goes down
- Organization-wide policies for prompt injection prevention and sensitive data filtering
Every interaction is logged and auditable in a single control plane.
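Of the gateway behaviors listed above, automatic fallback is the easiest to picture in code. The sketch below is a generic illustration of the pattern, not the AI Gateway API: `ProviderError`, `call_with_fallback`, and the two fake providers are all invented for the example.

```python
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Raised by a provider callable when it cannot serve the request."""


def call_with_fallback(prompt: str, providers: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, response) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and move on to the next provider in the list.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    # Simulates a provider outage.
    raise ProviderError("provider unavailable")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = call_with_fallback("summarize Q3", [("primary", flaky), ("backup", stable)])
print(used, reply)
```

Centralizing this logic in a gateway means individual agents do not each need their own retry and failover code.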
Agent Bricks also natively supports the Model Context Protocol (MCP), the emerging standard for tool integration. Agents get governed access to external APIs, SaaS tools, and databases through Unity Catalog. MCP tools are discoverable across the organization via a built-in catalog, removing the need to build one-off integrations for every external connection.
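For readers new to MCP, a tool invocation is a JSON-RPC 2.0 message with the method `tools/call`. The sketch below builds such a message; the tool name `lookup_order` and its arguments are hypothetical, and how Agent Bricks transports or authorizes the call is outside what this example shows.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
payload = build_tool_call(1, "lookup_order", {"order_id": "A-1001"})
print(payload)
```

Because every tool speaks the same message shape, a catalog of MCP tools can be governed and discovered uniformly instead of through one-off integrations.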
This matters at scale. Databricks reports that 63% of enterprise customers route tasks across multiple model families, so centralizing governance over model access, tool permissions, and data lineage in one place removes substantial coordination overhead.
3 Practical Limitations Worth Knowing Before You Build
Agent Bricks is a maturing platform, and teams planning production deployments should go in with accurate expectations.
- Platform prerequisites
  - Unity Catalog and Serverless Compute must both be enabled
  - Files larger than 50MB are not indexed
  - Workspaces with Enhanced Security and Compliance, or Azure Private Link, are not supported
  - Agent Bricks is not available in Databricks’ Free Edition
- Agent-level constraints
  - Supervisor Agent systems are capped at 20 agents
  - Availability is concentrated in US workspaces. European AWS availability is in Public Preview
  - Foundation models must be accessible through the system.ai schema in Unity Catalog
- Trade-offs to factor in
  - Model selection and prompt configuration decisions are not always visible to the development team. Teams needing fine-grained control should review current documentation before committing
  - Automated optimization works best with representative data. Teams with thin datasets get less out of the cycle
  - For hybrid environments spanning Databricks and other platforms like Azure AI Foundry, AWS Bedrock, or Google Vertex, deciding which agents live where is an architectural call worth making early
Teams that confirm these prerequisites early avoid the most common deployment delays.
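Several of the limits above are mechanical enough to check before a project starts. The following is a minimal preflight sketch that encodes only the numbers stated in this section; the `DeploymentPlan` structure is hypothetical, and treating “50MB” as 50 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_FILE_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB
MAX_SUPERVISED_AGENTS = 20         # Supervisor Agent cap stated above


@dataclass
class DeploymentPlan:
    unity_catalog_enabled: bool
    serverless_enabled: bool
    supervised_agents: int = 0
    file_sizes_bytes: Dict[str, int] = field(default_factory=dict)


def preflight(plan: DeploymentPlan) -> List[str]:
    """Return human-readable problems; an empty list means no listed limit is violated."""
    problems = []
    if not plan.unity_catalog_enabled:
        problems.append("Unity Catalog is not enabled")
    if not plan.serverless_enabled:
        problems.append("Serverless Compute is not enabled")
    if plan.supervised_agents > MAX_SUPERVISED_AGENTS:
        problems.append(
            f"{plan.supervised_agents} sub-agents exceeds the cap of {MAX_SUPERVISED_AGENTS}"
        )
    for path, size in plan.file_sizes_bytes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{path} is larger than 50MB and would not be indexed")
    return problems


# Hypothetical plan with three problems: no serverless, too many agents, one oversized file.
plan = DeploymentPlan(
    unity_catalog_enabled=True,
    serverless_enabled=False,
    supervised_agents=22,
    file_sizes_bytes={"handbook.pdf": 60 * 1024 * 1024, "faq.pdf": 2 * 1024 * 1024},
)
problems = preflight(plan)
for problem in problems:
    print(problem)
```

Workspace-level restrictions such as Enhanced Security and Compliance or Azure Private Link are not covered here and still need to be confirmed with the workspace administrator.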
Enterprise Use Cases Where Agent Bricks Is Running in Production
Thousands of organizations across financial services, retail, healthcare, and technology have deployed agents on Agent Bricks in production, according to Databricks. Here is how different industries are putting it to work.
- Financial services – Information Extraction agents process loan applications, vendor contracts, and regulatory filings
- Healthcare – Document intelligence handles clinical trial data, patient records, and lab reports
- Supply chain – Supervisor Agents orchestrate workflows across procurement, inventory, and logistics systems
- Retail and e-commerce – Knowledge Assistants serve as internal product and policy databases for customer-facing teams
The agent types most likely to generate early ROI are those tied to high-volume, document-heavy processes where accuracy directly affects compliance, cost, or customer experience. Teams running on Databricks Lakehouse Architecture will find Agent Bricks a natural extension of their existing investment rather than a new platform decision.
How Kanerika Helps Enterprises Deploy Agent Bricks
Kanerika is a registered Databricks Consulting Partner with experience implementing Databricks environments across financial services, manufacturing, retail, logistics, and healthcare. The firm has delivered AI and data projects for clients including Volkswagen, HDFC, Dr. Reddy’s Laboratories, and Trax Technologies.
For teams evaluating Agent Bricks, the deployment questions that typically arise go beyond the platform itself.
- Which agent type fits the use case?
- Does the existing Unity Catalog setup support the governance model Agent Bricks requires?
- Is Serverless Compute configured and budgeted?
- What does evaluation look like against the team’s specific data?
- If the organization runs a hybrid AI environment, which workloads belong in Agent Bricks and which don’t?
Kanerika’s Databricks work covers the full implementation lifecycle, from architecture design and Unity Catalog governance setup to agent deployment, evaluation design, and ongoing optimization. The team’s experience migrating complex Informatica ETL workloads into Databricks and working with clients on AI agent orchestration and agentic AI deployments means they’ve seen the integration challenges that don’t show up in product documentation.
If your team is evaluating Agent Bricks or already building on Databricks, talk to a Kanerika consultant who has navigated the same decisions before.
Wrapping Up
Databricks Agent Bricks addresses a real gap in enterprise AI development. Building agents is manageable. Getting them to a production standard, with consistent evaluation, cost controls, and governance, has been the hard part. Agent Bricks automates the evaluation and optimization loop, integrates governance through Unity Catalog, and gives enterprise teams a structured path from idea to deployed agent without requiring a dedicated AI engineering team to manage every component. For organizations already running on Databricks, it’s worth evaluating as the first option before building custom agent infrastructure from scratch.
FAQs
What is Databricks Agent Bricks?
Databricks Agent Bricks is an enterprise platform for building, deploying, and governing AI agents on business data within the Databricks environment. Teams define a task, connect a data source, and Agent Bricks automatically generates evaluation benchmarks, creates synthetic training data, and optimizes the agent across quality and cost dimensions using Mosaic AI research techniques.
How is Agent Bricks different from LangChain or LangGraph?
LangChain and LangGraph are open frameworks that give developers full control over how agents are constructed. Agent Bricks is a managed platform that trades some of that flexibility for significantly faster deployment, built-in evaluation, automated optimization, and native Unity Catalog governance. Teams that need highly customized agent logic often use both, with Agent Framework for specialized components and Agent Bricks for production-scale deployment.
What are the Agent Bricks use cases?
Agent Bricks is optimized for five agent types:
- Information Extraction – Turns documents into structured data
- Knowledge Assistant – Builds Q&A systems from document collections
- AI/BI Genie – Answers structured data questions in natural language
- Supervisor Agent – Orchestrates multi-agent workflows across specialized domains
- Custom Agents – Supports deployments built with LangChain, LangGraph, or Python
Does Agent Bricks require Unity Catalog?
Yes. Unity Catalog is a prerequisite for Agent Bricks. It serves as the governance layer for all agent interactions, enforcing role-based access controls, auditing tool calls and model usage, and managing data lineage from agent outputs back to source data. Organizations without Unity Catalog configured will need to set it up before Agent Bricks can be deployed.
What are the current limitations of Databricks Agent Bricks?
Current known constraints:
- Files larger than 50MB are not indexed
- Serverless Compute must be enabled
- Workspaces using Enhanced Security and Compliance, or Azure Private Link, are not supported
- Supervisor Agent systems are capped at 20 agents
- Regional availability is still limited. European AWS availability is in Public Preview
- The managed optimization process offers less visibility into model selection and configuration decisions compared to custom frameworks
Which industries are using Agent Bricks in production?
Financial services, healthcare, retail, technology, and manufacturing teams are running agents in production on the platform. Workday, Virgin Atlantic, Zapier, EchoStar, and AstraZeneca are among the named customers Databricks has referenced. AstraZeneca’s team processed over 400,000 clinical trial documents using Agent Bricks without writing code.
How does Agent Bricks handle evaluation?
Agent Bricks automatically generates task-specific evaluation benchmarks when a team declares an agent task. The evaluation stack includes four components.
- Agent-as-a-Judge – Automated output quality scoring
- Tunable Judges – Adjustable criteria for domain-specific tasks
- Judge Builder – Custom evaluation logic for tasks that don’t fit standard metrics
- ALHF (Agent Learning from Human Feedback) – Connects thumbs-up/down signals to targeted component fixes
These evaluation capabilities are open-sourced into MLflow 3, so they work on agents running outside Databricks too.
Do teams need data science or ML engineering experience to use Agent Bricks?
For the managed agent types (Information Extraction, Knowledge Assistant, and AI/BI Genie), teams with data engineering or analytics backgrounds can deploy working agents without deep ML expertise. The platform is designed so that defining the task and pointing to data is enough to get started. Custom Agents and more complex Supervisor configurations will benefit from engineers familiar with agent design patterns and the Databricks platform. A Databricks consulting partner can accelerate the setup and architecture decisions for teams new to the platform.



