Databricks Asset Bundles Complete Guide to Deployment and CI/CD

Most Databricks teams deploy jobs the same way they did three years ago. Someone clicks through the UI, exports a JSON config, and emails it to the next person. When something breaks in production, nobody knows what changed, who changed it, or when. The process works until the team grows, and then it stops working entirely.

Databricks Asset Bundles, officially renamed Declarative Automation Bundles in March 2026, put the entire project into code. Jobs, pipelines, clusters, permissions, and environment configs live in YAML files in Git and deploy with a single CLI command. The rename is backward-compatible and the DAB abbreviation is still in common use.

In this article, we cover what bundles are, how to set one up, how to connect them to CI/CD, where Terraform fits, and the mistakes enterprise teams consistently make when they skip the governance layer.

TL;DR

Databricks Asset Bundles, officially renamed Declarative Automation Bundles in March 2026, put an entire Databricks project into version-controlled code: jobs, pipelines, clusters, permissions, and environment configs live in YAML files in Git and deploy with a single CLI command. A new direct deployment engine in CLI v0.279+ removes the previous Terraform dependency, eliminating state drift and speeding up deploys. Bundles are project-scoped, covering jobs, pipelines, and dashboards, while Terraform stays platform-scoped, covering workspaces, Unity Catalog, and networking, so most enterprise teams run both together. The most common failure mode is deploying from a laptop without CI/CD gates; the fix is routing every staging and production deploy through GitHub Actions or Azure DevOps with bundle validate as a mandatory first step.

Key Takeaways

Databricks Asset Bundles, officially renamed Declarative Automation Bundles in March 2026, let teams define their entire Databricks project as YAML-based code and deploy it with a single CLI command
The core bundle structure includes a databricks.yml file, resource definitions, environment targets, and custom variables, all version-controlled in Git
A new direct deployment engine (CLI v0.279+) removes the previous Terraform dependency, making deployments faster and eliminating state drift issues
CI/CD integration with GitHub Actions or Azure DevOps is the right way to govern bundle deployments. Manual local deploys to production are the most common source of environment drift
DABs and Terraform are complementary: use DABs for project-level resources including jobs, pipelines, and dashboards, and Terraform for platform-level infrastructure including workspaces, Unity Catalog, and cloud networking
Enterprise teams that scale DABs well use user-scoped targets for developer isolation, monorepo structures for multi-domain platforms, and custom templates to enforce organizational standards

What Are Databricks Asset Bundles?

Databricks Asset Bundles are an infrastructure-as-code approach to managing Databricks projects. Instead of creating jobs through the UI, maintaining JSON export files, or relying on custom CLI scripts, teams define everything in YAML files that live alongside source code in Git. Jobs, pipelines, clusters, permissions, and environment configurations all become version-controlled artifacts.

In March 2026, Databricks renamed the feature from Databricks Asset Bundles to Declarative Automation Bundles with CLI v0.287+. The rename is fully backward-compatible. All existing configurations, CLI commands, and file names stay exactly the same, and the DAB abbreviation is still in common use.

1. The Concept Behind Declarative Infrastructure

If a resource exists in Databricks, it should exist as a file in the repository. When a data engineer creates a job via the Databricks UI, that job lives only in the workspace. Two developers trying to modify it simultaneously overwrite each other’s work, there is no review process, and reproducing the exact same config in another environment requires doing it manually from scratch.

Bundles solve this by making the YAML file the source of truth. Run databricks bundle deploy and Databricks reads that file, computes what needs to change, and updates the workspace to match. The workspace becomes an output of code, not a place where configuration lives independently. Every change ships through Git, so every deploy has a commit hash, a reviewer, and an audit trail.

2. What Changed With the Declarative Automation Bundles Rename

The rename changes no behavior, but it reflects a real change in scope. When bundles first launched, "assets" referred narrowly to notebooks and jobs. "Declarative automation" more accurately describes what the tool covers now. Bundles today manage dashboards, Unity Catalog schemas, model serving endpoints, MLflow experiments, SQL alerts, and Lakebase Postgres projects, all as versioned YAML resources.

A separate change arrived in December 2025 with CLI v0.279+: the direct deployment engine. Previously, DABs used Terraform under the hood to manage state. The new engine removes that dependency, so teams that migrate with databricks bundle migrate no longer maintain Terraform state files or manage Terraform version compatibility.

Inside a Databricks Bundle: Core Components

A bundle is a directory. It contains source code files, YAML configuration files that describe Databricks resources, and the structure that ties them together for deployment. Understanding these parts before writing any YAML saves considerable debugging time later.

1. The databricks.yml File

Every bundle has exactly one top-level databricks.yml file. This is the entry point the Databricks CLI reads. It declares the bundle name, optionally pulls in other YAML files, defines variables, and sets up deployment targets.

A minimal databricks.yml looks like this:

yaml

bundle:
  name: customer-etl

include:
  - resources/*.yml

variables:
  env:
    description: Deployment environment
    default: dev

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://your-dev-workspace.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://your-prod-workspace.azuredatabricks.net

The include directive pulls in resource definition files from subdirectories, keeping the root file readable. The targets block defines each environment with its own workspace URL and deployment mode.

2. Resources, Targets, and Variables

Resource files define the Databricks objects the bundle creates and manages. A typical job definition looks like this:

yaml

resources:
  jobs:
    customer_ingestion:
      name: Customer Ingestion Job
      tasks:
        - task_key: ingest
          notebook_task:
            # Paths resolve relative to this file; this assumes the file lives in resources/
            notebook_path: ../notebooks/ingest.py
          existing_cluster_id: ${var.cluster_id}
      schedule:
        # Quartz cron: every day at 06:00
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: UTC

Variables handle the values that differ across environments. A variable defined in databricks.yml is referenced anywhere in the bundle with ${var.variable_name}. Target-level overrides let dev and prod resolve to different cluster IDs, catalog names, or schedule frequencies, removing the need to duplicate entire resource files per environment. Keeping variables clean from the start is far easier than refactoring them out of hardcoded resource files after a team has grown to five engineers with three active environments.
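
The target-level overrides described above can be sketched roughly as follows. This is a minimal illustration rather than a recommended layout: it assumes a cluster_id variable like the one referenced in the job definition, and the catalog variable, cluster IDs, and catalog names are all placeholders.

```yaml
# databricks.yml (fragment); variable names and values are illustrative
variables:
  cluster_id:
    description: Existing cluster used by the ingestion task
  catalog:
    description: Unity Catalog catalog the job writes to
    default: dev_catalog

targets:
  dev:
    mode: development
    variables:
      cluster_id: "0000-000000-devclstr"   # placeholder ID
  prod:
    mode: production
    variables:
      cluster_id: "0000-000000-prodclst"   # placeholder ID
      catalog: prod_catalog
```

With this in place, the resource file keeps referencing ${var.cluster_id} and ${var.catalog} unchanged, and the value is chosen by the -t flag at validate or deploy time. A one-off value can also be supplied on the command line with --var="cluster_id=...".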

3. How the Direct Deployment Engine Changed the Architecture

Before CLI v0.279+, DABs ran Terraform behind the scenes. Terraform tracked the bundle’s deployed state in a state file stored in the Databricks workspace. That approach worked, but Terraform version mismatches broke deployments, state drift caused unpredictable behavior, and the extra abstraction layer added latency.

The direct deployment engine removes all of that. Databricks now tracks deployed resources natively in the workspace, with no separate state file. To migrate an existing bundle, run databricks bundle migrate from the bundle directory. The official migration guide walks through the full process.

How to Set Up Your First Databricks Asset Bundle

Setting up a bundle from scratch takes under 30 minutes with the Databricks CLI already configured. The steps below apply to both Azure Databricks and Databricks on AWS.

1. Install the Databricks CLI (v0.287+)

DABs require Databricks CLI v0.218.0 or above. The current stable release as of mid-2026 is v0.287+. Install via Homebrew on macOS or via the install script on Linux:

bash

# macOS
brew tap databricks/tap && brew install databricks

# Linux
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

After installing, configure authentication. Databricks recommends OAuth user-to-machine (U2M) authentication:

bash

databricks auth login --host https://your-workspace.azuredatabricks.net

Verify the version with databricks -v before proceeding. Running an older CLI against a bundle that uses direct-engine features produces confusing validation errors that are hard to diagnose without knowing the version constraint.

2. Initialize a Bundle With a Template

The CLI ships with default templates for common project types. Run databricks bundle init and choose from templates such as default-python, default-sql, or dbt-sql. For teams that want to define the structure themselves, the default-minimal template available since CLI v0.277+ gives a databricks.yml with catalog variables and nothing else.

bash

databricks bundle init

The init command asks for a project name, workspace URL, and target environments, then generates the initial directory structure. The output is a working bundle that can be validated immediately with no additional configuration.

3. Configure databricks.yml for Dev and Prod

After initialization, update the workspace host URLs for each target. Set mode: development for the dev target and mode: production for prod. Development mode automatically prefixes resource names with the deploying user’s username, so personal dev resources stay isolated from shared environments.

Add variables for any value that differs between environments. Cluster IDs, catalog names, and notification email addresses are the most common candidates. Hardcoding these values directly in resource files is the single most common bundle configuration mistake at scale, covered in detail in the mistakes section below.

4. Validate, Deploy, and Run

With configuration in place, the full bundle lifecycle runs in three commands:

bash

# Validate configuration without deploying
databricks bundle validate -t dev

# Deploy resources to the dev workspace
databricks bundle deploy -t dev

# Run a specific workflow in the deployed bundle
databricks bundle run -t dev customer_ingestion

bundle validate catches YAML syntax errors, missing variable references, and resource configuration issues before anything touches the workspace. Running it as the first step in every CI/CD pipeline is a hard requirement.

How to Build a CI/CD Pipeline With Databricks Asset Bundles

Bundles give teams a single command to deploy. CI/CD gives that command governance. Without automated pipelines, a developer can run databricks bundle deploy --target prod from a laptop at any time, using a local configuration that may have diverged from what is in Git. That divergence is invisible until something breaks.

1. GitHub Actions Workflow for Bundle Deployment

A standard GitHub Actions workflow runs on merge to main, deploys to staging, and then requires a version tag to promote to production:

yaml

name: Deploy Databricks Bundle

on:
  push:
    branches: [main]
    tags: ['v*']   # without a tag trigger, the tag-gated prod step below never runs

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Databricks CLI
        run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

      - name: Deploy to staging
        if: github.ref == 'refs/heads/main'
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_STAGING_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_STAGING_TOKEN }}
        run: |
          databricks bundle validate -t staging
          databricks bundle deploy -t staging

      - name: Deploy to prod (on tag)
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_PROD_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_PROD_TOKEN }}
        run: |
          databricks bundle validate -t prod
          databricks bundle deploy -t prod

Store workspace credentials as GitHub secrets, never inline in the workflow file. For tighter security, use OIDC federation instead of static tokens. This removes long-lived credentials from the equation and lets GitHub Actions authenticate directly using a federated identity.
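
As a rough sketch of the OIDC approach mentioned above: the workflow below assumes a Databricks service principal already has a federation policy that trusts this repository, and the repository variable names (DATABRICKS_STAGING_HOST, DATABRICKS_STAGING_CLIENT_ID) are hypothetical. Treat it as a starting point to check against the current Databricks documentation, not a verified production setup.

```yaml
# .github/workflows/deploy-staging-oidc.yml (sketch; variable names are hypothetical)
name: Deploy bundle with OIDC

on:
  push:
    branches: [main]

permissions:
  id-token: write   # allows the job to request a GitHub OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_AUTH_TYPE: github-oidc
      DATABRICKS_HOST: ${{ vars.DATABRICKS_STAGING_HOST }}
      # Application (client) ID of the service principal, not a secret token
      DATABRICKS_CLIENT_ID: ${{ vars.DATABRICKS_STAGING_CLIENT_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle validate -t staging
      - run: databricks bundle deploy -t staging
```

No DATABRICKS_TOKEN appears anywhere in this version, which is the point: there is no long-lived credential to rotate or leak.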

2. Azure DevOps Pipeline Setup

For teams on Azure, the pipeline follows the same pattern but uses service connections and variable groups instead of GitHub secrets. Configure a service principal in Entra ID, add it to the Databricks workspace as a service principal, and store its token as a secret in a pipeline variable group.

yaml

trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

steps:
  - script: |
      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
    displayName: Install Databricks CLI

  - script: databricks bundle validate -t staging
    displayName: Validate bundle
    env:
      DATABRICKS_HOST: $(DATABRICKS_STAGING_HOST)
      DATABRICKS_TOKEN: $(DATABRICKS_STAGING_TOKEN)

  - script: databricks bundle deploy -t staging
    displayName: Deploy to staging
    env:
      DATABRICKS_HOST: $(DATABRICKS_STAGING_HOST)
      DATABRICKS_TOKEN: $(DATABRICKS_STAGING_TOKEN)

Teams with stricter security requirements often use separate service principals for dev, staging, and prod, limiting each principal’s permissions to its own workspace.

3. Branching Strategy for Dev, Staging, and Prod

Databricks recommends a three-environment branching model. Developers commit to feature branches and work locally or in a personal dev workspace. Feature branches merge into main via pull request, triggering the CI/CD pipeline to deploy to staging. A git tag on main promotes staging to production.

The user target in the bundle configuration handles local development within this model. Each developer deploys to their own isolated copy of the workspace with databricks bundle deploy --target user. All resources are prefixed with their username, so no developer’s work overwrites another’s in a shared environment. The shared dev target then functions as an integration environment, not a personal sandbox.

Databricks Asset Bundles vs Terraform

The most common architecture question after adopting DABs is where Terraform still fits. Both tools manage infrastructure as code and both can define Databricks resources. The answer comes down to which layer of the stack each tool owns.

1. What DABs Handle vs What Terraform Handles

Resource Type	DABs	Terraform
Lakeflow Jobs and pipelines	Yes	Yes
Notebooks and libraries	Yes	No
Cluster definitions	Yes	Yes
Permissions on bundle resources	Yes	Yes
Unity Catalog schemas	Yes (direct engine)	Yes
Databricks workspaces	No	Yes
Unity Catalog metastores	No	Yes
Cloud networking (VNets, subnets)	No	Yes
Service principals	Limited	Yes
Cross-workspace infrastructure	No	Yes

DABs are project-scoped. They manage the resources a specific data product needs to run. Terraform is platform-scoped. It provisions the workspaces, metastores, networking, and identity infrastructure that DABs deploy into.

2. Why Most Enterprise Teams Use Both

DABs previously ran Terraform under the hood, which created confusion about overlap. With the direct deployment engine, that dependency is gone and the boundary is cleaner. A practical split that works at scale:

Infrastructure engineers manage workspaces, metastores, Unity Catalog, and service principals via Terraform in a separate infrastructure repository
Data engineering teams manage jobs, pipelines, dashboards, and schemas via bundles in their project repositories

The two systems operate independently, and each team has clear ownership of their layer. This is the model Databricks recommends and what most mature enterprise deployments converge on.

Common Mistakes Teams Make With Databricks Asset Bundles

Most DAB problems are process errors. The tool is set up correctly. The guardrails around how it gets used are absent.

1. Deploying Directly From Local Without CI/CD Gates

This is how most production incidents with DABs happen. A developer deploys from a local environment using a configuration that differs from what is committed to Git. The workspace updates without review, and the diff is invisible until something fails downstream.

The fix is a policy decision. Bundle deploys to staging and production should run only from CI/CD pipelines triggered by Git events. Local deploys belong in user-scoped dev targets only. Enforce this with branch protection rules and by restricting which service principals have deploy permissions in staging and production workspaces.

2. Hardcoding Environment-Specific Values

Cluster IDs, catalog names, and workspace paths hardcoded in resource YAML files cause failures that are hard to trace when the bundle deploys across environments. The dev cluster ID does not exist in the production workspace, so the deploy fails, often with a generic error that does not point to the actual cause.

All environment-specific values belong in the variables block of databricks.yml, with target-level overrides. A variable referenced as ${var.cluster_id} resolves to different values in dev and prod with no change to the resource definition file itself.

3. Skipping Bundle Validate in the Pipeline

databricks bundle validate catches YAML syntax errors, undefined variable references, and resource configuration conflicts before anything is deployed. It runs in seconds. Despite this, teams often skip it because early deploys succeed without it.

The problems appear later. A typo in a resource definition, a missing variable override for a new environment target, or an unsupported resource type against an older CLI version will all surface as cryptic deploy failures that validate would have flagged before anything reached the workspace. Running validate as the first step in every CI/CD run is the simplest governance control available.

4. Mismanaging Permissions Across Targets

Development mode scopes resources to the deploying user by default. Production mode requires explicit permission grants defined in the bundle configuration. Teams that skip the permissions block in production often find that a scheduled job runs as the deploying service principal, which may lack access to the Unity Catalog tables the job reads from.

Define permissions explicitly in the resource YAML and verify them after deploy with databricks bundle summary. Service principals for production deploys should hold the minimum permissions required, with no workspace admin rights. For teams building governance frameworks on top of Databricks, Kanerika’s data governance services cover Unity Catalog setup and access control design as part of the broader implementation.
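
The explicit permission grants described above might look roughly like this on a job resource. The group names are hypothetical, and the right permission levels depend on how a team splits responsibilities, so this is an illustration of the shape of the block rather than a recommended policy.

```yaml
# resources/customer_ingestion.yml (fragment); group names are hypothetical
resources:
  jobs:
    customer_ingestion:
      name: Customer Ingestion Job
      permissions:
        # On-call engineers can trigger and cancel runs but not edit the job
        - group_name: data-engineers
          level: CAN_MANAGE_RUN
        # Analysts can see run history and results only
        - group_name: analysts
          level: CAN_VIEW
```

Keeping these grants in the resource file means a permission change goes through the same pull request review as any other change to the job.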

Advanced Patterns for Enterprise Teams

Teams past the basics run into scaling problems the default bundle setup does not address. Developer environments collide, configurations multiply, and new projects copy-paste YAML instead of inheriting shared standards.

1. User Targets for Developer Isolation

The user target pattern gives every developer their own isolated copy of the dev environment. Add a user target to databricks.yml:

yaml

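targets:
  user:
    mode: development
    default: true
    workspace:
      # Literal URL: variable substitution is not supported for authentication fields such as host
      host: https://your-dev-workspace.azuredatabricks.net
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}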

When a developer runs databricks bundle deploy, all resources are prefixed with their username and deployed to their own workspace path. One developer’s pipeline deploy does not affect another’s. The shared dev target functions as an integration environment, separate from individual development work.

2. Monorepo Patterns for Multi-Domain Platforms

Large platforms with multiple data domains benefit from a monorepo structure where each domain owns its own bundle subdirectory:

data-platform/
  finance-pipelines/
    databricks.yml
    resources/
    notebooks/
  customer-data/
    databricks.yml
    resources/
    notebooks/
  shared-infrastructure/
    terraform/

Each bundle deploys independently. Finance pipelines and customer data pipelines have separate CI/CD workflows, separate deployment histories, and separate ownership. Teams deploy their bundle without waiting on other teams, and rollbacks affect only the relevant bundle.
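
One way to get separate CI/CD workflows per bundle in a monorepo is a path filter, sketched below for the finance-pipelines directory. The secret names and the staging target are assumptions carried over from the earlier workflow example; each domain would get its own copy of this file with its own path.

```yaml
# .github/workflows/finance-pipelines.yml (sketch; secret names are hypothetical)
name: Deploy finance-pipelines bundle

on:
  push:
    branches: [main]
    paths:
      - 'finance-pipelines/**'   # only changes to this bundle trigger the workflow

jobs:
  deploy:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: finance-pipelines   # bundle commands run where databricks.yml lives
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_STAGING_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_STAGING_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Databricks CLI
        run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      - run: databricks bundle validate -t staging
      - run: databricks bundle deploy -t staging
```

A commit that touches only customer-data/ leaves this workflow idle, which is what keeps deployment histories and rollbacks separate per domain.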

3. Custom Bundle Templates for Organizational Standards

Default templates give teams the basics. Custom templates let organizations encode their standards from the start. A custom template can include default permissions, a pre-configured service principal, cluster policy references, standard variable definitions, and pre-built CI/CD workflow files.

Create a custom template by following the Databricks bundle template specification and hosting it in a shared Git repository. Teams initialize new bundles with databricks bundle init <git-repo-url>, passing the template location as the positional argument, and get the organization’s standards built in from day one.

Databricks Asset Bundles for MLOps

MLOps workflows are harder to manage than standard data pipelines because the boundary between code and infrastructure is less defined. Training jobs, model experiments, and serving endpoints all have configuration that changes frequently and needs to be versioned separately from the model weights themselves.

1. Managing ML Pipelines as Code

A bundle can define the full ML pipeline lifecycle. The training job, the MLflow experiment that tracks runs, and the model serving endpoint all live in the same YAML configuration, deploy together, and are tracked under the same Git commit.

yaml

resources:
  jobs:
    model_training:
      name: Weekly Churn Model Training
      tasks:
        - task_key: train
          notebook_task:
            notebook_path: ./notebooks/train_churn_model.py

  experiments:
    churn_experiment:
      name: /Shared/churn-model-experiment

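  model_serving_endpoints:
    churn_scoring:
      name: churn-scoring-endpoint
      config:
        served_entities:
          - entity_name: churn-model
            entity_version: "1"      # placeholder; a served entity needs a model version
            workload_size: Small     # placeholder compute size
            scale_to_zero_enabled: true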

Versioning the serving endpoint configuration alongside the training job keeps architecture and serving configuration changes in the same pull request. They are reviewed together, deployed together, and rolled back together if something goes wrong.

2. MLflow Experiments and Model Serving in Bundle YAML

MLflow experiments defined in a bundle are created if they do not exist and left unchanged if they do, on each deploy. This makes them safe to include in development deploys where the experiment may already have runs. A few configuration details worth knowing for production ML workloads:

Model serving endpoints support scale_to_zero_enabled, which is worth setting to true in staging and dev targets to avoid idle compute costs.
The run_as block, available for pipelines since CLI v0.267, specifies the identity under which the pipeline runs.
For production, that identity should be a service principal with access to the relevant Unity Catalog resources, not a user account that may eventually be deactivated.
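
As a rough sketch of the run_as guidance above, the fragment below sets a service principal on the production target only. The target names and the all-zero application ID are placeholders rather than values from this article, and run_as support varies by resource type, so check the Databricks documentation for the resources in your bundle.

```yaml
# Fragment of databricks.yml (target names are illustrative assumptions)
targets:
  dev:
    mode: development
    default: true
    # No run_as here: resources run as the user who deploys the bundle

  prod:
    mode: production
    run_as:
      # Placeholder: replace with the application ID of your service principal
      service_principal_name: "00000000-0000-0000-0000-000000000000"
```

Keeping run_as at the target level means the same resource definitions can run under a personal identity in development and under the service principal in production.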


How Kanerika Helps Enterprise Teams With DABs

The technical side of adopting Databricks Asset Bundles is well-documented. The harder part is organizational: defining the Terraform boundary, building deployment governance, and getting bundle templates in place before teams develop configuration habits that are expensive to undo. This is where most enterprise adoptions stall.

Kanerika is a certified Databricks Consulting Partner with end-to-end data engineering implementations across healthcare, manufacturing, logistics, and financial services. The team has delivered migrations from legacy ETL tools into structured, bundle-managed architectures. Our data engineering services cover the full stack from Unity Catalog architecture through bundle governance, CI/CD setup, and production deployment.

Case Study: Modernizing Healthcare Analytics With Informatica to Databricks Migration

A leading healthcare provider managing clinical records, claims data, billing transactions, and operational datasets across multiple care units ran its analytics on Informatica. Batch-heavy processing slowed refresh cycles, delayed reports, and made cross-system data alignment difficult. Inconsistent transformation rules across departments made migration complex and prolonged validation cycles.

Challenge

The provider needed to migrate existing Informatica workflows without disrupting live reporting, unify transformation logic across departments running different coding standards, and give medical, finance, and administrative teams faster access to operational insights.

Solution

Kanerika migrated the existing Informatica workflows to Azure Databricks using its migration accelerator, re-architected the data pipelines for efficient processing, and established a centralized rule framework for coding standards and key healthcare metrics:

Migrated clinical, claims, and billing workflows to Databricks without disrupting live reporting
Rebuilt transformation logic using unified coding standards across departments
Set up optimized analytical paths giving medical, finance, and administrative teams faster access to insights

Results

71% higher reporting accuracy
64% faster decision-making
38% reduction in data handling costs

Wrapping Up

Databricks Asset Bundles give data engineering teams what application developers have had for years: a repeatable, reviewable, version-controlled way to deploy. The tooling is mature, the CLI is stable, and the path from manual UI-managed jobs to bundle-managed infrastructure is well-documented. The harder work is organizational: governing who deploys what and when, drawing the Terraform boundary, and building templates that keep new projects consistent from day one.

Teams that get this right ship data products the way software teams ship services: pull requests, automated tests, staged environments, and a full audit trail. That outcome comes from designing the deployment model deliberately, before the platform grows to a point where retrofitting structure is expensive. Talk to Kanerika’s team to discuss how DABs fit your specific Databricks environment.


FAQs

1. What are Databricks Asset Bundles?

Databricks Asset Bundles are a deployment framework that enables you to define, package, and deploy Databricks resources as code. Instead of manually configuring jobs, notebooks, pipelines, and other workspace assets, you can manage them through version-controlled configuration files. This approach improves consistency, simplifies collaboration, and makes deployments repeatable across development, staging, and production environments.

2. Why should I use Databricks Asset Bundles?

Databricks Asset Bundles help automate deployments, minimize manual configuration errors, and standardize how projects are managed across teams. They make it easier to implement DevOps best practices, integrate with CI/CD pipelines, and maintain consistent environments. As a result, development teams can release updates faster while reducing the risk of deployment failures.

3. What resources can be deployed using Databricks Asset Bundles?

Databricks Asset Bundles support the deployment of various workspace resources, including notebooks, jobs, workflows, Delta Live Tables pipelines, model serving endpoints, dashboards, and other supported assets. By managing these resources through a single project structure, organizations can simplify project maintenance and ensure every environment is configured consistently.

4. How do Databricks Asset Bundles support CI/CD?

Databricks Asset Bundles integrate seamlessly with popular CI/CD platforms such as GitHub Actions, Azure DevOps, GitLab CI/CD, and Jenkins. Teams can automatically validate configurations, run tests, and deploy updates whenever code changes are committed. This automation reduces manual effort, improves deployment reliability, and accelerates software delivery without compromising quality.
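
A minimal sketch of what this can look like in GitHub Actions is shown below. The workflow file name, the secret names, and the prod target are assumptions for illustration; authentication in particular should follow whatever your workspace requires.

```yaml
# .github/workflows/deploy-bundle.yml (hypothetical file name)
name: Deploy bundle

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # Secret names are assumptions; define them in the repository settings
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
    steps:
      - uses: actions/checkout@v4

      # Installs the Databricks CLI on the runner
      - uses: databricks/setup-cli@main

      # Fail fast on configuration errors before anything is deployed
      - name: Validate bundle
        run: databricks bundle validate -t prod

      - name: Deploy bundle
        run: databricks bundle deploy -t prod
```

Running validate as its own step keeps configuration errors separate from deployment failures in the workflow log.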

5. What is the difference between Databricks Asset Bundles and Terraform?

Although both support infrastructure automation, they serve different purposes. Terraform is primarily used to provision cloud infrastructure and manage platform-level resources, while Databricks Asset Bundles focus on packaging and deploying Databricks workspace assets such as notebooks, jobs, and pipelines. Many organizations use Terraform and Asset Bundles together to achieve complete infrastructure and application lifecycle management.

6. How do I get started with Databricks Asset Bundles?

To begin, install the latest Databricks CLI and create a project containing a databricks.yml configuration file. Next, define deployment targets, organize your project resources, and validate the bundle before deployment. Once configured, you can deploy the bundle using simple CLI commands and integrate the process into your Git-based CI/CD pipeline for automated releases.
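
For orientation, a starting databricks.yml might look roughly like the sketch below. The bundle name, the resources/ folder layout, and the workspace URLs are placeholders, not values from this article.

```yaml
# databricks.yml: minimal sketch with placeholder names and hosts
bundle:
  name: churn_model

# Pull resource definitions (jobs, pipelines, endpoints) from separate files
include:
  - resources/*.yml

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://<dev-workspace-url>

  prod:
    mode: production
    workspace:
      host: https://<prod-workspace-url>
```

With a file like this in place, databricks bundle validate checks the configuration and databricks bundle deploy -t dev deploys it to the development target. Production mode applies stricter validation, so the prod target may need further settings before it deploys cleanly.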

7. Are Databricks Asset Bundles suitable for production environments?

Yes. Databricks Asset Bundles are designed to support enterprise-grade production deployments. They provide environment-specific configurations, version control integration, deployment validation, and repeatable release processes that reduce operational risks. These capabilities make them ideal for organizations managing complex analytics, machine learning, and data engineering workloads at scale.

8. What are the best practices for using Databricks Asset Bundles?

Some recommended best practices include storing projects in Git, maintaining separate configurations for development, staging, and production, validating bundles before deployment, and automating releases through CI/CD pipelines. It is also important to manage secrets securely, keep project structures modular, use reusable templates where possible, and regularly update the Databricks CLI to take advantage of new features and improvements.

Authored by

Harisha Patangay | Executive Content Writer

Harisha is an Executive Content Writer at Kanerika, turning complex AI, data, and digital transformation topics into engaging content, backed by experience across fintech and SaaS industries.
