Most Databricks teams deploy jobs the same way they did three years ago. Someone clicks through the UI, exports a JSON config, and emails it to the next person. When something breaks in production, nobody knows what changed, who changed it, or when. The process works until the team grows, and then it stops working entirely.
Databricks Asset Bundles, officially renamed Declarative Automation Bundles in March 2026, put the entire project into code. Jobs, pipelines, clusters, permissions, and environment configs live in YAML files in Git and deploy with a single CLI command. The rename is backward-compatible and the DAB abbreviation is still in common use.
In this article, we cover what bundles are, how to set one up, how to connect them to CI/CD, where Terraform fits, and the mistakes enterprise teams consistently make when they skip the governance layer.
Key Takeaways

- Databricks Asset Bundles, officially renamed Declarative Automation Bundles in March 2026, let teams define their entire Databricks project as YAML-based code and deploy it with a single CLI command.
- The core bundle structure includes a databricks.yml file, resource definitions, environment targets, and custom variables, all version-controlled in Git.
- A new direct deployment engine (CLI v0.279+) removes the previous Terraform dependency, making deployments faster and eliminating state drift issues.
- CI/CD integration with GitHub Actions or Azure DevOps is the right way to govern bundle deployments. Manual local deploys to production are the most common source of environment drift.
- DABs and Terraform are complementary: use DABs for project-level resources including jobs, pipelines, and dashboards, and Terraform for platform-level infrastructure including workspaces, Unity Catalog, and cloud networking.
- Enterprise teams that scale DABs well use user-scoped targets for developer isolation, monorepo structures for multi-domain platforms, and custom templates to enforce organizational standards.
What Are Databricks Asset Bundles?

Databricks Asset Bundles are an infrastructure-as-code approach to managing Databricks projects. Instead of creating jobs through the UI, maintaining JSON export files, or relying on custom CLI scripts, teams define everything in YAML files that live alongside source code in Git. Jobs, pipelines, clusters, permissions, and environment configurations all become version-controlled artifacts.
In March 2026, Databricks renamed the feature from Databricks Asset Bundles to Declarative Automation Bundles with CLI v0.287+. The rename is fully backward-compatible. All existing configurations, CLI commands, and file names stay exactly the same, and the DAB abbreviation is still in common use.
1. The Concept Behind Declarative Infrastructure

If a resource exists in Databricks, it should exist as a file in the repository. When a data engineer creates a job via the Databricks UI, that job lives only in the workspace. Two developers trying to modify it simultaneously overwrite each other’s work, there is no review process, and reproducing the exact same config in another environment requires doing it manually from scratch.
Bundles solve this by making the YAML file the source of truth. Run databricks bundle deploy and Databricks reads that file, computes what needs to change, and updates the workspace to match. The workspace becomes an output of code, not a place where configuration lives independently. Every change ships through Git, so every deploy has a commit hash, a reviewer, and an audit trail.
2. What Changed With the Declarative Automation Bundles Rename

The rename reflects a real change in scope. When bundles first launched, assets referred narrowly to notebooks and jobs. Declarative automation more accurately reflects the tool’s current scope. Bundles today manage dashboards, Unity Catalog schemas, model serving endpoints, MLflow experiments, SQL alerts, and Lakebase Postgres projects, all as versioned YAML resources.

A separate change arrived with CLI v0.279+ in December 2025: the direct deployment engine. Previously, DABs used Terraform under the hood to manage state. The new engine removes that dependency. Teams that migrate with databricks bundle migrate no longer need to manage Terraform state files or Terraform version compatibility.
Inside a Databricks Bundle: Core Components

A bundle is a directory. It contains source code files, YAML configuration files that describe Databricks resources, and the structure that ties them together for deployment. Understanding these parts before writing any YAML saves considerable debugging time later.

1. The databricks.yml File

Every bundle has exactly one top-level databricks.yml file. This is the entry point the Databricks CLI reads. It declares the bundle name, optionally pulls in other YAML files, defines variables, and sets up deployment targets.
A minimal databricks.yml looks like this:
```yaml
bundle:
  name: customer-etl

include:
  - resources/*.yml

variables:
  env:
    description: Deployment environment
    default: dev

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://your-dev-workspace.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://your-prod-workspace.azuredatabricks.net
```

The include directive pulls in resource definition files from subdirectories, keeping the root file readable. The targets block defines each environment with its own workspace URL and deployment mode.
2. Resources, Targets, and Variables

Resource files define the Databricks objects the bundle creates and manages. A typical job definition looks like this:
```yaml
resources:
  jobs:
    customer_ingestion:
      name: Customer Ingestion Job
      tasks:
        - task_key: ingest
          notebook_task:
            # Relative paths resolve from this YAML file, which lives in resources/
            notebook_path: ../notebooks/ingest.py
          existing_cluster_id: ${var.cluster_id}
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: UTC
```

Variables handle the values that differ across environments. A variable defined in databricks.yml is referenced anywhere in the bundle with ${var.variable_name}. Target-level overrides let dev and prod resolve to different cluster IDs, catalog names, or schedule frequencies, removing the need to duplicate entire resource files per environment. Keeping variables clean from the start is far easier than refactoring them out of hardcoded resource files after a team has grown to five engineers with three active environments.
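As a minimal sketch of target-level overrides, the excerpt below declares cluster_id once and gives dev and prod their own values. The cluster IDs are placeholders rather than real clusters, and the target names assume the dev and prod targets from the databricks.yml shown earlier.

```yaml
# databricks.yml (excerpt)
variables:
  cluster_id:
    description: Existing cluster that runs the ingestion task

targets:
  dev:
    variables:
      cluster_id: 0000-000000-devclust   # placeholder ID
  prod:
    variables:
      cluster_id: 0000-000000-prdclust   # placeholder ID
```

The job definition keeps referencing ${var.cluster_id} unchanged. A one-off value can also be supplied at deploy time with the --var flag, for example databricks bundle deploy -t dev --var="cluster_id=<id>".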
3. How the Direct Deployment Engine Changed the Architecture

Before CLI v0.279, DABs ran Terraform behind the scenes. Terraform tracked the bundle’s deployed state in a state file stored in the Databricks workspace. That approach worked, but Terraform version mismatches broke deployments, state drift caused unpredictable behavior, and the extra abstraction layer added latency.
The direct deployment engine removes all of that. Databricks now tracks deployed resources natively in the workspace, with no separate state file. To migrate an existing bundle, run databricks bundle migrate from the bundle directory. The official migration guide walks through the full process.
How to Set Up Your First Databricks Asset Bundle

Setting up a bundle from scratch takes under 30 minutes with the Databricks CLI already configured. The steps below apply to both Azure Databricks and Databricks on AWS.

1. Install the Databricks CLI (v0.287+)

DABs require Databricks CLI v0.218.0 or above. The current stable release as of mid-2026 is v0.287+. Install via Homebrew on macOS or via the install script on Linux:
```bash
# macOS
brew tap databricks/tap && brew install databricks

# Linux
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
```

After installing, configure authentication. Databricks recommends OAuth user-to-machine (U2M) authentication:

```bash
databricks auth login --host https://your-workspace.azuredatabricks.net
```

Verify the version with databricks -v before proceeding. Running an older CLI against a bundle that uses direct-engine features produces confusing validation errors that are hard to diagnose without knowing the version constraint.
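One way to make the version constraint explicit, sketched here on the assumption that the team agrees on a minimum CLI release, is the databricks_cli_version setting in the bundle block. The constraint value below is only an example; use the minimum your bundle actually needs.

```yaml
# databricks.yml (excerpt)
bundle:
  name: customer-etl
  # Bundle commands stop with a version error if the installed CLI
  # does not satisfy this constraint.
  databricks_cli_version: ">= 0.218.0"
```

With this in place, an outdated CLI should surface as a clear version mismatch instead of an unrelated-looking validation failure.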
2. Initialize a Bundle With a Template

The CLI ships with default templates for common project types. Run databricks bundle init and choose from default-python, default-sql, or dbt-sql. For teams that want to define the structure themselves, the default-minimal template available since CLI v0.277+ gives a databricks.yml with catalog variables and nothing else.
```bash
databricks bundle init
```

The init command asks for a project name, workspace URL, and target environments, then generates the initial directory structure. The output is a working bundle that can be validated immediately with no additional configuration.
3. Configure databricks.yml for Dev and Prod

After initialization, update the workspace host URLs for each target. Set mode: development for the dev target and mode: production for prod. Development mode automatically prefixes resource names with the deploying user’s username, so personal dev resources stay isolated from shared environments.
Add variables for any value that differs between environments. Cluster IDs, catalog names, and notification email addresses are the most common candidates. Hardcoding these values directly in resource files is the single most common bundle configuration mistake at scale, covered in detail in the mistakes section below.
4. Validate, Deploy, and Run

With configuration in place, the full bundle lifecycle runs in three commands:
```bash
# Validate configuration without deploying
databricks bundle validate -t dev

# Deploy resources to the dev workspace
databricks bundle deploy -t dev

# Run a specific workflow in the deployed bundle
databricks bundle run -t dev customer_ingestion
```

bundle validate catches YAML syntax errors, missing variable references, and resource configuration issues before anything touches the workspace. Treat running it as the first step in every CI/CD pipeline as a hard requirement.
How to Build a CI/CD Pipeline With Databricks Asset Bundles

Bundles give teams a single command to deploy. CI/CD gives that command governance. Without automated pipelines, a developer can run databricks bundle deploy --target prod from a laptop at any time, using a local configuration that may have diverged from what is in Git. That divergence is invisible until something breaks.

1. GitHub Actions Workflow for Bundle Deployment

A standard GitHub Actions workflow runs on merge to main, deploys to staging, and then requires a version tag to promote to production:
```yaml
name: Deploy Databricks Bundle
on:
  push:
    branches: [main]
    tags: ['v*']   # without a tag trigger, the prod step below never runs
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Databricks CLI
        run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      - name: Deploy to staging
        # Only on pushes to main, so a tag push does not redeploy staging
        if: github.ref == 'refs/heads/main'
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_STAGING_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_STAGING_TOKEN }}
        run: |
          databricks bundle validate -t staging
          databricks bundle deploy -t staging
      - name: Deploy to prod (on tag)
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_PROD_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_PROD_TOKEN }}
        run: |
          databricks bundle validate -t prod
          databricks bundle deploy -t prod
```

Store workspace credentials as GitHub secrets, never inline in the workflow file. For tighter security, use OIDC federation instead of static tokens. This removes long-lived credentials from the equation and lets GitHub Actions authenticate directly using a federated identity.
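The OIDC variant can be sketched roughly as follows. This assumes a service principal with a federation policy for the repository already exists in Databricks, and that the DATABRICKS_STAGING_HOST and DATABRICKS_STAGING_CLIENT_ID repository variables (hypothetical names) hold the workspace URL and the principal's application ID. Check the current Databricks documentation for the exact authentication settings before relying on it.

```yaml
name: Deploy Databricks Bundle (OIDC)
on:
  push:
    branches: [main]

permissions:
  id-token: write   # allows the job to request a GitHub OIDC token
  contents: read

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_AUTH_TYPE: github-oidc
      DATABRICKS_HOST: ${{ vars.DATABRICKS_STAGING_HOST }}
      DATABRICKS_CLIENT_ID: ${{ vars.DATABRICKS_STAGING_CLIENT_ID }}
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle validate -t staging
      - run: databricks bundle deploy -t staging
```

No token is stored anywhere in this setup: the host and client ID are not secrets, and the short-lived identity token is issued per workflow run.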
2. Azure DevOps Pipeline Setup

For teams on Azure, the pipeline follows the same pattern but uses service connections and variable groups instead of GitHub secrets. Configure a service principal in Entra ID, add it to the Databricks workspace as a managed principal, and store the token as a pipeline variable group secret.
```yaml
trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

steps:
  - script: |
      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
    displayName: Install Databricks CLI
  - script: databricks bundle validate -t staging
    displayName: Validate bundle
    env:
      DATABRICKS_HOST: $(DATABRICKS_STAGING_HOST)
      DATABRICKS_TOKEN: $(DATABRICKS_STAGING_TOKEN)
  - script: databricks bundle deploy -t staging
    displayName: Deploy to staging
    env:
      DATABRICKS_HOST: $(DATABRICKS_STAGING_HOST)
      DATABRICKS_TOKEN: $(DATABRICKS_STAGING_TOKEN)
```

Teams with stricter security requirements often use separate service principals for dev, staging, and prod, limiting each principal’s permissions to its own workspace.
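A per-environment split might look like the sketch below, which assumes a dedicated variable group for the production principal. The group name databricks-prod and the DATABRICKS_PROD_* variable names are hypothetical, and the snippet presumes the CLI install step from the pipeline above has already run.

```yaml
variables:
  - group: databricks-prod   # hypothetical group holding only the prod principal's credentials

steps:
  - script: |
      set -e   # stop before deploying if validation fails
      databricks bundle validate -t prod
      databricks bundle deploy -t prod
    displayName: Validate and deploy to prod
    env:
      # Secret variables are not exposed to scripts automatically,
      # so they are mapped explicitly here.
      DATABRICKS_HOST: $(DATABRICKS_PROD_HOST)
      DATABRICKS_TOKEN: $(DATABRICKS_PROD_TOKEN)
```

Keeping each environment's credentials in its own variable group means a staging pipeline cannot read the production token, even by accident.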
3. Branching Strategy for Dev, Staging, and Prod

Databricks recommends a three-environment branching model. Developers commit to feature branches and work locally or in a personal dev workspace. Feature branches merge into main via pull request, triggering the CI/CD pipeline to deploy to staging. A git tag on main promotes staging to production.

The user target in the bundle configuration handles local development within this model. Each developer deploys their own isolated copy of the bundle’s resources with databricks bundle deploy --target user. All resources are prefixed with their username, so no developer’s work overwrites another’s in a shared workspace. The shared dev target then functions as an integration environment, not a personal sandbox.
Databricks Asset Bundles vs Terraform

The most common architecture question after adopting DABs is where Terraform still fits. Both tools manage infrastructure as code and both can define Databricks resources. The answer comes down to which layer of the stack each tool owns.

1. What DABs Handle vs What Terraform Handles

| Resource Type | DABs | Terraform |
| --- | --- | --- |
| Lakeflow Jobs and pipelines | Yes | Yes |
| Notebooks and libraries | Yes | No |
| Cluster definitions | Yes | Yes |
| Permissions on bundle resources | Yes | Yes |
| Unity Catalog schemas | Yes (direct engine) | Yes |
| Databricks workspaces | No | Yes |
| Unity Catalog metastores | No | Yes |
| Cloud networking (VNets, subnets) | No | Yes |
| Service principals | Limited | Yes |
| Cross-workspace infrastructure | No | Yes |
DABs are project-scoped. They manage the resources a specific data product needs to run. Terraform is platform-scoped. It provisions the workspaces, metastores, networking, and identity infrastructure that DABs deploy into.
2. Why Most Enterprise Teams Use Both

DABs previously ran Terraform under the hood, which created confusion about overlap. With the direct deployment engine, that dependency is gone and the boundary is cleaner. A practical split that works at scale:

- Infrastructure engineers manage workspaces, metastores, Unity Catalog, and service principals via Terraform in a separate infrastructure repository.
- Data engineering teams manage jobs, pipelines, dashboards, and schemas via bundles in their project repositories.

The two systems operate independently, and each team has clear ownership of their layer. This is the model Databricks recommends and what most mature enterprise deployments converge on.
Common Mistakes Teams Make With Databricks Asset Bundles

Most DAB problems are process errors. The tool is set up correctly. The guardrails around how it gets used are absent.

1. Deploying Directly From Local Without CI/CD Gates

This is how most production incidents with DABs happen. A developer deploys from a local environment using a configuration that differs from what is committed to Git. The workspace updates without review, and the diff is invisible until something fails downstream.
The fix is a policy decision. Bundle deploys to staging and production should run only from CI/CD pipelines triggered by Git events. Local deploys belong in user-scoped dev targets only. Enforce this with branch protection rules and by restricting which service principals have deploy permissions in staging and production workspaces.
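One bundle-level guardrail that supports this policy, sketched below, is pinning the production target to a Git branch. It assumes main is the release branch; the host URL is the placeholder used earlier in this article.

```yaml
# databricks.yml (excerpt)
targets:
  prod:
    mode: production
    workspace:
      host: https://your-prod-workspace.azuredatabricks.net
    git:
      # In production mode the deploy is rejected when the checked-out
      # branch differs from this one.
      branch: main
```

This check can be overridden with --force, so it complements restricted deploy permissions in the workspace and does not replace them.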
2. Hardcoding Environment-Specific Values

Cluster IDs, catalog names, and workspace paths hardcoded in resource YAML files create silent failures when the bundle deploys across environments. The dev cluster ID does not exist in the production workspace. The deploy fails, often with a generic error that does not point to the actual cause.
All environment-specific values belong in the variables block of databricks.yml, with target-level overrides. A variable referenced as ${var.cluster_id} resolves to different values in dev and prod with no change to the resource definition file itself.
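Where the same cluster name exists in every workspace, a lookup variable can avoid storing IDs altogether. The sketch below assumes a cluster called shared-etl-cluster (a hypothetical name) is present in each target workspace.

```yaml
# databricks.yml (excerpt)
variables:
  cluster_id:
    description: Cluster that runs the ingestion task
    lookup:
      # Resolved to the matching cluster's ID in whichever workspace
      # the current target points at.
      cluster: shared-etl-cluster
```

Resource files still reference ${var.cluster_id}; only the way the value is obtained changes.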
3. Skipping Bundle Validate in the Pipeline

databricks bundle validate catches YAML syntax errors, undefined variable references, and resource configuration conflicts before anything is deployed. It runs in seconds. Despite this, teams often skip it because early deploys succeed without it.
The problems appear later. A typo in a resource definition, a missing variable override for a new environment target, or an unsupported resource type against an older CLI version will all surface as cryptic deploy failures that validate would have caught in seconds. Running validate as the first step in every CI/CD run is the simplest governance control available.
4. Mismanaging Permissions Across Targets

Development mode scopes resources to the deploying user by default. Production mode requires explicit permission grants defined in the bundle configuration. Teams that skip the permissions block in production often find that a scheduled job runs as the deploying service principal, which may lack access to the Unity Catalog tables the job reads from.
Define permissions explicitly in the resource YAML and verify them after deploy with databricks bundle summary. Service principals for production deploys should hold the minimum permissions required, with no workspace admin rights. For teams building governance frameworks on top of Databricks, Kanerika’s data governance services cover Unity Catalog setup and access control design as part of the broader implementation.
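A production target with an explicit run identity and permission grants could look like the following sketch. The application ID is a placeholder and the data-engineers group is hypothetical; substitute the principals that exist in your workspace.

```yaml
# databricks.yml (excerpt)
targets:
  prod:
    mode: production
    run_as:
      # Placeholder application ID of the service principal that runs the jobs
      service_principal_name: "00000000-0000-0000-0000-000000000000"
    permissions:
      - group_name: data-engineers   # hypothetical group, read-only in prod
        level: CAN_VIEW
      - service_principal_name: "00000000-0000-0000-0000-000000000000"
        level: CAN_MANAGE
```

The run_as principal still needs its own Unity Catalog grants on the tables the job reads; bundle permissions only cover the bundle's resources.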
Advanced Patterns for Enterprise Teams

Teams past the basics run into scaling problems the default bundle setup does not address. Developer environments collide, configurations multiply, and new projects copy-paste YAML instead of inheriting shared standards.

1. User Targets for Developer Isolation

The user target pattern gives every developer their own isolated copy of the dev environment. Add a user target to databricks.yml:
```yaml
targets:
  user:
    mode: development
    default: true
    workspace:
      # Literal URL: the host is used for authentication, so it cannot
      # be filled in from a bundle variable
      host: https://your-dev-workspace.azuredatabricks.net
      root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}
```

When a developer runs databricks bundle deploy, all resources are prefixed with their username and deployed to their own workspace path. One developer’s pipeline deploy does not affect another’s. The shared dev target functions as an integration environment, separate from individual development work.
2. Monorepo Patterns for Multi-Domain Platforms

Large platforms with multiple data domains benefit from a monorepo structure where each domain owns its own bundle subdirectory:
```
data-platform/
  finance-pipelines/
    databricks.yml
    resources/
    notebooks/
  customer-data/
    databricks.yml
    resources/
    notebooks/
  shared-infrastructure/
    terraform/
```

Each bundle deploys independently. Finance pipelines and customer data pipelines have separate CI/CD workflows, separate deployment histories, and separate ownership. Teams deploy their bundle without waiting on other teams, and rollbacks affect only the relevant bundle.
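A per-bundle workflow in such a monorepo can be sketched with a path filter, so that only changes under one domain's directory trigger its deploy. The example assumes the finance-pipelines directory from the layout above, a staging target in that bundle, and the same secret names as the earlier GitHub Actions workflow.

```yaml
name: Deploy finance-pipelines
on:
  push:
    branches: [main]
    paths:
      - "finance-pipelines/**"   # ignore commits that touch only other bundles

jobs:
  deploy:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: finance-pipelines   # run CLI commands from the bundle root
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_STAGING_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_STAGING_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Databricks CLI
        run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      - run: databricks bundle validate -t staging
      - run: databricks bundle deploy -t staging
```

Each domain gets its own copy of this file with a different path filter and working directory, which keeps deployment histories separate.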
3. Custom Bundle Templates for Organizational Standards

Default templates give teams the basics. Custom templates let organizations encode their standards from the start. A custom template can include default permissions, a pre-configured service principal, cluster policy references, standard variable definitions, and pre-built CI/CD workflow files.

Create a custom template by following the Databricks bundle template specification and hosting it in a shared Git repository. Teams initialize new bundles with databricks bundle init <git-repo-url>, passing the repository URL as the template argument, and get the organization’s standards built in from day one.
Databricks Asset Bundles for MLOps

MLOps workflows are harder to manage than standard data pipelines because the boundary between code and infrastructure is less defined. Training jobs, model experiments, and serving endpoints all have configuration that changes frequently and needs to be versioned separately from the model weights themselves.

1. Managing ML Pipelines as Code

A bundle can define the full ML pipeline lifecycle. The training job, the MLflow experiment that tracks runs, and the model serving endpoint all live in the same YAML configuration, deploy together, and are tracked under the same Git commit.
```yaml
resources:
  jobs:
    model_training:
      name: Weekly Churn Model Training
      tasks:
        - task_key: train
          notebook_task:
            notebook_path: ./notebooks/train_churn_model.py
  experiments:
    churn_experiment:
      name: /Shared/churn-model-experiment
  model_serving_endpoints:
    churn_scoring:
      name: churn-scoring-endpoint
      config:
        served_entities:
          - entity_name: churn-model
            entity_version: "1"      # assumption: serve the first registered version
            workload_size: Small     # assumption: smallest serving size
            scale_to_zero_enabled: true
```

Versioning the serving endpoint configuration alongside the training job keeps architecture and serving configuration changes in the same pull request. They are reviewed together, deployed together, and rolled back together if something goes wrong.
2. MLflow Experiments and Model Serving in Bundle YAML

MLflow experiments defined in a bundle are created if they do not exist and left unchanged if they do, on each deploy. This makes them safe to include in development deploys where the experiment may already have runs. A few configuration details worth knowing for production ML workloads:
- Model serving endpoints support scale_to_zero_enabled, worth setting to true in staging and dev targets to avoid idle compute costs.
- The run_as block, available for pipelines since CLI v0.267+, specifies the identity under which the pipeline runs.
- For production, that identity should be a service principal with access to the relevant Unity Catalog resources, not a user account that may eventually be deactivated.
How Kanerika Helps Enterprise Teams With DABs

The technical side of adopting Databricks Asset Bundles is well-documented. The harder part is organizational: defining the Terraform boundary, building deployment governance, and getting bundle templates in place before teams develop configuration habits that are expensive to undo. This is where most enterprise adoptions stall.
Kanerika is a certified Databricks Consulting Partner with end-to-end data engineering implementations across healthcare, manufacturing, logistics, and financial services. The team has delivered migrations from legacy ETL tools into structured, bundle-managed architectures. Our data engineering services cover the full stack from Unity Catalog architecture through bundle governance, CI/CD setup, and production deployment.
A leading healthcare provider managing clinical records, claims data, billing transactions, and operational datasets across multiple care units ran its analytics on Informatica. Batch-heavy processing slowed refresh cycles, delayed reports, and made cross-system data alignment difficult. Inconsistent transformation rules across departments made migration complex and prolonged validation cycles.
Challenge

The provider needed to migrate existing Informatica workflows without disrupting live reporting, unify transformation logic across departments running different coding standards, and give medical, finance, and administrative teams faster access to operational insights.

Solution

Kanerika migrated the existing Informatica workflows to Azure Databricks using its migration accelerator, re-architected the data pipelines for efficient processing, and established a centralized rule framework for coding standards and key healthcare metrics:

- Migrated clinical, claims, and billing workflows to Databricks without disrupting live reporting
- Rebuilt transformation logic using unified coding standards across departments
- Set up optimized analytical paths giving medical, finance, and administrative teams faster access to insights

Results

- 71% higher reporting accuracy
- 64% faster decision-making
- 38% reduction in data handling costs
Wrapping Up

Databricks Asset Bundles give data engineering teams what application developers have had for years: a repeatable, reviewable, version-controlled way to deploy. The tooling is mature, the CLI is stable, and the path from manual UI-managed jobs to bundle-managed infrastructure is well-documented. The harder work is organizational: governing who deploys what and when, drawing the Terraform boundary, and building templates that keep new projects consistent from day one.
Teams that get this right ship data products the way software teams ship services: pull requests, automated tests, staged environments, and a full audit trail. That outcome comes from designing the deployment model deliberately, before the platform grows to a point where retrofitting structure is expensive. Talk to Kanerika’s team to discuss how DABs fit your specific Databricks environment.
FAQs

1. What are Databricks Asset Bundles?

Databricks Asset Bundles are a deployment framework that enables you to define, package, and deploy Databricks resources as code. Instead of manually configuring jobs, notebooks, pipelines, and other workspace assets, you can manage them through version-controlled configuration files. This approach improves consistency, simplifies collaboration, and makes deployments repeatable across development, staging, and production environments.

2. Why should I use Databricks Asset Bundles?

Databricks Asset Bundles help automate deployments, minimize manual configuration errors, and standardize how projects are managed across teams. They make it easier to implement DevOps best practices, integrate with CI/CD pipelines, and maintain consistent environments. As a result, development teams can release updates faster while reducing the risk of deployment failures.

3. What resources can be deployed using Databricks Asset Bundles?

Databricks Asset Bundles support the deployment of various workspace resources, including notebooks, jobs, workflows, Delta Live Tables pipelines, model serving endpoints, dashboards, and other supported assets. By managing these resources through a single project structure, organizations can simplify project maintenance and ensure every environment is configured consistently.

4. How do Databricks Asset Bundles support CI/CD?

Databricks Asset Bundles integrate seamlessly with popular CI/CD platforms such as GitHub Actions, Azure DevOps, GitLab CI/CD, and Jenkins. Teams can automatically validate configurations, run tests, and deploy updates whenever code changes are committed. This automation reduces manual effort, improves deployment reliability, and accelerates software delivery without compromising quality.

5. What is the difference between Databricks Asset Bundles and Terraform?

Although both support infrastructure automation, they serve different purposes. Terraform is primarily used to provision cloud infrastructure and manage platform-level resources, while Databricks Asset Bundles focus on packaging and deploying Databricks workspace assets such as notebooks, jobs, and pipelines. Many organizations use Terraform and Asset Bundles together to achieve complete infrastructure and application lifecycle management.

6. How do I get started with Databricks Asset Bundles?

To begin, install the latest Databricks CLI and create a project containing a databricks.yml configuration file. Next, define deployment targets, organize your project resources, and validate the bundle before deployment. Once configured, you can deploy the bundle using simple CLI commands and integrate the process into your Git-based CI/CD pipeline for automated releases.

7. Are Databricks Asset Bundles suitable for production environments?

Yes. Databricks Asset Bundles are designed to support enterprise-grade production deployments. They provide environment-specific configurations, version control integration, deployment validation, and repeatable release processes that reduce operational risks. These capabilities make them ideal for organizations managing complex analytics, machine learning, and data engineering workloads at scale.

8. What are the best practices for using Databricks Asset Bundles?

Some recommended best practices include storing projects in Git, maintaining separate configurations for development, staging, and production, validating bundles before deployment, and automating releases through CI/CD pipelines. It is also important to manage secrets securely, keep project structures modular, use reusable templates where possible, and regularly update the Databricks CLI to take advantage of new features and improvements.