How to do CI/CD for Databricks
March 13, 2026
A practical guide to CI/CD for Databricks, comparing databricks sync for fast iteration with Databricks Asset Bundles for production-grade deployments.
Last Updated: March 2026
CI/CD for Databricks uses two tools: databricks sync to push local file changes to your workspace in real time during development, and Databricks Asset Bundles (DABs) to deploy notebooks, jobs, pipelines, and cluster configurations through automated CI/CD pipelines across dev, staging, and production environments. Most teams use both.
This guide covers when to use each, how to set them up, and how they compare.
databricks sync: the inner development loop
databricks sync is a CLI command that syncs files from a local directory to your Databricks workspace. Think of it like rsync for Databricks. There are two ways to run it:
- Without flags, it does a one-time push of your files.
- With --watch, it stays running, automatically pushing changes every time you save a file locally.
When to use it
If you're writing a notebook or a Python module and want to test it against a running cluster, databricks sync gets your code there without manually uploading through the UI or pushing a branch and waiting for a pipeline. This is most useful when iterating on notebook logic against real data, or developing Python libraries that get imported by other notebooks. It's also good for prototyping new jobs before you formalize them into a bundle.
How it works
Install the Databricks CLI (v0.200+) and authenticate:
databricks auth login --host https://<workspace>.cloud.databricks.com
One-time sync:
databricks sync . /Repos/<user>/<project>
Continuous sync (stays running, pushes on every file save):
databricks sync . /Repos/your-user/your-project --watch
You can point either at a Repos path or a workspace path depending on your setup.
Limitations
databricks sync only handles files. Moving Python modules and notebooks is trivially easy, but it won't help you with jobs, pipelines, or dashboards. It also doesn't handle multiple environments. There's no concept of "deploy this to staging" with sync. It's a 1:1 mapping from your local directory to a single workspace path.
That's fine for development, but we need something a bit stronger for production.
Databricks Asset Bundles: the outer deployment loop
Databricks Asset Bundles (DABs) are the production deployment tool. Databricks introduced them in 2023 to replace dbx (now deprecated). A bundle is a directory with your source code and YAML config files that define your jobs, pipelines, clusters, permissions, and environment-specific overrides.
What goes in a bundle
A typical bundle looks like this:
my-project/
├── databricks.yml       # bundle definition
├── resources/
│   ├── jobs.yml         # job definitions
│   └── pipelines.yml    # DLT pipeline definitions
├── src/
│   ├── notebooks/
│   │   └── transform.py
│   └── libraries/
│       └── utils.py
└── tests/
    └── test_utils.py
databricks.yml is where you define your bundle name, workspace mappings, and which resource files to include. A simple version:
bundle:
  name: my-etl-project

workspace:
  host: https://your-workspace.cloud.databricks.com

targets:
  dev:
    mode: development
    default: true
    workspace:
      root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/dev
  staging:
    workspace:
      host: https://<staging-workspace>.cloud.databricks.com
      root_path: /Shared/.bundle/${bundle.name}/staging
  prod:
    workspace:
      host: https://<prod-workspace>.cloud.databricks.com
      root_path: /Shared/.bundle/${bundle.name}/prod
    permissions:
      - level: CAN_MANAGE
        group_name: production-admins

include:
  - resources/*.yml
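The resource files picked up by that include glob define the jobs themselves. Here is a minimal sketch of what resources/jobs.yml might contain; the job name, task, notebook path, and cluster settings are all hypothetical placeholders, not values from this project:

```yaml
# resources/jobs.yml -- hypothetical job definition (illustrative names)
resources:
  jobs:
    nightly_etl:
      name: nightly-etl-${bundle.target}
      tasks:
        - task_key: transform
          notebook_task:
            notebook_path: ../src/notebooks/transform.py
          job_cluster_key: etl_cluster
      job_clusters:
        - job_cluster_key: etl_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: i3.xlarge
            num_workers: 2
```

Using a job cluster (new_cluster inside job_clusters) rather than an all-purpose cluster means the cluster terminates when the job finishes, which matters for CI costs later.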
The deployment workflow
DABs have three commands:
- databricks bundle validate checks your YAML for errors and resolves variable references. Run this in CI before anything else.
- databricks bundle deploy -t <target> pushes your code and resource definitions to the target workspace. It creates or updates jobs, pipelines, and other resources to match your config.
- databricks bundle run -t <target> <resource> triggers a specific job or pipeline after deployment.
A CI/CD pipeline using DABs typically looks like this:
# .github/workflows/deploy.yml
name: Deploy Databricks Bundle

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle validate -t staging

  deploy-staging:
    needs: validate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy -t staging
      - run: databricks bundle run -t staging integration_tests

  deploy-prod:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy -t prod
Why DABs beat the alternatives
Before DABs, people used a mix of approaches: custom Python scripts calling the Databricks REST API, Terraform for workspace resources, dbx for job deployment, or just clicking around in the UI. Each had problems.
Terraform can manage Databricks resources, and I still recommend it for workspace-level infrastructure (Unity Catalog, metastores, workspace configuration). But Terraform treats notebooks and job definitions as opaque blobs. It doesn't understand the relationship between a notebook and the job that runs it. DABs do. They were built for this.
dbx worked but was maintained by Databricks Labs, not the core team. It used its own config format that didn't align well with the REST API. When Databricks shipped DABs, they deprecated dbx and told everyone to migrate. If you're still on dbx, you should move.
The UI is fine for exploration. It's a disaster for production. No version control, no review process, no rollback, no audit trail. If someone edits a production job through the UI at 2am, good luck figuring out what changed when it breaks the next morning.
databricks sync vs Databricks Asset Bundles
Side by side:
| | databricks sync | Databricks Asset Bundles |
|---|---|---|
| Purpose | Fast local development | Production deployment |
| What it deploys | Files only | Files, jobs, pipelines, clusters, permissions |
| Environment support | Single workspace path | Multiple targets (dev/staging/prod) |
| CI/CD integration | Not designed for it | Built for it |
| Config format | None | YAML (databricks.yml) |
| State management | None | Tracks deployed resources |
| Rollback | Revert your files | Redeploy previous commit |
| Learning curve | 5 minutes | A few hours |
They're complementary. databricks sync is the tool you use while writing code. DABs are the tool you use to ship it.
Practical setup for a team
If I were setting up CI/CD for a Databricks project from scratch, here's the workflow I'd build.
Local development
Each developer installs the Databricks CLI and authenticates to the dev workspace. They use databricks sync to push code to their personal directory under /Repos/username/project. Some teams share dev clusters; others let individuals spin up their own. Either works, just keep an eye on the bill.
Branch workflow
Your repo has a databricks.yml with at least two targets: dev and prod. Developers work on feature branches. When they open a pull request, CI runs databricks bundle validate to catch config errors. Code review happens in the PR like any other project.
Deployment pipeline
When a PR merges to main, CI deploys to staging using databricks bundle deploy -t staging and runs integration tests. If tests pass, a manual approval gate triggers the production deployment. Some teams automate this fully; others want a human click before prod.
Testing
Testing is the hard part. Unit tests for pure Python functions are easy. But testing notebook logic that queries Delta tables or calls Spark means you need a running cluster, and that costs money. Some options:
- Run unit tests locally with pytest for logic that doesn't depend on Spark
- Use databricks bundle run to trigger a test job on a cluster in CI
- Use Databricks Connect to run Spark code from your local machine against a remote cluster
- For DLT pipelines, deploy to a staging environment and validate output tables
I lean towards option 2, using databricks bundle run and testing on a cluster in dev. Maybe you've got a high-end MacBook and would rather test locally. There's no one right answer: pick what works best for you.
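For the first option, the bundle layout shown earlier (src/libraries/utils.py with tests/test_utils.py) keeps pure logic testable without a cluster. A minimal sketch, assuming a hypothetical column-name-normalizing helper; the function and its behavior are illustrative, not part of any real project:

```python
# src/libraries/utils.py -- hypothetical pure-Python helper, no Spark required
import re

def normalize_column_name(name: str) -> str:
    """Lowercase a column name and collapse runs of non-alphanumerics to '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


# tests/test_utils.py -- runs locally with plain pytest, no cluster involved
def test_normalize_column_name():
    assert normalize_column_name("Order Date") == "order_date"
    assert normalize_column_name("  Total ($)  ") == "total"
```

Anything that touches Spark or Delta tables stays out of these tests and gets exercised by the bundle-run integration job instead.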
Common mistakes
Skipping validation in CI. databricks bundle validate catches typos, missing references, and schema errors before you waste time on a failed deployment. Always run it.
One giant job definition. If your resources/jobs.yml is 500 lines long, split it up. DABs let you use multiple YAML files and include them. Keep each job or pipeline in its own file.
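Because databricks.yml includes resources/*.yml as a glob, splitting is purely a matter of file layout: each new file under resources/ is picked up automatically. A sketch with a hypothetical job name:

```yaml
# resources/ingest_orders.yml -- one job per file (illustrative names)
resources:
  jobs:
    ingest_orders:
      name: ingest-orders
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../src/notebooks/ingest.py
```

Each additional job or pipeline gets its own small file alongside it, which keeps diffs in code review focused on one resource at a time.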
No environment separation. I've seen teams deploy directly to production with no staging environment. It works until it doesn't. Set up at least two targets from day one.
Editing production through the UI. This undermines your entire CI/CD setup. If someone changes a job in the UI, the next bundle deploy will overwrite those changes. Use workspace permissions to restrict who can edit production resources directly.
Ignoring cluster costs in CI. Every bundle run in CI starts a cluster. If your integration tests take 20 minutes and you're running them on every push, that adds up fast. Use job clusters (they terminate after the job finishes) and consider running expensive tests only on merges to main.
Conclusion
databricks sync gets your code to the workspace fast. DABs get it to production safely. Start with a minimal databricks.yml, get bundle deploy running in CI, and add staging environments and approval gates as your team needs them.