Databricks Just Put Git Inside Your Database (And Pointed Copilot At It)
Azure Databricks announced a public preview that combines two things developers have been cobbling together manually: copy-on-write database branching for Lakebase Postgres, and a direct line from GitHub Copilot agent mode to those branches. The pitch is simple. One command creates an isolated branch of your production database. You connect Copilot agent mode to that branch’s endpoint. Then you debug your AI apps against real production data without touching the live workload. [Azure Updates announcement]
What Actually Happened
Lakebase is Databricks’ serverless Postgres-compatible database, now GA on both AWS and Azure. It uses copy-on-write storage technology so that creating a branch gives you a full-fidelity copy of schema and data in seconds, without duplicating the underlying storage. [Microsoft Learn: Branches] The CLI command looks like this:
databricks postgres create-branch --project my-project --name feature-branch --parent production
Branches inherit both schema and data from their parent but share underlying storage through pointers. [Databricks blog: Database Branching in Postgres] Each branch is an independent database environment with its own compute endpoint. [Microsoft Learn: Manage branches]
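To make the "share underlying storage through pointers" idea concrete, here is a toy sketch in Python. It is an illustration of copy-on-write in general, not Lakebase's storage engine: the `Branch` class, the page table, and the page names are all hypothetical. A branch starts with references to the parent's immutable pages, and only a write replaces a reference on the branch that made it.

```python
class Branch:
    """Toy copy-on-write branch (hypothetical model, not the Lakebase implementation)."""

    def __init__(self, name, page_table=None):
        self.name = name
        # page_id -> reference to an immutable page (a tuple of rows).
        # Copying the table copies references only, never the page contents.
        self.page_table = dict(page_table or {})

    def create_branch(self, name):
        # The child begins with pointers to exactly the pages the parent has now.
        return Branch(name, self.page_table)

    def read(self, page_id):
        return self.page_table[page_id]

    def write(self, page_id, rows):
        # A write builds a new page and repoints this branch only.
        self.page_table[page_id] = tuple(rows)


production = Branch("production")
production.write("orders", [("o1", 10)])

feature = production.create_branch("feature-branch")
# Before any write, both branches point at the very same page object.
print(feature.read("orders") is production.read("orders"))  # True

feature.write("orders", [("o1", 99)])
# The write on the branch did not change what production sees.
print(production.read("orders") == (("o1", 10),))  # True
```

The design point is that branch creation cost depends on the amount of metadata, not the amount of data, and that isolation holds in both directions: later writes on the parent do not leak into the branch either.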
The new piece in this preview is the GitHub Copilot agent mode integration. You connect Copilot agent mode to the branch endpoint, and the AI agent can query, test, and debug against that isolated copy. [Microsoft Tech Community: Building AI apps and agents] This sits alongside the workspace-wide Genie MCP endpoint that also went into preview, letting Microsoft Copilot Studio agents access Databricks data through a single connection. [Databricks blog: Unifying Data and Governance in the Agentic Era]
How to Try It
- Create a branch from the Lakebase dashboard (Compute, Postgres, Branches tab, click Create Branch) or via the CLI with databricks postgres create-branch. [Microsoft Learn: CLI for Lakebase]
- Set a TTL on your branch so it auto-expires. Branches support time-to-live expiration and there is a 10-unarchived-branch limit per project. [Medium: Lakebase database branching is git-style isolation]
- Grab the branch endpoint from the Lakebase project UI. Each branch gets its own connection string and compute endpoint.
- Connect GitHub Copilot agent mode in VS Code by pointing it at the branch endpoint. If you have the Databricks extension installed, it can handle authentication via your workspace OAuth. [Databricks Community: Connecting VS Code and GitHub Copilot]
- Debug your agent against the branch. Run queries, reproduce failures, test schema changes, all without production risk.
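If you want to script the first step, for example to create one debugging branch per incident, a thin wrapper around the CLI is enough. The sketch below is a minimal example, assuming the flags match the command quoted earlier in this article (`--project`, `--name`, `--parent`); they have not been checked against the CLI reference here, and the branch name `debug-incident-1` is made up. It defaults to a dry run that only prints the command.

```python
import shlex
import subprocess


def create_branch_command(project: str, name: str, parent: str = "production") -> list[str]:
    """Build the argv for the create-branch command as quoted in this article.

    Assumption: the flag names are copied from the article, not from the CLI docs.
    """
    return [
        "databricks", "postgres", "create-branch",
        "--project", project,
        "--name", name,
        "--parent", parent,
    ]


def create_branch(project: str, name: str, parent: str = "production", dry_run: bool = True):
    """Print the command (dry run) or execute it with the Databricks CLI."""
    argv = create_branch_command(project, name, parent)
    if dry_run:
        # shlex.join quotes arguments so the printed line is safe to paste into a shell.
        print(shlex.join(argv))
        return None
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    return subprocess.run(argv, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    create_branch("my-project", "debug-incident-1")
```

Passing the arguments as a list rather than a formatted string avoids shell-quoting problems when a branch name comes from user input such as a PR title.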
Watch Out For
- The 10-unarchived-branch limit is real. If your team creates branches per PR or per CI run, you will hit it. Branches inactive for an extended period get auto-archived. Plan your TTL and cleanup strategy before rolling this out org-wide. [Microsoft Learn: Manage branches]
- Copilot agent mode needs clear context. It can query your branch, but it still needs to know your schema, your app logic, and what “correct” looks like. Do not expect it to magically debug a data pipeline it has never seen.
- Public preview means rough edges. Authentication flows between Copilot agent mode and Azure Databricks endpoints may require manual OAuth configuration. Check the community threads for workarounds if the default setup fails. [Databricks Community: Connecting VS Code and GitHub Copilot]
- Branch compute costs money. Each branch gets its own compute endpoint. Leaving branches running with active compute will bill you. Set aggressive TTLs for ephemeral debugging branches.
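The branch limit and the TTL advice above can be turned into a small planning helper. This is a sketch under stated assumptions: it works on a list you supply and calls no Databricks API, the `BranchInfo` fields (including `protected`) are hypothetical, and the only number taken from the article is the limit of 10 unarchived branches. It returns the branches you would need to remove to make room for new ones, expired branches first and then the oldest live ones.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_UNARCHIVED_BRANCHES = 10  # per-project limit cited in the article


@dataclass
class BranchInfo:
    # Hypothetical record; populate it from whatever inventory you keep.
    name: str
    created_at: datetime
    ttl: timedelta
    protected: bool = False  # e.g. the production branch; never suggested for deletion

    def expires_at(self) -> datetime:
        return self.created_at + self.ttl


def branches_to_delete(branches, now, slots_needed=1):
    """Return branch names to remove so `slots_needed` new branches fit under the limit."""
    deletable = [b for b in branches if not b.protected]
    # Anything past its TTL goes first.
    doomed = [b for b in deletable if b.expires_at() <= now]
    remaining = len(branches) - len(doomed)
    overflow = remaining + slots_needed - MAX_UNARCHIVED_BRANCHES
    if overflow > 0:
        # Still over budget: give up the oldest unexpired branches next.
        # If there are not enough deletable branches, this returns as many as it can.
        live = sorted(
            (b for b in deletable if b.expires_at() > now),
            key=lambda b: b.created_at,
        )
        doomed.extend(live[:overflow])
    return [b.name for b in doomed]
```

Running a check like this in CI before creating a per-PR branch gives you a clear failure message instead of a surprise when the limit is reached.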
What This Means
The real shift here is not the branching itself. Neon has done copy-on-write database branching for a while, and Databricks’ own blog acknowledges those roots. [Lakebase database branching: a short experiment] What is new is the combination: production-fidelity isolated data, an AI coding agent that can be pointed at it, and the whole thing wired into the Azure-native toolchain your team already uses.
The pain point it solves is specific and expensive. Right now, debugging an AI agent against production data means either giving the AI direct access to live state (risky), or working against a stale staging copy that does not reproduce the actual failure (useless). This preview collapses that tradeoff.
It also signals where Databricks is heading with its agentic AI story: the database is no longer just a sink for agent outputs. It is becoming a development surface that agents work against directly. That is a meaningful architectural shift, not a feature checkbox.