
Add Databricks SQL Warehouse Support to Golang Migrate #1167

Open · wants to merge 4 commits into master

Conversation

caldempsey

Currently, Databricks does not offer a built-in tool for deterministic schema migrations between Delta Table schemas. While schema evolution tools are available for managing changes in Delta Lake, they do not provide a controlled, additive approach to schema modifications. When transforming unstructured data into highly structured data within Delta Lake, precise schema management calls for a more controlled migration strategy.

This PR introduces support for Databricks SQL Warehouse. This enhancement allows for precise and controlled schema management through Unity Catalog, facilitating seamless integration with both internal and external tables, such as Delta Lake or Iceberg tables. If you plan to use this, please review the Known Issues section, as there are some quirks in the implementation that need to be addressed.

Implementation Details:

  • SQL Warehouse Support: Enables schema migrations and version management within Databricks environments using SQL Warehouse.
  • Databricks CLI Integration: The implementation currently uses the Databricks CLI agent for connectivity. Future releases may extend support to ODBC or JDBC connections, broadening connectivity options and facilitating integration with Apache Hive through JDBC or ODBC drivers.
  • Migrations Management: Handles migration operations, including running migrations from input streams, setting and retrieving version information, and ensuring the migrations table exists (see the interface sketch after this list).
  • Table Management: Provides functionality for dropping tables and creating the migrations table if it does not already exist.
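
For context, these are the methods a golang-migrate database driver must implement (the database.Driver interface from github.com/golang-migrate/migrate/v4/database); the SQL Warehouse driver in this PR supplies each of them:

// database.Driver, as defined in github.com/golang-migrate/migrate/v4/database.
// Any new backend, including this SQL Warehouse driver, implements this set.
type Driver interface {
	Open(url string) (Driver, error)               // parse the DSN and return a ready driver
	Close() error                                  // release the underlying connection
	Lock() error                                   // acquire an exclusive lock for a migration run
	Unlock() error                                 // release that lock
	Run(migration io.Reader) error                 // execute a single migration's statements
	SetVersion(version int, dirty bool) error      // record version state in the migrations table
	Version() (version int, dirty bool, err error) // read the current version state
	Drop() error                                   // drop everything in the connected database
}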

Usage:

migrate -source file://blah -database databricks-sqlwarehouse://token:{{token}}@{{workspace_id}}.cloud.databricks.com:443/sql/1.0/warehouses/{{warehouse_id}} {{arg}}
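
Migrations can also be driven from Go rather than the CLI. A minimal sketch, assuming the driver registers itself under the databricks-sqlwarehouse scheme when its package is imported (the commented import path is a guess; use this PR's actual package path):

package main

import (
	"errors"
	"log"

	"github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/source/file"
	// _ "github.com/golang-migrate/migrate/v4/database/databricks" // assumed path
)

func main() {
	m, err := migrate.New(
		"file://./migrations",
		"databricks-sqlwarehouse://token:{{token}}@{{workspace_id}}.cloud.databricks.com:443/sql/1.0/warehouses/{{warehouse_id}}",
	)
	if err != nil {
		log.Fatal(err)
	}
	// Apply all pending up migrations; ErrNoChange just means nothing to do.
	if err := m.Up(); err != nil && !errors.Is(err, migrate.ErrNoChange) {
		log.Fatal(err)
	}
}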

Known Issues:

This implementation was developed quickly to address immediate needs for a controlled schema migration process. It may not handle all edge cases perfectly. The primary challenges include:

  • Error Handling: Error messages can be unclear when dealing with dirty migrations. While the migration process generally works, you may encounter issues with error reporting.
  • Transactions: Databricks SQL Driver does not yet support transactions. As a result, concurrent operations are not advised.
  • Parameters: The Databricks SQL Driver does not support query parameters. Creating the migrations table and other internal operations therefore interpolate values directly into SQL strings rather than using bound parameters.
  • Catalog Requirements: The hive_metastore catalog and its default schema must exist in Unity Catalog (UC), as the migrations table is stored there. This may become configurable in future versions.
  • Multiple Queries: The driver does not handle multiple SQL queries in a single request well. To avoid issues, split multiple table creations into separate migrations.
  • Version Management: The migrate tool requires manual intervention if it reports a dirty database and prompts for a forced version change. Manually force the database version back to a known-good state, then re-apply migrations (see the example after this list).
  • SQL Syntax: The Databricks SQL Driver is stricter with SQL syntax compared to the notebooks. Ensure SQL queries are accurate and test them in an environment that closely mirrors the driver’s requirements.
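
For the forced-version case above, the standard golang-migrate workflow applies: force the recorded version back to the last migration known to have applied cleanly (which also clears the dirty flag), then re-run the remaining migrations. Placeholders as in the usage example:

migrate -source file://blah -database databricks-sqlwarehouse://token:{{token}}@{{workspace_id}}.cloud.databricks.com:443/sql/1.0/warehouses/{{warehouse_id}} force {{version}}
migrate -source file://blah -database databricks-sqlwarehouse://token:{{token}}@{{workspace_id}}.cloud.databricks.com:443/sql/1.0/warehouses/{{warehouse_id}} up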

Disclaimer: The author accepts no responsibility for any damage to personal or business systems, databases, networks, device drivers, or any other components resulting from the use of this driver.

var (
	multiStmtDelimiter = []byte(";")

	DefaultMigrationsTable = "schema_migrations"
)
Author


As in my notes, we could and probably should point this at catalog_name.schema_name.schema_migrations before merging.
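
For illustration, a fully qualified variant could look like the sketch below. The catalog and schema names are placeholders, the helper is hypothetical (not this PR's code), and the version/dirty columns follow golang-migrate's usual migrations-table shape:

import (
	"database/sql"
	"fmt"
)

// ensureMigrationsTable is a hypothetical sketch of creating the migrations
// table under an explicit catalog.schema instead of hive_metastore.default,
// e.g. qualifiedName = "catalog_name.schema_name.schema_migrations".
func ensureMigrationsTable(db *sql.DB, qualifiedName string) error {
	// Interpolated rather than parameterised, since the Databricks SQL
	// driver does not support query parameters (see Known Issues).
	query := fmt.Sprintf(
		"CREATE TABLE IF NOT EXISTS %s (version BIGINT NOT NULL, dirty BOOLEAN NOT NULL)",
		qualifiedName,
	)
	_, err := db.Exec(query)
	return err
}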

Comment on lines +108 to +111
return database.CasRestoreOnErr(&d.isLocked, false, true, database.ErrLocked, func() error {
	// Databricks SQL Warehouse does not support locking
	// Placeholder for actual lock code
	return nil
})
Author


The SQL Warehouse might support locking, but the database driver hasn't implemented it.

Comment on lines +1 to +7
CREATE EXTERNAL TABLE IF NOT EXISTS `dog-park-db`.default.cat_naps (
nap_id STRING NOT NULL, -- id of the nap
nap_location STRING NOT NULL, -- location where the nap took place
checkpoint_id LONG NOT NULL, -- ID given to the batch per checkpoint, assigned to many process runs.
batch_id STRING NOT NULL, -- ID given to each independent batch
recorded_at TIMESTAMP NOT NULL -- Timestamp indicating when the nap was recorded.
) LOCATION 's3://dog-park-db-tables/cat_naps';
Author

@caldempsey caldempsey Sep 18, 2024


Just wrote a migration that does this; we should add one:

ALTER TABLE `dog-park-db`.default.cat_naps
ADD COLUMNS (
    md5 STRING COMMENT 'MD5 checksum of the file content'
);
