Azure Databricks Series: Displaying All Serverless SQL Warehouses in Your Workspace

When working with Azure Databricks, it’s often necessary to programmatically retrieve and manage metadata about your compute resources. One such resource is the Serverless SQL Warehouse, designed for cost-effective and scalable interactive analytics.

In this blog, we’ll walk through a step-by-step Python script that helps you list all Serverless SQL Warehouses in your Databricks workspace using the REST API and persist the results in a Delta table for further analysis.

πŸ” Why Monitor Serverless SQL Warehouses?

Serverless SQL Warehouses are a key part of many organizations’ data strategies due to:

  • Auto-scaling capabilities
  • No infrastructure management
  • Pay-per-use pricing model

By tracking your serverless SQL endpoints, you can gain insights into:

  • Who created them
  • Their sizes and memory footprints
  • Auto-stop configurations
  • Their current state (running/stopped)

πŸ› οΈ Solution Overview

We’ll use the Databricks SQL Warehouses REST API (still reachable at its legacy `/sql/endpoints` route) to list all SQL warehouses, keep only the serverless ones, enrich each record with its approximate memory, and save the results into a Delta table using PySpark.


πŸ§ͺ Code Walkthrough

import requests
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# In a Databricks notebook, `spark` and `display` are already defined.
# Replace with your own workspace URL (no trailing slash, to avoid a
# double slash when the path is appended below).
instance = "https://adb-1311537494242340.26.azuredatabricks.net"
token = "dapia***************************"  # never hardcode tokens in production

headers = {
    "Authorization": f"Bearer {token}"
}

url = f"{instance}/api/2.0/sql/endpoints"
response = requests.get(url, headers=headers)
response.raise_for_status()  # fail fast on authentication or URL errors
data = response.json()

def size_to_memory(cluster_size):
    # Approximate memory per t-shirt size; adjust if Databricks revises these
    mapping = {
        "2X-Small": "64 GB",
        "X-Small": "128 GB",
        "Small": "256 GB",
        "Medium": "512 GB",
        "Large": "1 TB",
        "X-Large": "2 TB",
        "2X-Large": "4 TB",
        "3X-Large": "8 TB",
        "4X-Large": "16 TB"
    }
    return mapping.get(cluster_size, "Unknown")

# Keep only the serverless warehouses and flatten the fields we care about
records = []
for endpoint in data.get("endpoints", []):
    if endpoint.get("enable_serverless_compute", False):
        records.append((
            endpoint.get("name"),
            endpoint.get("id"),
            endpoint.get("cluster_size"),
            size_to_memory(endpoint.get("cluster_size")),
            endpoint.get("auto_stop_mins"),
            endpoint.get("creator_name"),
            endpoint.get("state")
        ))

# An explicit schema keeps the column order stable and still works
# when no serverless warehouses are found
schema = StructType([
    StructField("name", StringType(), True),
    StructField("id", StringType(), True),
    StructField("cluster_size", StringType(), True),
    StructField("approx_memory", StringType(), True),
    StructField("auto_stop_mins", IntegerType(), True),
    StructField("creator", StringType(), True),
    StructField("state", StringType(), True)
])

# Create Spark DataFrame
df = spark.createDataFrame(records, schema=schema)

# Save to Delta (switch mode to "append" if you want to keep history)
df.write.format("delta").mode("overwrite").saveAsTable("default.serverless_sql_warehouses")

display(spark.table("default.serverless_sql_warehouses"))
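If you want to sanity-check the filtering and enrichment logic before pointing the script at a real workspace, the same steps can be exercised against a mocked API payload. Everything below (endpoint names, IDs, creators) is hypothetical sample data, not real API output:

```python
# Exercise the serverless filter and the memory lookup on a mocked response.

def size_to_memory(cluster_size):
    # Same approximate t-shirt-size mapping used in the main script
    mapping = {
        "2X-Small": "64 GB", "X-Small": "128 GB", "Small": "256 GB",
        "Medium": "512 GB", "Large": "1 TB", "X-Large": "2 TB",
        "2X-Large": "4 TB", "3X-Large": "8 TB", "4X-Large": "16 TB"
    }
    return mapping.get(cluster_size, "Unknown")

# Hypothetical payload shaped like the /sql/endpoints response
mock_data = {
    "endpoints": [
        {"name": "bi-serverless", "id": "a1", "cluster_size": "Small",
         "enable_serverless_compute": True, "auto_stop_mins": 10,
         "creator_name": "alice@example.com", "state": "RUNNING"},
        {"name": "classic-wh", "id": "b2", "cluster_size": "Medium",
         "enable_serverless_compute": False, "auto_stop_mins": 120,
         "creator_name": "bob@example.com", "state": "STOPPED"},
    ]
}

# Only endpoints with enable_serverless_compute=True should survive
records = [
    (e["name"], e["cluster_size"], size_to_memory(e["cluster_size"]))
    for e in mock_data.get("endpoints", [])
    if e.get("enable_serverless_compute", False)
]
print(records)  # [('bi-serverless', 'Small', '256 GB')]
```

Running this locally confirms the filter drops the classic warehouse and the lookup resolves the memory size, without needing a token or a live workspace.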

βœ… Sample Output

πŸ’‘ Pro Tips

  • πŸ” Never hardcode tokens in production scripts. Use Azure Key Vault or Databricks secrets to securely manage secrets.
  • πŸ›‘ Consider implementing pagination if your workspace has many warehouses.
  • πŸ“Š Use this Delta table as a source for monitoring dashboards in Power BI or Databricks SQL.
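The pagination tip can be sketched as a generic cursor loop. The `fetch_page` stub below stands in for a real API call that returns a batch of items plus an optional next-page token; its page contents are made up for illustration:

```python
# Generic cursor-based pagination: keep requesting pages until the API
# stops returning a next-page token.

def fetch_page(page_token=None):
    # Hypothetical stub simulating three pages of warehouse IDs;
    # a real implementation would issue an authenticated GET here.
    pages = {
        None: (["wh-1", "wh-2"], "p2"),
        "p2": (["wh-3"], "p3"),
        "p3": (["wh-4"], None),
    }
    return pages[page_token]

def fetch_all():
    items, token = [], None
    while True:
        batch, token = fetch_page(token)
        items.extend(batch)
        if token is None:  # no more pages
            return items

print(fetch_all())  # ['wh-1', 'wh-2', 'wh-3', 'wh-4']
```

The same loop shape works for any token-paginated endpoint; only `fetch_page` needs to change.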

πŸ“š Conclusion

With just a few lines of code, you can automate the discovery of all serverless SQL warehouses, store their metadata in a Delta Lake, and use it for reporting, auditing, or monitoring purposes. This is particularly useful in large-scale environments where managing SQL compute efficiently is crucial.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided β€œAS IS” with no warranties and confer no rights.