📺 You can also watch this as a YouTube video here:
👉 https://www.youtube.com/watch?v=4Og3btWBNT0
🔹 Introduction
In this blog, we’ll walk through how to use Azure Function Apps to connect to selected Azure Databricks workspaces and export the configuration details of available clusters.
This is especially useful for:
- Auditing cluster usage
- Tracking configuration changes
- Exporting cluster details for compliance and reporting
We’ll be using:
- Azure Portal to configure the Function App
- Python script to call Databricks REST APIs
- Environment variables to manage credentials securely
🔹 Prerequisites
Before we begin, ensure you have:
✔️ Access to an Azure subscription
✔️ One or more Azure Databricks workspaces with valid personal access tokens (PATs)
✔️ A basic understanding of Azure Function Apps
✔️ Python installed locally if you want to test before deployment
🔹 Step 1: Create a Function App in Azure Portal
- Go to the Azure Portal
- Create a Function App with:
  - Runtime stack: Python
  - Version: 3.11
  - Hosting: Consumption Plan or Premium Plan (based on your needs)
- Deploy and wait for the Function App to be ready.
🔹 Step 2: Configure Environment Variables
Inside your Function App → Configuration → Application settings, add the following:
- DATABRICKS_WORKSPACE_URLS:
  https://adb-2670904240043557.17.azuredatabricks.net,https://adb-1311525333242571.11.azuredatabricks.net,https://adb-320087374534111.11.azuredatabricks.net
- DATABRICKS_PAT_TOKENS:
  dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxx,dapiyyyyyyyyyyyyyyyyyyyyyyyyy,dapizzzzzzzzzzzzzzzzzzzzzzzzzzzz
These variables let the function authenticate securely with each Databricks workspace. Both values are comma-separated lists paired by position, so the first URL is used with the first token, the second with the second, and so on.
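If you want to sanity-check these settings locally before deploying, here is a minimal sketch. It assumes you export the same two values as environment variables in your shell; the masking is just for safe printing:

import os

# Read the same app settings the Function App will use
urls = [u.strip() for u in os.environ.get("DATABRICKS_WORKSPACE_URLS", "").split(",") if u.strip()]
tokens = [t.strip() for t in os.environ.get("DATABRICKS_PAT_TOKENS", "").split(",") if t.strip()]

if len(urls) != len(tokens):
    print(f"Mismatch: {len(urls)} workspace URLs vs {len(tokens)} PAT tokens")

# The function pairs URL i with token i, so print a masked summary of each pair
for url, token in zip(urls, tokens):
    print(f"{url} -> {token[:4]}...{token[-4:]}")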
🔹 Step 3: Add Dependencies
In your Function App project, create a requirements.txt file with:
azure-functions
azure-identity
azure-mgmt-resource
requests
This ensures your function has the required libraries. Of these, only azure-functions and requests are used directly by the script below; azure-identity and azure-mgmt-resource are useful if you later extend the function with Azure SDK calls.
🔹 Step 4: Function Definition
Inside your project, create the function.json to define HTTP trigger bindings:
{
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": ["get"]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}
This makes the function accessible via HTTP GET requests.
🔹 Step 5: Python Code to Retrieve Cluster Configurations
Now, add the following Python script to your Function App:
import os, logging
import azure.functions as func
import requests
from datetime import datetime

# Databricks credentials from environment variables
workspace_urls = os.environ.get("DATABRICKS_WORKSPACE_URLS", "")
pat_tokens = os.environ.get("DATABRICKS_PAT_TOKENS", "")

WORKSPACES = [url.strip() for url in workspace_urls.split(",") if url.strip()]
PAT_TOKENS = [tok.strip() for tok in pat_tokens.split(",") if tok.strip()]

if len(WORKSPACES) != len(PAT_TOKENS):
    logging.warning("The number of workspace URLs and PAT tokens does not match. Please check app settings.")


def list_clusters(workspace_url, pat_token):
    """Call the Databricks Clusters API and return the list of clusters for one workspace."""
    api_url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/list"
    headers = {"Authorization": f"Bearer {pat_token}"}
    try:
        res = requests.get(api_url, headers=headers)
    except Exception as e:
        logging.error("HTTP request to %s failed: %s", workspace_url, e)
        return []
    if res.status_code != 200:
        logging.error("Non-200 response from %s: %s %s", workspace_url, res.status_code, res.text)
        return []
    return res.json().get("clusters", [])


def convert_epoch_to_datetime(ms):
    """Convert a Databricks epoch-millisecond timestamp to a readable UTC string."""
    try:
        return datetime.utcfromtimestamp(ms / 1000).strftime('%Y-%m-%d %H:%M:%S')
    except Exception:
        return ms  # leave the original value (e.g. an empty string) if it is not a number


def flatten_cluster(cluster: dict, workspace_url: str) -> dict:
    """Pick the fields we want to report and flatten nested values into a single dict."""
    flat = {
        "workspace_url": workspace_url,
        "cluster_name": cluster.get("cluster_name", ""),
        "autotermination_minutes": cluster.get("autotermination_minutes", ""),
        "is_single_node": cluster.get("is_single_node", ""),
        "num_workers": cluster.get("num_workers", ""),
        "state": cluster.get("state", ""),
        "start_time": convert_epoch_to_datetime(cluster.get("start_time", "")),
        "terminated_time": convert_epoch_to_datetime(cluster.get("terminated_time", "")),
        "last_activity_time": convert_epoch_to_datetime(cluster.get("last_activity_time", "")),
        "termination_reason.code": cluster.get("termination_reason", {}).get("code", ""),
        "termination_reason.parameters": cluster.get("termination_reason", {}).get("parameters", ""),
        "data_security_mode": cluster.get("data_security_mode", ""),
        "driver_healthy": cluster.get("driver_healthy", ""),
        "driver_node_type_id": cluster.get("driver_node_type_id", ""),
        "effective_spark_version": cluster.get("effective_spark_version", ""),
        "node_type_id": cluster.get("node_type_id", ""),
        "release_version": cluster.get("release_version", ""),
        "spark_version": cluster.get("spark_version", "")
    }
    return flat


def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("✅ Databricks Cluster Filtered Report Triggered")

    if not WORKSPACES:
        return func.HttpResponse("No Databricks workspaces configured.", status_code=400)
    if not PAT_TOKENS:
        return func.HttpResponse("No Databricks PAT tokens configured.", status_code=400)

    selected_headers = [
        "workspace_url", "cluster_name", "autotermination_minutes", "is_single_node",
        "num_workers", "state", "start_time", "terminated_time", "last_activity_time",
        "termination_reason.code", "termination_reason.parameters", "data_security_mode",
        "driver_healthy", "driver_node_type_id", "effective_spark_version", "node_type_id",
        "release_version", "spark_version"
    ]

    # Collect flattened cluster rows from every configured workspace
    all_rows = []
    for i, workspace_url in enumerate(WORKSPACES):
        token = PAT_TOKENS[i] if i < len(PAT_TOKENS) else PAT_TOKENS[0]
        clusters = list_clusters(workspace_url, token)
        for cluster in clusters:
            flat = flatten_cluster(cluster, workspace_url)
            all_rows.append(flat)

    # Build CSV content
    csv_lines = [",".join(selected_headers)]
    for row in all_rows:
        csv_line = []
        for h in selected_headers:
            value = row.get(h, "")
            if isinstance(value, str):
                value = value.replace('"', '""')
                if ',' in value or '\n' in value or '"' in value:
                    value = f'"{value}"'
            csv_line.append(str(value))
        csv_lines.append(",".join(csv_line))
    csv_output = "\n".join(csv_lines)

    logging.info("✅ Filtered cluster details prepared in CSV format.")
    return func.HttpResponse(csv_output, status_code=200, mimetype="text/csv")
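If you have Python installed locally (see the prerequisites), you can smoke-test the function before deploying it. The sketch below assumes the code above is saved as get_clusters/__init__.py (the module path is illustrative) and that both app settings are exported as environment variables before the import, since the script reads them at module load time:

import azure.functions as func
from get_clusters import main  # hypothetical module path; adjust to your project layout

# Build a mock GET request and invoke the function directly
req = func.HttpRequest(method="GET", url="/api/get_clusters", body=None, params={})
resp = main(req)

print(resp.status_code)
print(resp.get_body().decode()[:500])  # preview the first few CSV lines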
🔹 Step 6: Test the Function
- Deploy your Function App from Azure Portal
- Copy the function URL
- Open a browser or Postman → Send a GET request
- You’ll get a CSV output containing cluster details across all configured workspaces 🎉 (a small Python client sketch follows below)
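Here is a minimal client sketch for calling the deployed function from Python and saving the report; the URL and key are placeholders you would replace with the values copied from the portal:

import requests

FUNCTION_URL = "https://<your-function-app>.azurewebsites.net/api/<your-function-name>"  # placeholder
FUNCTION_KEY = "<your-function-key>"  # placeholder

# Call the HTTP-triggered function; the key is passed via the standard "code" query parameter
resp = requests.get(FUNCTION_URL, params={"code": FUNCTION_KEY}, timeout=120)
resp.raise_for_status()

# Save the CSV report locally
with open("databricks_clusters.csv", "w", encoding="utf-8") as f:
    f.write(resp.text)

print(f"Saved {len(resp.text.splitlines()) - 1} cluster rows to databricks_clusters.csv")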
🔹 Conclusion
With this setup, you’ve automated the process of retrieving cluster configurations from multiple Azure Databricks workspaces. This makes it easy to:
✔️ Export data for audits
✔️ Track usage patterns
✔️ Maintain compliance records
You can further enhance this by storing CSVs in Azure Blob Storage or sending outputs to Power BI for dashboards.
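For example, a small helper like the sketch below could push each report to Blob Storage. It assumes you add azure-storage-blob to requirements.txt and create a STORAGE_CONNECTION_STRING app setting; the container and blob names are illustrative:

import os
from datetime import datetime
from azure.storage.blob import BlobServiceClient

def upload_csv_to_blob(csv_output: str) -> str:
    # Connect using a connection string stored in app settings (assumed setting name)
    service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION_STRING"])
    blob_name = f"cluster-reports/clusters_{datetime.utcnow():%Y%m%d_%H%M%S}.csv"
    blob_client = service.get_blob_client(container="databricks-reports", blob=blob_name)
    blob_client.upload_blob(csv_output, overwrite=True)
    return blob_name

You could call upload_csv_to_blob(csv_output) from main() just before returning the HTTP response.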
📺 Don’t forget to check out the full video walkthrough here:
👉 https://www.youtube.com/watch?v=4Og3btWBNT0
Thank You,
Vivek Janakiraman
Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.