Posted by

Vivek Janakiraman

Posted on

August 16, 2025

Posted under

Azure Databricks

Comments

Azure Databricks Series: Connect Function App to Export Cluster Configuration via Visual Studio Code

👉 You can also watch this as a YouTube video here: https://www.youtube.com/watch?v=X_z25rh-Ids

In this blog, we will explore how to use Azure Function Apps to connect with selected Azure Databricks workspaces and retrieve the configuration details of available clusters. We’ll use Visual Studio Code as the development environment and configure our Function App with Python scripts to automate this process.

This is a step-by-step guide to exporting cluster configurations in a CSV format for better monitoring and analysis.

🔹 Prerequisites

Before starting, make sure you have:

An active Azure Subscription
Azure Databricks Workspace(s) created
Personal Access Tokens (PATs) generated for each workspace
Visual Studio Code with Azure Functions extension installed
Python environment ready

🔹 Step 1: Setup Local Settings

Add your workspace URLs and PAT tokens in local.settings.json:

"DATABRICKS_WORKSPACE_URLS": "https://adb-26709042233374857.17.azuredatabricks.net,https://adb-1311525322452571.11.azuredatabricks.net,https://adb-32008745334111.11.azuredatabricks.net",
"DATABRICKS_PAT_TOKENS": "dapixxxxxxxxxxxxxxxxxxxxxxxxx,dapiyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy,dapizzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"

⚠️ Ensure the number of URLs and tokens match.

🔹 Step 2: Requirements File

Create a requirements.txt with the following dependencies:

azure-functions
azure-identity
azure-mgmt-resource
requests

This ensures Azure Functions runtime has all required packages.

🔹 Step 3: Python Script

Below is the main Function App script that retrieves Databricks cluster details, flattens the JSON output, and generates a downloadable CSV file:

import logging
import os, logging
import azure.functions as func
import requests
from datetime import datetime

# Databricks credentials from environment variables
app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)
workspace_urls = os.environ.get("DATABRICKS_WORKSPACE_URLS", "")
pat_tokens = os.environ.get("DATABRICKS_PAT_TOKENS", "")
WORKSPACES = [url.strip() for url in workspace_urls.split(",") if url.strip()]
PAT_TOKENS = [tok.strip() for tok in pat_tokens.split(",") if tok.strip()]

if len(WORKSPACES) != len(PAT_TOKENS):
    logging.warning("The number of workspace URLs and PAT tokens do not match. Please check app settings.")

@app.route(route="JBadbcostanalysistrigger")
def JBadbcostanalysistrigger(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("✅ Databricks Cluster Filtered Report Triggered")
    if not WORKSPACES:
        return func.HttpResponse("No Databricks workspaces configured.", status_code=400)

    selected_headers = [
        "workspace_url", "cluster_name", "autotermination_minutes", "is_single_node",
        "num_workers", "state", "start_time", "terminated_time", "last_activity_time",
        "termination_reason.code", "termination_reason.parameters", "data_security_mode",
        "driver_healthy", "driver_node_type_id", "effective_spark_version", "node_type_id",
        "release_version", "spark_version"
    ]
    all_rows = []
    for i, workspace_url in enumerate(WORKSPACES):
        token = PAT_TOKENS[i] if i < len(PAT_TOKENS) else PAT_TOKENS[0]
        clusters = list_clusters(workspace_url, token)
        for cluster in clusters:
            flat = flatten_cluster(cluster, workspace_url)
            all_rows.append(flat)

    # Build CSV content
    csv_lines = [",".join(selected_headers)]
    for row in all_rows:
        csv_line = []
        for h in selected_headers:
            value = row.get(h, "")
            if isinstance(value, str):
                value = value.replace('"', '""')
                if ',' in value or '\n' in value:
                    value = f'"{value}"'
            csv_line.append(str(value))
        csv_lines.append(",".join(csv_line))
    csv_output = "\n".join(csv_lines)

    logging.info("✅ Filtered cluster details prepared in CSV format.")
    return func.HttpResponse(csv_output, status_code=200, mimetype="text/csv")

def list_clusters(workspace_url, pat_token):
    api_url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/list"
    headers = {"Authorization": f"Bearer {pat_token}"}
    try:
        res = requests.get(api_url, headers=headers)
    except Exception as e:
        logging.error("HTTP request to %s failed: %s", workspace_url, e)
        return []
    if res.status_code != 200:
        logging.error("Non-200 response from %s: %s %s", workspace_url, res.status_code, res.text)
        return []
    return res.json().get("clusters", [])

def convert_epoch_to_datetime(ms):
    try:
        return datetime.utcfromtimestamp(ms / 1000).strftime('%Y-%m-%d %H:%M:%S')
    except:
        return ms

def flatten_cluster(cluster: dict, workspace_url: str) -> dict:
    flat = {
        "workspace_url": workspace_url,
        "cluster_name": cluster.get("cluster_name", ""),
        "autotermination_minutes": cluster.get("autotermination_minutes", ""),
        "is_single_node": cluster.get("is_single_node", ""),
        "num_workers": cluster.get("num_workers", ""),
        "state": cluster.get("state", ""),
        "start_time": convert_epoch_to_datetime(cluster.get("start_time", "")),
        "terminated_time": convert_epoch_to_datetime(cluster.get("terminated_time", "")),
        "last_activity_time": convert_epoch_to_datetime(cluster.get("last_activity_time", "")),
        "termination_reason.code": cluster.get("termination_reason", {}).get("code", ""),
        "termination_reason.parameters": cluster.get("termination_reason", {}).get("parameters", ""),
        "data_security_mode": cluster.get("data_security_mode", ""),
        "driver_healthy": cluster.get("driver_healthy", ""),
        "driver_node_type_id": cluster.get("driver_node_type_id", ""),
        "effective_spark_version": cluster.get("effective_spark_version", ""),
        "node_type_id": cluster.get("node_type_id", ""),
        "release_version": cluster.get("release_version", ""),
        "spark_version": cluster.get("spark_version", "")
    }
    return flat

🔹 Step 4: Deploy and Test

Deploy the Function App to Azure.
Trigger the HTTP endpoint /JBadbcostanalysistrigger.
A CSV file will be returned containing all Databricks cluster configurations from the selected workspaces.

🎯 Conclusion

In this blog, we demonstrated how to:

Connect Azure Function App to multiple Databricks Workspaces
Retrieve cluster configurations via Databricks REST API
Export the details into a CSV for analysis

This approach helps automate cluster monitoring and cost analysis across multiple workspaces efficiently.

👉 Don’t forget to check the full Azure Databricks Series playlist for step-by-step tutorials:
https://www.youtube.com/playlist?list=PLNj2XeCNjFeosTuxZLjfYvnW4H1hsPH07

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted by

Vivek Janakiraman

Posted on

August 16, 2025

Posted under

Azure Databricks

Comments

Azure Databricks Series: Connect Function App to Workspace & Export Cluster Config via Azure Portal

📺 You can also watch this as a YouTube video here:
👉 https://www.youtube.com/watch?v=4Og3btWBNT0

🔹 Introduction

In this blog, we’ll walk through how to use Azure Function Apps to connect to selected Azure Databricks workspaces and export the configuration details of available clusters.

This is especially useful for:

Auditing cluster usage
Tracking configuration changes
Exporting cluster details for compliance and reporting

We’ll be using:

Azure Portal to configure the Function App
Python script to call Databricks REST APIs
Environment variables to manage credentials securely

🔹 Prerequisites

Before we begin, ensure you have:
✔️ Access to an Azure subscription
✔️ One or more Azure Databricks workspaces with valid PAT tokens
✔️ A basic understanding of Azure Function Apps
✔️ Python installed locally if you want to test before deployment

🔹 Step 1: Create a Function App in Azure Portal

Go to the Azure Portal
Create a Function App with:
- Runtime stack: Python
- Version: 3.11
- Hosting: Consumption Plan or Premium Plan (based on needs)
Deploy and wait for the Function App to be ready.

🔹 Step 2: Configure Environment Variables

Inside your Function App → Configuration → Application settings, add the following:

DATABRICKS_WORKSPACE_URLS

https://adb-2670904240043557.17.azuredatabricks.net,https://adb-1311525333242571.11.azuredatabricks.net,https://adb-320087374534111.11.azuredatabricks.net

DATABRICKS_PAT_TOKENS

dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxx,dapiyyyyyyyyyyyyyyyyyyyyyyyyy,dapizzzzzzzzzzzzzzzzzzzzzzzzzzzz

These variables will allow the function to securely authenticate with Databricks workspaces.

🔹 Step 3: Add Dependencies

In your Function App project, create a requirements.txt file with:

azure-functions
azure-identity
azure-mgmt-resource
requests

This ensures your function has the right libraries to run.

🔹 Step 4: Function Definition

Inside your project, create the function.json to define HTTP trigger bindings:

{
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": ["get"]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}

This makes the function accessible via HTTP GET requests.

🔹 Step 5: Python Code to Retrieve Cluster Configurations

Now, add the following Python script to your Function App:

import os, logging
import azure.functions as func
import requests
from datetime import datetime

# Databricks credentials from environment variables
workspace_urls = os.environ.get("DATABRICKS_WORKSPACE_URLS", "")
pat_tokens = os.environ.get("DATABRICKS_PAT_TOKENS", "")
WORKSPACES = [url.strip() for url in workspace_urls.split(",") if url.strip()]
PAT_TOKENS = [tok.strip() for tok in pat_tokens.split(",") if tok.strip()]

if len(WORKSPACES) != len(PAT_TOKENS):
    logging.warning("The number of workspace URLs and PAT tokens do not match. Please check app settings.")

def list_clusters(workspace_url, pat_token):
    api_url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/list"
    headers = {"Authorization": f"Bearer {pat_token}"}
    try:
        res = requests.get(api_url, headers=headers)
    except Exception as e:
        logging.error("HTTP request to %s failed: %s", workspace_url, e)
        return []
    if res.status_code != 200:
        logging.error("Non-200 response from %s: %s %s", workspace_url, res.status_code, res.text)
        return []
    return res.json().get("clusters", [])

def convert_epoch_to_datetime(ms):
    try:
        return datetime.utcfromtimestamp(ms / 1000).strftime('%Y-%m-%d %H:%M:%S')
    except:
        return ms

def flatten_cluster(cluster: dict, workspace_url: str) -> dict:
    flat = {
        "workspace_url": workspace_url,
        "cluster_name": cluster.get("cluster_name", ""),
        "autotermination_minutes": cluster.get("autotermination_minutes", ""),
        "is_single_node": cluster.get("is_single_node", ""),
        "num_workers": cluster.get("num_workers", ""),
        "state": cluster.get("state", ""),
        "start_time": convert_epoch_to_datetime(cluster.get("start_time", "")),
        "terminated_time": convert_epoch_to_datetime(cluster.get("terminated_time", "")),
        "last_activity_time": convert_epoch_to_datetime(cluster.get("last_activity_time", "")),
        "termination_reason.code": cluster.get("termination_reason", {}).get("code", ""),
        "termination_reason.parameters": cluster.get("termination_reason", {}).get("parameters", ""),
        "data_security_mode": cluster.get("data_security_mode", ""),
        "driver_healthy": cluster.get("driver_healthy", ""),
        "driver_node_type_id": cluster.get("driver_node_type_id", ""),
        "effective_spark_version": cluster.get("effective_spark_version", ""),
        "node_type_id": cluster.get("node_type_id", ""),
        "release_version": cluster.get("release_version", ""),
        "spark_version": cluster.get("spark_version", "")
    }
    return flat

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("✅ Databricks Cluster Filtered Report Triggered")

    if not WORKSPACES:
        return func.HttpResponse("No Databricks workspaces configured.", status_code=400)

    selected_headers = [
        "workspace_url", "cluster_name", "autotermination_minutes", "is_single_node",
        "num_workers", "state", "start_time", "terminated_time", "last_activity_time",
        "termination_reason.code", "termination_reason.parameters", "data_security_mode",
        "driver_healthy", "driver_node_type_id", "effective_spark_version", "node_type_id",
        "release_version", "spark_version"
    ]

    all_rows = []
    for i, workspace_url in enumerate(WORKSPACES):
        token = PAT_TOKENS[i] if i < len(PAT_TOKENS) else PAT_TOKENS[0]
        clusters = list_clusters(workspace_url, token)
        for cluster in clusters:
            flat = flatten_cluster(cluster, workspace_url)
            all_rows.append(flat)

    # Build CSV content
    csv_lines = [",".join(selected_headers)]
    for row in all_rows:
        csv_line = []
        for h in selected_headers:
            value = row.get(h, "")
            if isinstance(value, str):
                value = value.replace('"', '""')
                if ',' in value or '\n' in value:
                    value = f'"{value}"'
            csv_line.append(str(value))
        csv_lines.append(",".join(csv_line))

    csv_output = "\n".join(csv_lines)
    logging.info("✅ Filtered cluster details prepared in CSV format.")

    return func.HttpResponse(csv_output, status_code=200, mimetype="text/csv")

🔹 Step 6: Test the Function

Deploy your Function App from Azure Portal
Copy the function URL
Open a browser or Postman → Send a GET request
You’ll get a CSV output containing cluster details across all configured workspaces 🎉

🔹 Conclusion

With this setup, you’ve automated the process of retrieving cluster configurations from multiple Azure Databricks workspaces. This makes it easy to:
✔️ Export data for audits
✔️ Track usage patterns
✔️ Maintain compliance records

You can further enhance this by storing CSVs in Azure Blob Storage or sending outputs to Power BI for dashboards.

📺 Don’t forget to check out the full video walkthrough here:
👉 https://www.youtube.com/watch?v=4Og3btWBNT0

Thank You,
Vivek Janakiraman

Posted by

Vivek Janakiraman

Posted on

July 25, 2025

Posted under

Azure, Power BI

Comments

Azure Series: Control Power BI Report Viewing in Power Apps Using a Time Window

We’ve created a step-by-step walkthrough video that demonstrates the entire setup in real-time.
👉 Watch it on YouTube – JBSWiki Channel

Controlling Power BI report access based on time is an essential feature for organizations with time-bound reporting requirements, compliance restrictions, or business-specific policies. In this blog post, we will walk through a real-world use case where we use Power Apps to embed a Power BI report and restrict its access based on a specific time window (9:00 AM to 1:00 PM).

This solution is simple, clean, and effective—and uses only native capabilities in Power Apps, without needing advanced permissions or complex security configurations.

Let’s dive in!

✅ Why Time-Based Access?

There are several scenarios where restricting Power BI report access by time makes sense:

📅 Business Hours Only Access: You want employees to view reports only during operational hours.
🔐 Security Compliance: You need to control access windows for sensitive reports.
📊 Scheduled Refresh Windows: Prevent access during report refresh to avoid outdated data usage.

Rather than configuring complex access rules at the Power BI service level, Power Apps gives you flexibility to embed the report conditionally, based on the system time and logic defined in your app.

🧰 What You’ll Need

Microsoft Power Apps license
A published Power BI report
The iFrame URL of the Power BI report (embed link)
Basic familiarity with Power FX, HTML Text Control, and variables in Power Apps

🔧 Step-by-Step Implementation

1. 🎯 Define the Time-Based Logic

We’ll use a button in Power Apps that sets the current time into variables and checks if the time falls between 9:00 AM and 1:00 PM. If yes, it launches the Power BI report; otherwise, it shows an Access Denied notification.

🔹 Code for the Button in Power Apps:

Set(varNow, Now());
Set(varHour, Hour(varNow));
Set(varMinute, Minute(varNow));
Set(varCurrentTime, Time(varHour, varMinute, 0));
Set(varAccessAllowed, varCurrentTime >= Time(9, 0, 0) && varCurrentTime < Time(13, 0, 0));
If(
    varAccessAllowed,
    Launch("https://app.powerbi.com/reportEmbed?reportId=fc00e03c-ec12-4038-af5a-2bfb9972f70b&autoAuth=true"),
    Notify("Access denied. You are allowed to view this report between 9:00 - 13:00.", NotificationType.Error)
)

💡 Explanation:

varNow: Captures the current system timestamp.
varHour & varMinute: Extracts the hour and minute.
varCurrentTime: Combines hour and minute into a Time object.
varAccessAllowed: Boolean flag based on whether the current time is between 9:00 AM and 1:00 PM.
If(...): Launches the Power BI report if access is allowed; otherwise, shows an error notification.

2. 📋 Display Time and Access Status

Add a Text Label to show the current time and whether access is allowed. This helps with testing and gives users clarity.

🔹 Code for the Text Label:

"Now: " & Text(Now(), "[$-en-US]HH:mm:ss") & 
" | Allowed: " & Text(varAccessAllowed)

This will dynamically update and display:

Current time
Whether access is currently allowed

Example output:
Now: 10:45:23 | Allowed: true

🌐 Real-World Use Cases

Here are a few scenarios where this logic is useful:

Scenario	Benefit
📈 Internal Dashboards	Restrict to working hours only
🏦 Banking/Finance Apps	Limit sensitive data views during authorized hours
🧾 Compliance & Audits	Enforce strict access rules using app logic
🏢 Client-Specific Reports	Tailor access hours based on agreements

🚀 Why Use Power Apps for This?

✅ Flexible Control: No need to alter Power BI permissions
✅ Low-Code Friendly: Easy to maintain and update
✅ Secure Embedding: iFrame loads only after validation
✅ User Feedback: Display live status and messages using simple UI elements

📌 Tips and Best Practices

🕐 Always use 24-hour format when working with Time() in Power FX
🔒 Do not expose sensitive logic in client-side apps unless supported by security measures
🔁 Consider refreshing the time variables periodically if embedding directly in app screens
🖼️ Use a Gallery or Container if embedding multiple reports with different rules

🎥 Watch the Video

We’ve created a step-by-step walkthrough video that demonstrates the entire setup in real-time.
👉 Watch it on YouTube – JBSWiki Channel

🏁 Conclusion

Using Power Apps + Power BI, you can securely control report visibility based on custom business logic. This time-based control pattern is just one of many ways to extend app capabilities using simple Power FX expressions.

With this setup:

Your users see reports only when they’re supposed to
Your data remains protected
Your app feels responsive and intelligent

Thank You,
Vivek Janakiraman