Azure Databricks Series: Connect Function App to Workspace & Export Cluster Config via Azure Portal

📺 You can also watch this as a YouTube video here:
👉 https://www.youtube.com/watch?v=4Og3btWBNT0


🔹 Introduction

In this blog, we’ll walk through how to use Azure Function Apps to connect to selected Azure Databricks workspaces and export the configuration details of available clusters.

This is especially useful for:

  • Auditing cluster usage
  • Tracking configuration changes
  • Exporting cluster details for compliance and reporting

We’ll be using:

  • Azure Portal to configure the Function App
  • Python script to call Databricks REST APIs
  • Environment variables to manage credentials securely

🔹 Prerequisites

Before we begin, ensure you have:
✔️ Access to an Azure subscription
✔️ One or more Azure Databricks workspaces with valid PAT tokens
✔️ A basic understanding of Azure Function Apps
✔️ Python installed locally if you want to test before deployment


🔹 Step 1: Create a Function App in Azure Portal

  1. Go to the Azure Portal
  2. Create a Function App with:
    • Runtime stack: Python
    • Version: 3.11
    • Hosting: Consumption Plan or Premium Plan (based on needs)
  3. Deploy and wait for the Function App to be ready.

🔹 Step 2: Configure Environment Variables

Inside your Function App → Configuration → Application settings, add the following:

DATABRICKS_WORKSPACE_URLS

https://adb-2670904240043557.17.azuredatabricks.net,https://adb-1311525333242571.11.azuredatabricks.net,https://adb-320087374534111.11.azuredatabricks.net

DATABRICKS_PAT_TOKENS

dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxx,dapiyyyyyyyyyyyyyyyyyyyyyyyyy,dapizzzzzzzzzzzzzzzzzzzzzzzzzzzz

These variables let the function authenticate with each Databricks workspace. Keep in mind that plain application settings are visible to anyone with access to the Function App; for production, consider storing the PAT tokens as Key Vault references instead.


🔹 Step 3: Add Dependencies

In your Function App project, create a requirements.txt file with:

azure-functions
azure-identity
azure-mgmt-resource
requests

This ensures your function has the right libraries to run.


🔹 Step 4: Function Definition

Inside your project, create the function.json to define HTTP trigger bindings:

{
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": ["get"]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}

This makes the function accessible via HTTP GET requests.


🔹 Step 5: Python Code to Retrieve Cluster Configurations

Now, add the following Python script to your Function App:

import os, logging
import azure.functions as func
import requests
from datetime import datetime

# Databricks credentials from environment variables
workspace_urls = os.environ.get("DATABRICKS_WORKSPACE_URLS", "")
pat_tokens = os.environ.get("DATABRICKS_PAT_TOKENS", "")
WORKSPACES = [url.strip() for url in workspace_urls.split(",") if url.strip()]
PAT_TOKENS = [tok.strip() for tok in pat_tokens.split(",") if tok.strip()]

if len(WORKSPACES) != len(PAT_TOKENS):
    logging.warning("The number of workspace URLs and PAT tokens do not match. Please check app settings.")

def list_clusters(workspace_url, pat_token):
    api_url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/list"
    headers = {"Authorization": f"Bearer {pat_token}"}
    try:
        res = requests.get(api_url, headers=headers, timeout=30)
    except Exception as e:
        logging.error("HTTP request to %s failed: %s", workspace_url, e)
        return []
    if res.status_code != 200:
        logging.error("Non-200 response from %s: %s %s", workspace_url, res.status_code, res.text)
        return []
    return res.json().get("clusters", [])

def convert_epoch_to_datetime(ms):
    try:
        return datetime.utcfromtimestamp(ms / 1000).strftime('%Y-%m-%d %H:%M:%S')
    except (TypeError, ValueError):
        # Missing or non-numeric timestamps are returned unchanged
        return ms

def flatten_cluster(cluster: dict, workspace_url: str) -> dict:
    flat = {
        "workspace_url": workspace_url,
        "cluster_name": cluster.get("cluster_name", ""),
        "autotermination_minutes": cluster.get("autotermination_minutes", ""),
        "is_single_node": cluster.get("is_single_node", ""),
        "num_workers": cluster.get("num_workers", ""),
        "state": cluster.get("state", ""),
        "start_time": convert_epoch_to_datetime(cluster.get("start_time", "")),
        "terminated_time": convert_epoch_to_datetime(cluster.get("terminated_time", "")),
        "last_activity_time": convert_epoch_to_datetime(cluster.get("last_activity_time", "")),
        "termination_reason.code": cluster.get("termination_reason", {}).get("code", ""),
        "termination_reason.parameters": cluster.get("termination_reason", {}).get("parameters", ""),
        "data_security_mode": cluster.get("data_security_mode", ""),
        "driver_healthy": cluster.get("driver_healthy", ""),
        "driver_node_type_id": cluster.get("driver_node_type_id", ""),
        "effective_spark_version": cluster.get("effective_spark_version", ""),
        "node_type_id": cluster.get("node_type_id", ""),
        "release_version": cluster.get("release_version", ""),
        "spark_version": cluster.get("spark_version", "")
    }
    return flat

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("✅ Databricks Cluster Filtered Report Triggered")

    if not WORKSPACES:
        return func.HttpResponse("No Databricks workspaces configured.", status_code=400)

    selected_headers = [
        "workspace_url", "cluster_name", "autotermination_minutes", "is_single_node",
        "num_workers", "state", "start_time", "terminated_time", "last_activity_time",
        "termination_reason.code", "termination_reason.parameters", "data_security_mode",
        "driver_healthy", "driver_node_type_id", "effective_spark_version", "node_type_id",
        "release_version", "spark_version"
    ]

    all_rows = []
    for i, workspace_url in enumerate(WORKSPACES):
        token = PAT_TOKENS[i] if i < len(PAT_TOKENS) else (PAT_TOKENS[0] if PAT_TOKENS else "")
        clusters = list_clusters(workspace_url, token)
        for cluster in clusters:
            flat = flatten_cluster(cluster, workspace_url)
            all_rows.append(flat)

    # Build CSV content
    csv_lines = [",".join(selected_headers)]
    for row in all_rows:
        csv_line = []
        for h in selected_headers:
            value = row.get(h, "")
            if isinstance(value, str) and (',' in value or '"' in value or '\n' in value):
                # Quote the field (escaping embedded quotes) so the CSV stays valid
                value = '"' + value.replace('"', '""') + '"'
            csv_line.append(str(value))
        csv_lines.append(",".join(csv_line))

    csv_output = "\n".join(csv_lines)
    logging.info("✅ Filtered cluster details prepared in CSV format.")

    return func.HttpResponse(csv_output, status_code=200, mimetype="text/csv")
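Before deploying, you can sanity-check the flattening and CSV logic locally with a mocked payload. The snippet below mirrors the helper and quoting rule from the script above; the cluster values are made up:

```python
from datetime import datetime

def convert_epoch_to_datetime(ms):
    # Mirrors the helper above: epoch milliseconds -> readable UTC timestamp
    try:
        return datetime.utcfromtimestamp(ms / 1000).strftime('%Y-%m-%d %H:%M:%S')
    except (TypeError, ValueError):
        return ms

# A made-up cluster entry shaped like a clusters/list response item
sample_cluster = {
    "cluster_name": "etl, nightly",  # embedded comma forces CSV quoting
    "num_workers": 2,
    "state": "TERMINATED",
    "start_time": 1718000000000,
    "termination_reason": {"code": "INACTIVITY", "parameters": {}},
}

row = {
    "cluster_name": sample_cluster.get("cluster_name", ""),
    "num_workers": sample_cluster.get("num_workers", ""),
    "state": sample_cluster.get("state", ""),
    "start_time": convert_epoch_to_datetime(sample_cluster.get("start_time", "")),
    "termination_reason.code": sample_cluster.get("termination_reason", {}).get("code", ""),
}

def csv_field(value):
    # Same rule as the function: quote fields containing commas, quotes, or newlines
    value = str(value)
    if ',' in value or '"' in value or '\n' in value:
        value = '"' + value.replace('"', '""') + '"'
    return value

print(",".join(csv_field(row[h]) for h in row))
```

Running this prints a single CSV line with the cluster name safely quoted, which is exactly the per-row output the function builds.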

🔹 Step 6: Test the Function

  • Deploy your Function App from Azure Portal
  • Copy the function URL
  • Open a browser or Postman → Send a GET request
  • You’ll get a CSV output containing cluster details across all configured workspaces 🎉
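Because the response body is plain CSV, it drops straight into downstream tooling. A quick sketch of post-processing it in Python (the sample text below stands in for a real response):

```python
import csv
import io

# Stand-in for the body returned by the function's HTTP response
sample_response = (
    "workspace_url,cluster_name,state\n"
    "https://adb-111.azuredatabricks.net,etl-cluster,RUNNING\n"
    "https://adb-222.azuredatabricks.net,ml-cluster,TERMINATED\n"
)

# Parse the CSV and pick out the clusters that are currently running
rows = list(csv.DictReader(io.StringIO(sample_response)))
running = [r["cluster_name"] for r in rows if r["state"] == "RUNNING"]
print(running)
```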

🔹 Conclusion

With this setup, you’ve automated the process of retrieving cluster configurations from multiple Azure Databricks workspaces. This makes it easy to:
✔️ Export data for audits
✔️ Track usage patterns
✔️ Maintain compliance records

You can further enhance this by storing CSVs in Azure Blob Storage or sending outputs to Power BI for dashboards.

📺 Don’t forget to check out the full video walkthrough here:
👉 https://www.youtube.com/watch?v=4Og3btWBNT0

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.

Integrating Power BI with Databricks Model Serving in Secure Networks Using Logic Apps and Power Automate

Introduction

Watch this as a video on our YouTube channel, JBSWiki.

Many enterprises are rapidly adopting Azure Databricks for building machine learning models and serving real-time predictions. However, when strict network security measures are in place—like disabling public network access on Databricks workspaces—it can become incredibly challenging to integrate those models into tools like Power BI.

In this blog, we’ll explore how to securely call a Databricks Model Serving endpoint from Power BI under the scenario where:

  • The Databricks workspace has Allow Public Network Access = Disabled
  • Any direct call from Power Automate to Databricks fails with a 403 Unauthorized network access error

We’ll overcome this limitation using Logic Apps running inside a Virtual Network (VNet) and acting as a secure bridge between Power BI and Databricks.

Let’s dive in! 🔍


The Challenge: Network Restrictions and 403 Errors

By default, services like Power Automate send traffic over the public internet. If your Databricks workspace is configured with Allow Public Network Access disabled, any direct HTTP request to its REST APIs from Power Automate will fail.

The result is a 403 Unauthorized network access to workspace error.

This happens because Databricks:

  • Blocks all public network traffic
  • Only allows communication from services or VNets that are directly peered or integrated

In highly secure enterprise environments, keeping Databricks private is essential. But it poses a problem:

How can Power BI users trigger predictions from Databricks ML models if public access is disabled?


The Solution: Introducing Logic Apps as a Secure Proxy

Instead of connecting Power Automate directly to Databricks, we introduce Logic Apps running inside an Azure VNet.

Logic Apps can:

✅ Connect to Databricks Model Serving endpoints privately through peered VNets or private endpoints
✅ Expose an HTTP endpoint that Power Automate can call publicly
✅ Act as a secure proxy, handling all authentication and network routing

This architecture ensures:

  • Network security compliance
  • Seamless integration between Power BI and Databricks
  • Avoidance of 403 errors

Let’s walk through the full solution step by step. 🚀


Solution Architecture

Here’s how the integration flows:

  1. User clicks a button in Power BI ➡ triggers Power Automate.
  2. Power Automate ➡ sends an HTTP POST request to Logic Apps.
  3. Logic Apps ➡ securely calls the Databricks Model Serving endpoint within the VNet.
  4. Databricks Model Serving ➡ returns prediction results to Logic Apps.
  5. Logic Apps ➡ sends the response back to Power Automate.
  6. Power Automate ➡ updates Power BI visuals or datasets with prediction results.

This ensures Databricks never exposes its endpoints publicly, yet Power BI can still retrieve real-time predictions.


Step 1 — Create Databricks Model Serving Endpoint

First, make sure you’ve deployed your machine learning model to a Databricks Model Serving endpoint.

For this blog, let’s assume you’ve published an endpoint like:

https://adb-1311343844234579.11.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations

This endpoint:

  • Requires authentication via a Databricks PAT (Personal Access Token) or Azure AD token.
  • Accepts JSON requests.
  • Returns prediction results in JSON format.

Remember, because public network access is disabled, only resources inside your VNet—or peered VNets—can reach this endpoint.
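The shape of that call can be sketched offline with the standard library. Nothing is sent here; the snippet just builds the request that a caller inside the VNet would issue (the endpoint URL is the example above, and the token is a placeholder):

```python
import json
import urllib.request

endpoint = "https://adb-1311343844234579.11.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations"
token = "dapi-placeholder"  # a real PAT or Azure AD token in practice

payload = {"inputs": [{"Date": "2024-07-03", "OPEN": 2300, "HIGH": 2400, "LOW": 2298, "CLOSE": 2350}]}

# Build (but do not send) the POST request the Logic App will issue
req = urllib.request.Request(
    endpoint,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(req) would only succeed from inside the VNet
print(req.get_method(), req.full_url)
```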


Step 2 — Create Logic Apps in VNet

Next, deploy a Logic App Standard into a VNet.

Benefits:

  • Can communicate privately with Databricks.
  • Supports secure inbound and outbound traffic.
  • Scales to enterprise workloads.

Create an HTTP Trigger

Configure Logic Apps to start on an HTTP request.

Request Body JSON Schema for our scenario looks like this:

{
  "type": "object",
  "properties": {
    "inputs": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "Date": {
            "type": "string"
          },
          "OPEN": {
            "type": "integer"
          },
          "HIGH": {
            "type": "integer"
          },
          "LOW": {
            "type": "integer"
          },
          "CLOSE": {
            "type": "integer"
          }
        },
        "required": [
          "Date",
          "OPEN",
          "HIGH",
          "LOW",
          "CLOSE"
        ]
      }
    }
  }
}

This defines the expected JSON payload your Logic App will receive from Power Automate.
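A lightweight way to check a payload against that schema before wiring up the flow is to validate the required keys in plain Python (hand-rolled here to stay dependency-free; the `jsonschema` package would do this more thoroughly):

```python
REQUIRED = ["Date", "OPEN", "HIGH", "LOW", "CLOSE"]

def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means all required fields are present."""
    problems = []
    inputs = payload.get("inputs")
    if not isinstance(inputs, list):
        return ["'inputs' must be an array"]
    for i, item in enumerate(inputs):
        for key in REQUIRED:
            if key not in item:
                problems.append(f"inputs[{i}] missing '{key}'")
    return problems

good = {"inputs": [{"Date": "2024-07-03", "OPEN": 2300, "HIGH": 2400, "LOW": 2298, "CLOSE": 2350}]}
bad = {"inputs": [{"Date": "2024-07-03"}]}

print(validate_payload(good))  # []
print(validate_payload(bad))   # lists the four missing numeric fields
```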


Step 3 — Logic Apps: Call Databricks Model Serving

Inside Logic Apps, add an HTTP action to call your Databricks endpoint:

Method: POST
URI:

https://adb-1311343844234579.11.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations

Headers:

Authorization: Bearer dapi****************
Content-Type: application/json

Body:

{
  "inputs": [
    {
      "Date": "2024-07-03",
      "OPEN": 2300,
      "HIGH": 2400,
      "LOW": 2298,
      "CLOSE": 2350
    }
  ]
}

Logic Apps will securely send this payload over private networking to Databricks and wait for the response.


Step 4 — Deploy Power Automate Flow

Now, let’s connect Power Automate to Logic Apps.

Your Power Automate flow will:

  • Trigger from Power BI (e.g. a button click).
  • Call the Logic Apps HTTP endpoint.
  • Receive the ML prediction results.
  • Optionally, update Power BI visuals or datasets.

Power Automate HTTP Request

Configure your HTTP action:

Method: POST
URI: The URL from your Logic App’s HTTP trigger (Step 2).
Headers:

Content-Type: application/json

Body:

{
  "inputs": [
    {
      "Date": "2024-07-03",
      "OPEN": 2300,
      "HIGH": 2400,
      "LOW": 2298,
      "CLOSE": 2350
    }
  ]
}

Why Not Call Databricks Directly From Power Automate?

A natural question is: why can’t we skip Logic Apps and call Databricks directly from Power Automate?

Here’s why:

  • Power Automate sends HTTP requests from public endpoints.
  • Databricks rejects all public traffic if public access is disabled.
  • There’s no way for Power Automate to reach Databricks privately.

A Logic App in a VNet acts as a secure intermediary:

  • Power Automate → Logic Apps → Databricks
  • Databricks → Logic Apps → Power Automate

This architecture bridges private and public networks securely.


Benefits of This Architecture

Implementing this solution provides:

Enterprise Security

  • Complies with strict network isolation policies.
  • Prevents exposing Databricks to the internet.

Seamless User Experience

  • Power BI users get real-time predictions without knowing about the backend complexity.

Scalable Architecture

  • Logic Apps can handle thousands of requests.
  • Easy to maintain and extend for other models or services.

Governance and Monitoring

  • Centralized logging in Logic Apps.
  • Easy to integrate with Azure Monitor for alerting.

Use Case: Predicting Stock Prices

Imagine you have a machine learning model predicting HDFC high prices.

  • Power BI user clicks a “Predict” button.
  • Power Automate triggers a flow.
  • Flow sends stock price inputs to Logic Apps.
  • Logic Apps calls Databricks Model Serving.
  • Databricks returns the predicted high price.
  • Power BI visual updates dynamically with the prediction!

All of this happens securely, without exposing Databricks to the public internet. 🔒


Conclusion

Integrating Power BI with Databricks Model Serving under strict network security constraints can seem daunting.

But with the help of Logic Apps deployed inside a VNet, you can:

  • Securely bridge public and private networks
  • Enable real-time ML predictions in Power BI
  • Maintain enterprise-level security and compliance

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.


Azure Databricks Series: Step-by-Step Guide to Integrating Power BI with Databricks Model Serving

Watch this as a video on our YouTube channel, JBSWiki.

Are you ready to unlock the power of real-time machine learning predictions directly in your Power BI dashboards? 🤩

With Databricks Model Serving, we can host machine learning models as REST APIs. But how do we bring those predictions into Power BI?

In this blog, I’ll show you two practical methods to connect Power BI to Databricks Model Serving and fetch predictions step by step.

This is a follow-up to my earlier work where I explained how to create a Databricks Serving Model. If you missed that, check it out first so you have your serving endpoint ready to go!

💡 Why Integrate Power BI with Databricks Model Serving?

Businesses are increasingly driven by real-time insights. Instead of waiting for static reports, you can now embed ML predictions right inside your dashboards, empowering data-driven decisions at the speed of business.

Some benefits:

✅ Real-time predictions in dashboards
✅ No manual data exports
✅ Fully automated pipelines
✅ Empower business users with AI insights


🔗 The Architecture

Here’s how the integration works:

Power BI ➡ Python Script / Power Query ➡ Databricks Serving Endpoint ➡ Prediction Results ➡ Power BI visuals

All communication happens over secure REST APIs, usually authenticated with a Databricks Personal Access Token (PAT).


⚙️ Prerequisites

Before diving in, ensure you have:

  • An Azure Databricks workspace
  • A deployed ML model as a Databricks Serving Endpoint
  • A Databricks Personal Access Token
  • Power BI Desktop installed
  • Basic knowledge of Power Query and Python

🛠 Method 1: Get Data → Python Script in Power BI

This is one of the easiest ways to connect Power BI to Databricks Serving endpoints if you’re comfortable writing Python.


🔹 How It Works

You’ll:

  1. Go to Home > Get Data > Python Script in Power BI.
  2. Paste the Python code that:
    • Makes an HTTP POST request to the Databricks model endpoint.
    • Converts predictions into a DataFrame for visualization.

✅ Example Python Script

Here’s a full working Python script for Power BI:

import requests
import json
import pandas as pd

# Databricks endpoint URL
endpoint_url = "https://adb-2345432345567.11.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations"

# Your Databricks PAT token
token = "Databricks_Pat_Token"

# Prepare the payload
payload = {
    "inputs": [
        {
            "Date": "2024-07-03",
            "OPEN": 2000,
            "HIGH": 2079,
            "LOW": 1987,
            "CLOSE": 2075
        }
    ]
}

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Make the POST request
response = requests.post(endpoint_url, headers=headers, json=payload, timeout=60)

# Convert input payload to DataFrame
input_df = pd.DataFrame(payload["inputs"])

if response.ok:
    result_json = response.json()
    predictions = result_json.get("predictions", [])
    
    # Put into dataframe
    output_df = pd.DataFrame(predictions, columns=["Prediction"])
else:
    # Handle errors gracefully
    output_df = pd.DataFrame({"Error": [response.text]})

# Power BI detects top-level DataFrames (input_df, output_df) as importable tables
input_df
output_df

🔎 What This Does

  • Sends input data to Databricks Serving endpoint.
  • Receives predictions in JSON format.
  • Converts predictions to a Pandas DataFrame.
  • Returns it to Power BI for visualizations.

Power BI will import both input_df and output_df as separate tables. You can merge them if needed.
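If you do want a single table, a positional join works, since the endpoint returns predictions in input order. A sketch, assuming one prediction per input row (the prediction value here is made up):

```python
import pandas as pd

input_df = pd.DataFrame([{"Date": "2024-07-03", "OPEN": 2000, "HIGH": 2079, "LOW": 1987, "CLOSE": 2075}])
output_df = pd.DataFrame({"Prediction": [2085.4]})  # stand-in for model output

# Row i of output_df corresponds to row i of input_df, so an index join suffices
merged_df = input_df.join(output_df)
print(merged_df)
```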


⚠️ Important Notes

  • Don’t hard-code secrets like tokens in production. Use Key Vaults or environment variables.
  • Keep an eye on API call costs and throttling.

🛠 Method 2: Enter Data → Power Query → Python Script

This method is powerful when you want business users to enter data manually in Power BI and instantly fetch predictions.


🔹 How It Works

  1. Use Enter Data in Power BI to create a table of inputs.
  2. Pass this table into a Python script via Power Query.
  3. Call the Databricks endpoint and merge predictions into your original data.

This allows users to dynamically modify input data in Power BI itself.


✅ Example Power Query Script

Below is the Power Query M code you’d place in Advanced Editor in Power BI:

let
    Source = HDFC_Input,
    RunPython = Python.Execute("
import pandas as pd
import requests
import json

# Power BI table comes in as 'dataset'
input_df = dataset.copy()  # copy so the original Power BI table is not mutated

# Force Date column to string
if 'Date' in input_df.columns:
    input_df['Date'] = input_df['Date'].astype(str)

# Convert numeric columns to floats
for col in input_df.columns:
    if pd.api.types.is_numeric_dtype(input_df[col]):
        input_df[col] = input_df[col].apply(lambda x: float(x) if pd.notnull(x) else None)

inputs = input_df.to_dict(orient='records')

payload = {
    'inputs': inputs
}

endpoint_url = 'https://adb-2345432345567.11.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations'
token = 'Databricks_Pat_Token'
headers = {
    'Authorization': f'Bearer {token}',
    'Content-Type': 'application/json'
}

response = requests.post(endpoint_url, headers=headers, json=payload, timeout=60)

if response.ok:
    result_json = response.json()
    
    try:
        predictions = result_json['predictions']
        
        # Handle possible formats
        if isinstance(predictions, list) and isinstance(predictions[0], (int, float)):
            output_df = pd.DataFrame({'Prediction': predictions})
        elif isinstance(predictions, list) and isinstance(predictions[0], list):
            output_df = pd.DataFrame(predictions, columns=['Prediction'])
        elif isinstance(predictions, list) and isinstance(predictions[0], dict):
            output_df = pd.DataFrame(predictions)
        else:
            output_df = pd.DataFrame({'Error': ['Unsupported prediction format']})
    except Exception as e:
        output_df = pd.DataFrame({'Error': [str(e)]})
else:
    output_df = pd.DataFrame({'Error': [response.text]})

# Merge prediction into input
final_df = input_df.copy()
try:
    final_df['Prediction'] = output_df['Prediction']
except (KeyError, ValueError):
    final_df['Error'] = output_df.iloc[:, 0]

final_df
", [dataset=Source])
in
    RunPython

🔎 What This Does

  • Takes user-entered data from Power BI as a table.
  • Converts it to JSON payload.
  • Calls Databricks Model Serving endpoint.
  • Handles various possible prediction formats:
    • single numeric predictions
    • lists of predictions
    • dictionaries of results
  • Merges predictions back into the original table.
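The format-handling branch above can be isolated into a small helper. Here is a dependency-free sketch of the same logic, returning rows as dicts instead of a DataFrame:

```python
def normalize_predictions(predictions):
    """Map the serving endpoint's 'predictions' field to a list of row dicts."""
    if not isinstance(predictions, list) or not predictions:
        return [{"Error": "Unsupported prediction format"}]
    first = predictions[0]
    if isinstance(first, (int, float)):   # e.g. [2085.4, 2101.2]
        return [{"Prediction": p} for p in predictions]
    if isinstance(first, list):           # e.g. [[2085.4], [2101.2]]
        return [{"Prediction": p[0]} for p in predictions]
    if isinstance(first, dict):           # e.g. [{"Prediction": 2085.4}]
        return predictions
    return [{"Error": "Unsupported prediction format"}]

print(normalize_predictions([2085.4, 2101.2]))
print(normalize_predictions([[2085.4]]))
print(normalize_predictions([{"Prediction": 2085.4}]))
```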

💡 Advantages of This Method

✅ Super flexible for dynamic inputs
✅ Great for PoC demos and interactive reports
✅ Business-friendly approach—no code needed by users
✅ Predictions update automatically when inputs change


⚠️ Limitations and Considerations

  • Python scripting in Power BI requires the Python runtime installed locally.
  • Personal Access Tokens should be secured (e.g. not stored in plain text).
  • There might be latency if your model takes time to compute predictions.

🎯 Use Cases

Integrating Databricks Model Serving into Power BI opens up endless possibilities:

Stock Price Prediction
Sales Forecasting
Customer Churn Analysis
Fraud Detection
Predictive Maintenance


🚀 Conclusion

Integrating Databricks Model Serving with Power BI is a game-changer for real-time analytics. Whether you use the Python script approach or Power Query with Enter Data, you’re enabling truly interactive, predictive dashboards that empower business users.

✅ Next Steps

  • Make sure your Databricks Serving endpoint is production-ready.
  • Move sensitive tokens to secure stores like Azure Key Vault.
  • Optimize API call performance for large-scale use.
  • Explore scheduled refreshes in Power BI Service to automate insights.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.