Azure Databricks Series: Deploying a Custom Model with Model Serving & Python Integration, Step by Step

Watch this as a video on our YouTube channel, JBSWiki.

In the ever-evolving world of data and AI, one of the biggest challenges is bridging the gap between building a machine learning model and putting it into production so it can generate real business value.

Imagine this: you’ve spent weeks training a model to predict stock prices with great accuracy. It’s sitting in your Databricks workspace, looking perfect. But how do you actually use it in real-world applications to serve predictions in real time?

This is where Databricks Model Serving steps in to save the day.

In this blog, I’ll show you how to:

✅ Deploy a custom machine learning model as a serving endpoint in Databricks
✅ Understand why model serving is crucial in production environments
✅ Call your deployed model using Python code for real-time predictions

Let’s dive in!


🎯 Why Model Serving Matters

Training a machine learning model is only half the battle. In production environments, you often need:

  • Real-time predictions for dynamic applications like stock price forecasting, fraud detection, or recommendation systems.
  • A scalable, secure way to expose your model to other systems or applications.
  • Low-latency responses without needing to run entire notebooks or pipelines every time you want a prediction.

Databricks Model Serving solves these challenges by turning your trained model into a REST API. This means you can easily integrate machine learning into your applications, dashboards, and workflows without reinventing the wheel.


🧩 How Model Serving Fits into the Modern ML Workflow

Here’s how Databricks Model Serving fits into the bigger picture:

  1. Data Collection & Storage — Gather raw data into Azure Data Lake Storage or other data lakes.
  2. Data Engineering & Transformation — Clean and prepare the data using Databricks notebooks and Delta Lake.
  3. Model Training & Experimentation — Train models with MLflow and notebooks.
  4. Model Registration — Save your best model versions into the MLflow Model Registry.
  5. Model Serving — Deploy the model as an endpoint using Databricks Model Serving.
  6. Prediction Consumption — Call the endpoint from Python, applications, dashboards, or other services.

In this blog, we’ll focus on steps 5 and 6: Model Serving and how to consume predictions.


🚀 Deploying Your Custom Model in Databricks

Before we can call our model from Python, we need to deploy it as a serving endpoint.

If you haven’t done this yet, here’s a quick high-level overview of the steps:

  • Register Your Model in the MLflow Model Registry.
  • Navigate to Model Serving in the Databricks UI.
  • Select the model version you want to deploy.
  • Choose the compute size for serving (small, medium, large, etc.).
  • Click Deploy.

Databricks will handle all the heavy lifting, spinning up the infrastructure required to serve your model as a REST API endpoint.

For this example, let’s assume you’ve already deployed a model named HDFC_High_price_prediction.


🔗 Example Use Case: Stock Price Prediction

Let’s say we’ve built a model to predict high prices for HDFC Bank stock based on daily trading data.

We now want to:

  • Send trading data (like open, high, low, close prices) to our deployed endpoint.
  • Receive a prediction for the stock’s future high price.

This enables us to make real-time predictions and integrate them into trading dashboards, alerting systems, or further analytics.
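If your daily trading data is already in Python (say, one dict per trading day), building the request body is a small transformation. This is a minimal sketch; the column names (`Date`, `OPEN`, `HIGH`, `LOW`, `CLOSE`) are the ones used in the example later in this post, so substitute whatever schema your model was actually trained on:

```python
def build_payload(rows):
    """Wrap one or more trading-day records in the "inputs" format
    used by the Model Serving call shown later in this post."""
    return {"inputs": rows}

# One trading day's record; values are illustrative
row = {"Date": "2024-07-03", "OPEN": 2000, "HIGH": 2079, "LOW": 1987, "CLOSE": 2075}
payload = build_payload([row])
# payload["inputs"] is a list of records, one per prediction requested
```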


🐍 Calling the Databricks Model Serving Endpoint Using Python

Now comes the fun part: calling your deployed model endpoint using Python!

Below is a Python script you can run from:

  • A Databricks notebook
  • A local Python environment
  • An application server

Here’s how to do it:

import requests

# Databricks endpoint URL
endpoint_url = "https://adb-131152523232571.21.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations"

# Your Databricks PAT token (replace this placeholder; never commit a real token)
token = "<your-databricks-pat-token>"

# Prepare the payload
payload = {
    "inputs": [
        {
            "Date": "2024-07-03",  
            "OPEN": 2000,
            "HIGH": 2079,
            "LOW": 1987,
            "CLOSE": 2075
        }
    ]
}

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.post(endpoint_url, headers=headers, json=payload)

print("Status Code:", response.status_code)
print("Response:", response.json())

💡 How This Code Works

Let’s break it down:

  • endpoint_url → This is your Databricks Model Serving URL. You’ll find this in the Databricks UI under your deployed endpoint details.
  • token → This is your Databricks Personal Access Token (PAT). It’s crucial for authenticating API calls securely. Never share your PAT publicly.
  • payload → This JSON object represents your input data. It matches the format your model expects, e.g., columns for Date, OPEN, HIGH, LOW, and CLOSE prices.
  • headers → Standard HTTP headers, including the Authorization Bearer token and Content-Type.
  • requests.post() → This sends your data to the model’s endpoint and returns a prediction.
  • response.json() → Parses the JSON body of the response so you can read the model’s prediction result.

If everything is configured correctly, you’ll receive a JSON response containing your predicted value.
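In practice it is worth checking the status code before calling response.json(), since error responses are not guaranteed to contain a prediction. Here is a small hypothetical helper, assuming the endpoint wraps results in a "predictions" list as MLflow pyfunc serving endpoints commonly do; your model's exact response shape may differ:

```python
def extract_prediction(status_code, body):
    """Return the first prediction from a Model Serving response dict,
    or raise with the error body if the call failed."""
    if status_code != 200:
        raise RuntimeError(f"Serving endpoint returned {status_code}: {body}")
    # MLflow pyfunc endpoints commonly wrap results in a "predictions" key;
    # fall back to the raw body if that key is absent
    preds = body.get("predictions", body)
    return preds[0] if isinstance(preds, list) else preds

# Example with a simulated successful response:
print(extract_prediction(200, {"predictions": [2081.5]}))  # prints 2081.5
```

You would call it as `extract_prediction(response.status_code, response.json())` after the `requests.post()` shown above.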


✅ Common Use Cases for Databricks Model Serving

Here are just a few real-world scenarios where Databricks Model Serving shines:

  • Financial institutions predicting stock prices or risk scores in real time.
  • Retail companies delivering personalized product recommendations to customers.
  • Healthcare providers forecasting patient outcomes or prioritizing triage.
  • Manufacturing industries performing predictive maintenance on equipment.
  • Energy companies optimizing grids or predicting power demands.

Databricks Model Serving makes it easy to turn machine learning into real-time business value.


🔒 Best Practices for Secure Model Serving

When deploying and consuming model endpoints:

✅ Always protect your tokens. Store them securely and never hard-code them in publicly visible code.

✅ Use versioning in Databricks MLflow Model Registry to manage updates and rollbacks safely.

✅ Monitor endpoint performance using Databricks’ built-in dashboards for latency, error rates, and cost management.

✅ Keep your input payloads clean and aligned with what your model expects to avoid errors.
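As a concrete sketch of the first point, the token can be read from the environment instead of being hard-coded. The `DATABRICKS_TOKEN` variable name here is just a convention, not a requirement:

```python
import os

def auth_headers(env_var="DATABRICKS_TOKEN"):
    """Build request headers from a token stored in the environment,
    failing fast if it isn't configured rather than sending an empty header."""
    token = os.environ.get(env_var)
    if not token:
        raise EnvironmentError(f"Set {env_var} before calling the endpoint")
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```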


🌟 Wrapping Up

Databricks Model Serving is a game changer for getting machine learning models into production quickly and reliably. Instead of wrestling with complex infrastructure, you can deploy your models with just a few clicks and call them from anywhere using Python.

In this blog, we’ve explored:

✅ Why model serving is crucial in modern ML workflows
✅ How Databricks simplifies deployment as an API
✅ How to invoke your model endpoint using Python

Whether you’re building models for financial forecasting, customer personalization, or predictive maintenance, Databricks Model Serving lets you bring your machine learning innovations to life in production.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.

🔷 Azure Databricks Series: Displaying All Clusters in a Databricks Workspace – SQL Warehouses, All-Purpose Compute, Job Clusters & More

In a modern data landscape, keeping track of all compute resources in your Azure Databricks Workspace—including SQL Warehouses, Job Clusters, and All-Purpose Clusters—is crucial for monitoring costs, performance, and resource utilization. 🚀

In this post, we’ll walk through a PySpark + REST API solution to dynamically list all clusters in your workspace, categorize them by type, and store the results into a Delta table for easy access and reporting.


📌 Why Is This Important?

Databricks provides different types of compute environments:

  • SQL Warehouses (formerly SQL Endpoints) for BI workloads
  • ⚙️ All-Purpose Clusters for interactive analysis
  • 📦 Job Clusters for scheduled or triggered pipelines

Tracking these clusters helps:

  • Audit usage & track ownership 👀
  • Understand memory footprint & scale ⛽
  • Identify unused or idle clusters for optimization 💸

🛠️ Prerequisites

Make sure you have the following ready:

  • An active Azure Databricks Workspace
  • A Personal Access Token with workspace read permissions
  • A Spark session running in a notebook

📄 Code

import requests

# provide Databricks config
instance = "https://adb-13115258385123.34.azuredatabricks.net"  # no trailing slash, so the f-string URLs below stay valid
token = "dapiaXXXXXXXX"

headers = {
    "Authorization": f"Bearer {token}"
}

def size_to_memory(cluster_size):
    mapping = {
        "2X-Small": "64 GB",
        "X-Small": "128 GB",
        "Small": "256 GB",
        "Medium": "512 GB",
        "Large": "1 TB",
        "X-Large": "2 TB",
        "2X-Large": "4 TB",
        "3X-Large": "8 TB",
        "4X-Large": "16 TB"
    }
    return mapping.get(cluster_size, "Unknown")

node_url = f"{instance}/api/2.0/clusters/list-node-types"
node_response = requests.get(node_url, headers=headers)
node_types = node_response.json().get("node_types", [])

node_memory_map = {}
for node in node_types:
    node_id = node.get("node_type_id")
    mem_gb = node.get("memory_mb", 0) // 1024
    node_memory_map[node_id] = f"{mem_gb} GB"

sql_url = f"{instance}/api/2.0/sql/endpoints"  # legacy path; newer workspaces also expose /api/2.0/sql/warehouses
sql_response = requests.get(sql_url, headers=headers)
sql_data = sql_response.json()

records = []
for endpoint in sql_data.get("endpoints", []):
    records.append({
        "name": endpoint.get("name", ""),
        "id": endpoint.get("id", ""),
        "cluster_size_or_node_type": endpoint.get("cluster_size", ""),
        "approx_memory": size_to_memory(endpoint.get("cluster_size", "")),
        "auto_stop_mins": str(endpoint.get("auto_stop_mins", "")),
        "creator": endpoint.get("creator_name", ""),
        "state": endpoint.get("state", ""),
        "cluster_type": "SQL Warehouse"
    })

cluster_url = f"{instance}/api/2.0/clusters/list"
cluster_response = requests.get(cluster_url, headers=headers)
cluster_data = cluster_response.json()

for cluster in cluster_data.get("clusters", []):
    node_type_id = cluster.get("node_type_id", "")
    mem = node_memory_map.get(node_type_id, "Unknown")
    
    autoscale = cluster.get("autoscale", {})
    if autoscale:
        workers = f'{autoscale.get("min_workers")} - {autoscale.get("max_workers")}'
    else:
        workers = str(cluster.get("num_workers", 0))

    source = cluster.get("cluster_source", "").upper()
    if source == "JOB":
        cluster_type = "Job"
    elif source in ["UI", "API"]:
        cluster_type = "All-Purpose"
    else:
        cluster_type = "Unknown"

    records.append({
        "name": cluster.get("cluster_name", ""),
        "id": cluster.get("cluster_id", ""),
        "cluster_size_or_node_type": node_type_id,
        "approx_memory": mem,
        "auto_stop_mins": str(cluster.get("autotermination_minutes", "")),
        "creator": cluster.get("creator_user_name", ""),
        "state": cluster.get("state", ""),
        "cluster_type": cluster_type
    })

df = spark.createDataFrame(records)

df.write.format("delta").mode("overwrite").saveAsTable("default.all_clusters_summary")

display(spark.table("default.all_clusters_summary"))
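Before writing to Delta, a quick tally of the records list in plain Python makes a handy sanity check. This is just a sketch over the list built above; the sample data here is illustrative:

```python
from collections import Counter

def summarize_by_type(records):
    """Count compute resources per cluster_type
    (SQL Warehouse / All-Purpose / Job / Unknown)."""
    return Counter(r["cluster_type"] for r in records)

# Illustrative sample in the same shape as the records list above
sample = [
    {"cluster_type": "SQL Warehouse"},
    {"cluster_type": "All-Purpose"},
    {"cluster_type": "All-Purpose"},
]
print(summarize_by_type(sample))
```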

🔒 Security Note

👉 Always keep your token safe. Never expose it in version control or public notebooks. Consider storing it securely in Databricks secrets for production use.
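A sketch of that pattern: inside a Databricks notebook, `dbutils.secrets.get` pulls the token from a secret scope (the scope and key names below are hypothetical placeholders); outside a notebook, where `dbutils` is not defined, it falls back to an environment variable:

```python
import os

def get_databricks_token():
    """Fetch a PAT from Databricks secrets when running in a notebook,
    otherwise fall back to the DATABRICKS_TOKEN environment variable."""
    try:
        # "jbswiki-scope" / "databricks-pat" are placeholder scope/key names
        return dbutils.secrets.get(scope="jbswiki-scope", key="databricks-pat")
    except NameError:
        # dbutils only exists inside a Databricks notebook
        return os.environ.get("DATABRICKS_TOKEN", "")
```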


🧠 Final Thoughts

With this solution, you can:

  • Get real-time inventory of all Databricks compute environments
  • Ensure accountability and governance
  • Optimize resource usage and cost

🔁 Automate this with a scheduled job or dashboard, and you’ve got yourself a powerful monitoring solution!


💬 Got Questions?

Let me know in the comments if you have any questions or ideas for enhancements!

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.

Azure Databricks Series: Displaying All Serverless SQL Warehouses in Your Workspace

When working with Azure Databricks, it’s often necessary to programmatically retrieve and manage metadata about your compute resources. One such resource is the Serverless SQL Warehouse, designed for cost-effective and scalable interactive analytics.

In this blog, we’ll walk through a step-by-step Python script that helps you list all Serverless SQL Warehouses in your Databricks workspace using the REST API and persist the results in a Delta table for further analysis.

🔍 Why Monitor Serverless SQL Warehouses?

Serverless SQL Warehouses are a key part of many organizations’ data strategies due to:

  • Auto-scaling capabilities
  • No infrastructure management
  • Pay-per-use pricing model

By tracking your serverless SQL endpoints, you can gain insights into:

  • Who created them
  • Their sizes and memory footprints
  • Auto-stop configurations
  • Their current state (running/stopped)

🛠️ Solution Overview

We’ll use the Databricks SQL Endpoints API to get the list of all SQL Warehouses, filter out only the serverless ones, enrich the data with approximate memory, and save it into a Delta table using PySpark.


🧪 Code Walkthrough

import requests

instance = "https://adb-1311537494242340.26.azuredatabricks.net"  # no trailing slash, so the f-string URL below stays valid
token = "dapia***************************"

headers = {
    "Authorization": f"Bearer {token}"
}

url = f"{instance}/api/2.0/sql/endpoints"  # legacy path; /api/2.0/sql/warehouses is the current equivalent
response = requests.get(url, headers=headers)
data = response.json()

def size_to_memory(cluster_size):
    mapping = {
        "2X-Small": "64 GB",
        "X-Small": "128 GB",
        "Small": "256 GB",
        "Medium": "512 GB",
        "Large": "1 TB",
        "X-Large": "2 TB",
        "2X-Large": "4 TB",
        "3X-Large": "8 TB",
        "4X-Large": "16 TB"
    }
    return mapping.get(cluster_size, "Unknown")

# Prepare the data
records = []
for endpoint in data.get("endpoints", []):
    if endpoint.get("enable_serverless_compute", False):
        records.append({
            "name": endpoint["name"],
            "id": endpoint["id"],
            "cluster_size": endpoint["cluster_size"],
            "approx_memory": size_to_memory(endpoint["cluster_size"]),
            "auto_stop_mins": endpoint["auto_stop_mins"],
            "creator": endpoint["creator_name"],
            "state": endpoint["state"]
        })
# Create Spark DataFrame
df = spark.createDataFrame(records)

# Save to Delta (overwrite or append as needed)
df.write.format("delta").mode("overwrite").saveAsTable("default.serverless_sql_warehouses")

display(spark.table("default.serverless_sql_warehouses"))

✅ Sample Output

The resulting Delta table lists each serverless warehouse with its name, id, cluster size, approximate memory, auto-stop minutes, creator, and state.

💡 Pro Tips

  • 🔐 Never hardcode tokens in production scripts. Use Azure Key Vault or Databricks secrets to securely manage secrets.
  • 🛑 Consider implementing pagination if your workspace has many warehouses.
  • 📊 Use this Delta table as a source for monitoring dashboards in Power BI or Databricks SQL.

📚 Conclusion

With just a few lines of code, you can automate the discovery of all serverless SQL warehouses, store their metadata in a Delta Lake, and use it for reporting, auditing, or monitoring purposes. This is particularly useful in large-scale environments where managing SQL compute efficiently is crucial.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.