Azure Databricks Series: The Hidden Way to Optimize Costs – No One Talks About!

Managing costs in Azure Databricks can be a real challenge. Clusters often stay idle, autoscaling isn’t always tuned properly, and over-provisioned resources can quickly blow up your bill 💸. In this blog, I’ll walk you through how you can analyze, monitor, and optimize costs in your own Databricks environment using Power BI and AI-powered recommendations.


Why Focus on Cost Optimization?

Azure Databricks is powerful, but without the right monitoring, it’s easy to:

  • Leave clusters running when not in use 🔄
  • Oversize driver and worker nodes 🖥️
  • Misconfigure autoscaling policies 📈
  • Miss out on spot instances or cluster pools

That’s why cost optimization is a must-have practice for anyone running Databricks in production or development.


What You’ll Learn in This Tutorial

Here’s the simple 3-step process we’ll follow:

1️⃣ Collect Cluster Configuration Data

In my previous videos, I showed how to use Azure Function Apps to export cluster configuration details.

These configurations will form the raw dataset for analysis.
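
If you haven’t set that pipeline up yet, here’s a minimal sketch of the underlying REST call the Function App makes: it lists every cluster’s configuration via the Databricks Clusters API and saves the result as JSON. The workspace URL is a placeholder, and the PAT is read from an environment variable.

import json
import os
import requests

# Placeholders: set your workspace URL; the PAT is read from an environment variable
workspace_url = "https://<your-workspace>.azuredatabricks.net"
token = os.environ["DATABRICKS_TOKEN"]

# List every cluster and its configuration (Clusters API 2.0)
response = requests.get(
    f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
response.raise_for_status()
clusters = response.json().get("clusters", [])

# Persist the raw configurations as the dataset for Power BI
with open("cluster_configs.json", "w") as f:
    json.dump(clusters, f, indent=2)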

2️⃣ Analyze with Power BI 📊

We’ll load the exported data into Power BI and use a ready-made Power BI template (download link below) to visualize:

  • Cluster usage
  • Node sizes
  • Autoscaling patterns
  • Idle vs active time

This gives you a clear picture of where money is being spent.

3️⃣ AI-Powered Recommendations 🤖

Finally, we’ll feed the Power BI output into an AI agent. The AI will provide actionable recommendations such as the ones below (a rule-based sketch of this logic follows the list):

  • Resize underutilized clusters
  • Enable auto-termination for idle clusters
  • Use job clusters instead of all-purpose clusters
  • Consider spot instances to lower costs
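
To give you a feel for that logic, here’s a minimal rule-based sketch of the kind of checks the agent performs on the exported configurations. The field names (autotermination_minutes, num_workers, cluster_source) come from the Clusters API schema, but the thresholds are illustrative assumptions.

def recommend(cluster):
    """Return cost-saving recommendations for one cluster config (illustrative thresholds)."""
    tips = []
    # No auto-termination means the cluster can idle indefinitely
    if cluster.get("autotermination_minutes", 0) == 0:
        tips.append("Enable auto-termination to stop idle clusters")
    # All-purpose clusters created in the UI are pricier than job clusters for scheduled work
    if cluster.get("cluster_source") == "UI":
        tips.append("Consider a job cluster instead of an all-purpose cluster")
    # Large fixed-size clusters often benefit from autoscaling
    if "autoscale" not in cluster and cluster.get("num_workers", 0) > 4:
        tips.append("Enable autoscaling or resize this over-provisioned cluster")
    return tips

for cluster in clusters:  # clusters loaded from the exported configuration data above
    for tip in recommend(cluster):
        print(cluster.get("cluster_name"), "->", tip)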

Download the Power BI Template

To make this even easier, I’ve created a Power BI template file (.pbit) that you can use right away. Just download it, connect it with your exported cluster configuration data, and start analyzing your environment.

Pro Tips for Cost Savings

💡 Enable auto-termination for idle clusters
💡 Use job clusters instead of always-on interactive clusters
💡 Configure autoscaling properly
💡 Try spot instances where workloads allow
💡 Regularly monitor usage with Power BI dashboards
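
Several of these tips can be applied directly through the Clusters API. Here’s a minimal sketch reusing workspace_url and token from the export script above; the cluster ID and existing settings are placeholders, and note that the edit endpoint expects the full cluster spec, so fill in the current values.

import requests

payload = {
    "cluster_id": "<your-cluster-id>",
    "cluster_name": "<existing-cluster-name>",
    "spark_version": "<existing-spark-version>",
    "node_type_id": "<existing-node-type>",
    "autotermination_minutes": 30,                      # stop after 30 idle minutes
    "autoscale": {"min_workers": 1, "max_workers": 4},  # scale with the workload
}

response = requests.post(
    f"{workspace_url}/api/2.0/clusters/edit",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
response.raise_for_status()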


Final Thoughts

With the combination of Power BI and AI, cost optimization in Azure Databricks becomes less of a guessing game and more of a data-driven process.

📺 If you prefer a video walkthrough, check out my detailed step-by-step YouTube tutorial here: Azure Databricks Series on YouTube

👉 Don’t forget to like, share, and subscribe to stay updated with more tutorials in this series!

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.

Azure Databricks Series: Deploying Custom Model with Model Serving & Python Integration Step by Step

Watch this as a video on our YouTube channel, JBSWiki.

In the ever-evolving world of data and AI, one of the biggest challenges is bridging the gap between building a machine learning model and putting it into production so it can generate real business value.

Imagine this: you’ve spent weeks training a model to predict stock prices with great accuracy. It’s sitting in your Databricks workspace, looking perfect. But how do you actually use it in real-world applications to serve predictions in real time?

This is where Databricks Model Serving steps in to save the day.

In this blog, I’ll show you how to:

✅ Deploy a custom machine learning model as a serving endpoint in Databricks
✅ Understand why model serving is crucial in production environments
✅ Call your deployed model using Python code for real-time predictions

Let’s dive in!


🎯 Why Model Serving Matters

Training a machine learning model is only half the battle. In production environments, you often need:

  • Real-time predictions for dynamic applications like stock price forecasting, fraud detection, or recommendation systems.
  • A scalable, secure way to expose your model to other systems or applications.
  • Low-latency responses without needing to run entire notebooks or pipelines every time you want a prediction.

Databricks Model Serving solves these challenges by turning your trained model into a REST API. This means you can easily integrate machine learning into your applications, dashboards, and workflows without reinventing the wheel.


🧩 How Model Serving Fits into the Modern ML Workflow

Here’s how Databricks Model Serving fits into the bigger picture:

  1. Data Collection & Storage — Gather raw data into Azure Data Lake Storage or other data lakes.
  2. Data Engineering & Transformation — Clean and prepare the data using Databricks notebooks and Delta Lake.
  3. Model Training & Experimentation — Train models with MLflow and notebooks.
  4. Model Registration — Save your best model versions into the MLflow Model Registry.
  5. Model Serving — Deploy the model as an endpoint using Databricks Model Serving.
  6. Prediction Consumption — Call the endpoint from Python, applications, dashboards, or other services.

In this blog, we’ll focus on steps 5 and 6: Model Serving and how to consume predictions.


🚀 Deploying Your Custom Model in Databricks

Before we can call our model from Python, we need to deploy it as a serving endpoint.

If you haven’t done this yet, here’s a quick high-level overview of the steps:

  • Register Your Model in the MLflow Model Registry.
  • Navigate to Model Serving in the Databricks UI.
  • Select the model version you want to deploy.
  • Choose the compute size for serving (small, medium, large, etc.).
  • Click Deploy.

Databricks will handle all the heavy lifting, spinning up the infrastructure required to serve your model as a REST API endpoint.
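
If your model isn’t registered yet, here’s a minimal MLflow sketch for the registration step. It assumes a trained scikit-learn model in the variable model; the registered name matches the example used below.

import mlflow
import mlflow.sklearn

# Log the trained model and register it in the Model Registry in one step
with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="HDFC_High_price_prediction",
    )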

For this example, let’s assume you’ve already deployed a model named HDFC_High_price_prediction.


🔗 Example Use Case: Stock Price Prediction

Let’s say we’ve built a model to predict high prices for HDFC Bank stock based on daily trading data.

We now want to:

  • Send trading data (like open, high, low, close prices) to our deployed endpoint.
  • Receive a prediction for the stock’s future high price.

This enables us to make real-time predictions and integrate them into trading dashboards, alerting systems, or further analytics.


🐍 Calling the Databricks Model Serving Endpoint Using Python

Now comes the fun part: calling your deployed model endpoint using Python!

Below is a Python script you can run from:

  • A Databricks notebook
  • A local Python environment
  • An application server

Here’s how to do it:

import requests
import json

# Databricks endpoint URL
endpoint_url = "https://adb-131152523232571.21.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations"

# Your Databricks PAT token (placeholder - never hard-code a real token; see the best practices below)
token = "<your-databricks-pat>"

# Prepare the payload
payload = {
    "inputs": [
        {
            "Date": "2024-07-03",  
            "OPEN": 2000,
            "HIGH": 2079,
            "LOW": 1987,
            "CLOSE": 2075
        }
    ]
}

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.post(endpoint_url, headers=headers, json=payload)

print("Status Code:", response.status_code)
print("Response:", response.json())

💡 How This Code Works

Let’s break it down:

  • endpoint_url → This is your Databricks Model Serving URL. You’ll find this in the Databricks UI under your deployed endpoint details.
  • token → This is your Databricks Personal Access Token (PAT). It’s crucial for authenticating API calls securely. Never share your PAT publicly.
  • payload → This JSON object represents your input data. It matches the format your model expects, e.g., columns for Date, OPEN, HIGH, LOW, and CLOSE prices.
  • headers → Standard HTTP headers, including the Authorization Bearer token and Content-Type.
  • requests.post() → This sends your data to the model’s endpoint and returns a prediction.
  • response.json() → Prints the model’s prediction result!

If everything is configured correctly, you’ll receive a JSON response containing your predicted value.


✅ Common Use Cases for Databricks Model Serving

Here are just a few real-world scenarios where Databricks Model Serving shines:

  • Financial institutions predicting stock prices or risk scores in real time.
  • Retail companies delivering personalized product recommendations to customers.
  • Healthcare providers forecasting patient outcomes or prioritizing triage.
  • Manufacturing industries performing predictive maintenance on equipment.
  • Energy companies optimizing grids or predicting power demands.

Databricks Model Serving makes it easy to turn machine learning into real-time business value.


🔒 Best Practices for Secure Model Serving

When deploying and consuming model endpoints:

✅ Always protect your tokens. Store them securely and never hard-code them in publicly visible code.

✅ Use versioning in Databricks MLflow Model Registry to manage updates and rollbacks safely.

✅ Monitor endpoint performance using Databricks’ built-in dashboards for latency, error rates, and cost management.

✅ Keep your input payloads clean and aligned with what your model expects to avoid errors.
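
For example, rather than hard-coding the PAT as in the script above, you can load it from an environment variable, or from a Databricks secret scope when running inside a notebook. A minimal sketch (the scope and key names are placeholders):

import os

# Option 1: environment variable (local scripts, application servers)
token = os.environ["DATABRICKS_TOKEN"]

# Option 2: Databricks secret scope (inside a Databricks notebook)
# token = dbutils.secrets.get(scope="my-scope", key="databricks-pat")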


🌟 Wrapping Up

Databricks Model Serving is a game changer for getting machine learning models into production quickly and reliably. Instead of wrestling with complex infrastructure, you can deploy your models with just a few clicks and call them from anywhere using Python.

In this blog, we’ve explored:

✅ Why model serving is crucial in modern ML workflows
✅ How Databricks simplifies deployment as an API
✅ How to invoke your model endpoint using Python

Whether you’re building models for financial forecasting, customer personalization, or predictive maintenance, Databricks Model Serving lets you bring your machine learning innovations to life in production.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.

Azure Databricks Series: Hands-On Machine Learning for Stock Prediction

Introduction

In today’s data-driven world, machine learning (ML) plays a crucial role in predictive analytics. One of the most popular use cases is stock price prediction, where ML algorithms analyze historical data to forecast future trends. In this blog, we will explore how to leverage Azure Databricks for stock price prediction using machine learning.

📺 Watch the full tutorial video here

📂 Download the code file from here


Why Use Azure Databricks for Machine Learning?

Azure Databricks provides a powerful environment for big data processing, machine learning, and real-time analytics. Here are some key reasons why it is ideal for ML-based stock prediction:

  • Scalability – Handles large volumes of historical stock data efficiently.
  • Integration – Seamlessly connects with Azure Storage, Delta Lake, and MLflow.
  • Collaborative Environment – Supports teamwork with shared notebooks and version control.
  • High Performance – Optimized for distributed computing and deep learning workloads.

Workflow for Stock Price Prediction

The machine learning workflow in Azure Databricks for stock prediction involves multiple steps:

Step 1: Data Collection

Stock market data is gathered from a financial data provider. Historical stock prices typically include the following fields (a sketch for loading them follows the list):

  • Date
  • Open price
  • High price
  • Low price
  • Close price
  • Volume traded
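
As a minimal sketch, a CSV export with these fields can be loaded into a Spark DataFrame inside a Databricks notebook (the path is a placeholder):

# Load historical stock data into a Spark DataFrame
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/data/hdfc_stock_history.csv")  # placeholder path
)
df.show(5)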

Step 2: Data Preprocessing

To ensure accurate predictions, the raw data undergoes preprocessing (sketched after the list):

  • Handling missing values
  • Normalizing stock prices
  • Converting date-time format
  • Feature engineering to extract trends and patterns
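
A minimal PySpark sketch of these steps, continuing from the DataFrame loaded above (the column names are assumptions based on the fields listed earlier; price normalization is typically applied later, at training time):

from pyspark.sql import functions as F

clean_df = (
    df
    # Drop rows with missing price values
    .dropna(subset=["OPEN", "HIGH", "LOW", "CLOSE"])
    # Convert the date string into a proper date type
    .withColumn("Date", F.to_date("Date", "yyyy-MM-dd"))
    # Simple feature engineering: daily trading range as a volatility signal
    .withColumn("RANGE", F.col("HIGH") - F.col("LOW"))
)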

Step 3: Storing Data in a Delta Table

Azure Databricks supports Delta Lake, an optimized storage layer for big data analytics. The cleaned dataset is stored in a Delta Table (see the write sketch below), which ensures:

  • ACID transactions for data integrity
  • Scalability to handle large datasets
  • Versioning for better data management
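
Writing the cleaned DataFrame out as a managed Delta table is then a one-liner (the table name is a placeholder):

# Persist the cleaned data as a managed Delta table
clean_df.write.format("delta").mode("overwrite").saveAsTable("stock_history_clean")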

Step 4: Model Selection & Training

For stock prediction, various machine learning models can be used, such as:

  • Linear Regression – Suitable for trend analysis.
  • Random Forest – Effective for capturing non-linear relationships.
  • LSTM (Long Short-Term Memory) – A deep learning model ideal for time-series forecasting.

The model is trained using historical data, and performance is evaluated using metrics like Mean Squared Error (MSE) and R-Squared.
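
As a minimal sketch using the Linear Regression option above: the table name, feature set, and train/test split are illustrative assumptions, and the data is assumed to be sorted by date.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Pull the cleaned Delta table into pandas for a small-scale example
pdf = spark.table("stock_history_clean").toPandas()

# Target: the next trading day's high (shift HIGH up by one row)
pdf["NEXT_HIGH"] = pdf["HIGH"].shift(-1)
pdf = pdf.dropna(subset=["NEXT_HIGH"])

X = pdf[["OPEN", "HIGH", "LOW", "CLOSE"]]  # illustrative feature set
y = pdf["NEXT_HIGH"]

# Keep chronological order - no shuffling for time-series data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_train, y_train)
preds = model.predict(X_test)

print("MSE:", mean_squared_error(y_test, preds))
print("R-squared:", r2_score(y_test, preds))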

Step 5: Predicting Future Stock Prices

Once the model is trained, it is used to predict next-day stock prices based on recent trends.
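
Continuing the sketch above, the trained model can produce a next-day forecast from the most recent trading day:

# Forecast the next day's high from the latest available trading day
latest_day = X.tail(1)
print("Predicted next-day high:", model.predict(latest_day)[0])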

Step 6: Visualization & Insights

The predicted prices are visualized using interactive charts and graphs to compare with actual values. This helps in understanding the performance of the model and refining future predictions.
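
A quick matplotlib sketch, continuing from the variables above, to compare predicted and actual highs over the test period:

import matplotlib.pyplot as plt

# Plot actual vs predicted next-day highs over the held-out test period
plt.figure(figsize=(10, 4))
plt.plot(y_test.values, label="Actual high")
plt.plot(preds, label="Predicted high")
plt.xlabel("Test-set trading day")
plt.ylabel("Price")
plt.legend()
plt.title("Actual vs predicted next-day high")
plt.show()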


Challenges in Stock Price Prediction

While ML provides valuable insights, predicting stock prices has inherent challenges:

  • Market Volatility – Prices can fluctuate due to unforeseen events.
  • External Factors – News, political events, and investor sentiment impact prices.
  • Overfitting – Models may perform well on historical data but struggle with real-world scenarios.

Despite these challenges, machine learning helps traders and investors make informed decisions based on data-driven insights.


Conclusion

Azure Databricks provides a robust, scalable, and efficient platform for stock price prediction using machine learning. By leveraging its powerful data processing capabilities and ML frameworks, we can build accurate and insightful models to analyze stock trends.

📺 Watch the complete tutorial here

📂 Download the code file from here

Stay tuned for more Azure Databricks tutorials in this series! 🚀