Watch this as a video on our YouTube channel, JBSWiki.
In the ever-evolving world of data and AI, one of the biggest challenges is bridging the gap between building a machine learning model and putting it into production so it can generate real business value.
Imagine this: you’ve spent weeks training a model to predict stock prices with great accuracy. It’s sitting in your Databricks workspace, looking perfect. But how do you actually use it in real-world applications to serve predictions in real time?
This is where Databricks Model Serving steps in to save the day.
In this blog, I’ll show you how to:
✅ Deploy a custom machine learning model as a serving endpoint in Databricks
✅ Understand why model serving is crucial in production environments
✅ Call your deployed model using Python code for real-time predictions
Let’s dive in!
🎯 Why Model Serving Matters
Training a machine learning model is only half the battle. In production environments, you often need:
- Real-time predictions for dynamic applications like stock price forecasting, fraud detection, or recommendation systems.
- A scalable, secure way to expose your model to other systems or applications.
- Low-latency responses without needing to run entire notebooks or pipelines every time you want a prediction.
Databricks Model Serving solves these challenges by turning your trained model into a REST API. This means you can easily integrate machine learning into your applications, dashboards, and workflows without reinventing the wheel.
🧩 How Model Serving Fits into the Modern ML Workflow
Here’s how Databricks Model Serving fits into the bigger picture:
- Data Collection & Storage — Gather raw data into Azure Data Lake Storage or other data lakes.
- Data Engineering & Transformation — Clean and prepare the data using Databricks notebooks and Delta Lake.
- Model Training & Experimentation — Train models with MLflow and notebooks.
- Model Registration — Save your best model versions into the MLflow Model Registry (a short code sketch of this step appears below).
- Model Serving — Deploy the model as an endpoint using Databricks Model Serving.
- Prediction Consumption — Call the endpoint from Python, applications, dashboards, or other services.
In this blog, we’ll focus on steps 5 and 6: Model Serving and how to consume predictions.
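For context before we get there, step 4 (registration) usually comes down to a couple of lines of MLflow code. The snippet below is a minimal sketch, not the exact model behind this blog: it assumes a scikit-learn regressor, a tiny illustrative training set, and a hypothetical target column NEXT_DAY_HIGH. Adapt the flavor, features, and names to your own model.

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Tiny illustrative training set; in practice this comes from your Delta tables
train_df = pd.DataFrame({
    "OPEN":  [1990, 2000, 2010],
    "HIGH":  [2050, 2079, 2090],
    "LOW":   [1975, 1987, 1995],
    "CLOSE": [2040, 2075, 2080],
    "NEXT_DAY_HIGH": [2079, 2090, 2100],
})

X_train = train_df[["OPEN", "HIGH", "LOW", "CLOSE"]]
y_train = train_df["NEXT_DAY_HIGH"]

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=50).fit(X_train, y_train)

    # Passing registered_model_name logs the model artifact and registers
    # a new version in the MLflow Model Registry in a single call
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        registered_model_name="HDFC_High_price_prediction",
    )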
🚀 Deploying Your Custom Model in Databricks
Before we can call our model from Python, we need to deploy it as a serving endpoint.
If you haven’t done this yet, here’s a quick high-level overview of the steps:
- Register Your Model in the MLflow Model Registry.
- Navigate to Model Serving in the Databricks UI.
- Select the model version you want to deploy.
- Choose the compute size for serving (small, medium, large, etc.).
- Click Deploy.
Databricks will handle all the heavy lifting, spinning up the infrastructure required to serve your model as a REST API endpoint.
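If you prefer to script the deployment instead of clicking through the UI, the same configuration can be submitted to the Databricks Serving Endpoints REST API. The snippet below is a minimal sketch: the workspace URL, token, and model version are placeholders, the payload mirrors what the UI collects (model, version, compute size), and the exact request schema may vary by workspace or API version, so check the Databricks docs for your environment.

import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"
token = "<YOUR_DATABRICKS_PAT>"

# Equivalent of the UI choices: which registered model, which version,
# and how much serving compute to provision
create_payload = {
    "name": "HDFC_High_price_prediction",
    "config": {
        "served_entities": [
            {
                "entity_name": "HDFC_High_price_prediction",  # registered model name
                "entity_version": "1",                        # version to deploy
                "workload_size": "Small",                     # compute size
                "scale_to_zero_enabled": True,
            }
        ]
    },
}

response = requests.post(
    f"{workspace_url}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {token}"},
    json=create_payload,
)
print(response.status_code, response.json())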
For this example, let’s assume you’ve already deployed a model named HDFC_High_price_prediction.
🔗 Example Use Case: Stock Price Prediction
Let’s say we’ve built a model to predict high prices for HDFC Bank stock based on daily trading data.
We now want to:
- Send trading data (like open, high, low, close prices) to our deployed endpoint.
- Receive a prediction for the stock’s future high price.
This enables us to make real-time predictions and integrate them into trading dashboards, alerting systems, or further analytics.
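Before we look at the full script, here's a quick sketch of how such a payload might be assembled when your trading data already lives in a pandas DataFrame rather than a hand-written dictionary. It assumes the DataFrame uses the same column names the model was trained on.

import pandas as pd

# One row of daily trading data; in practice this comes from your market feed or a Delta table
latest_day = pd.DataFrame([{
    "Date": "2024-07-03",
    "OPEN": 2000,
    "HIGH": 2079,
    "LOW": 1987,
    "CLOSE": 2075,
}])

# Convert the DataFrame rows into the "inputs" list the endpoint expects
payload = {"inputs": latest_day.to_dict(orient="records")}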
🐍 Calling the Databricks Model Serving Endpoint Using Python
Now comes the fun part: calling your deployed model endpoint using Python!
Below is a Python script you can run from:
- A Databricks notebook
- A local Python environment
- An application server
Here’s how to do it:
import requests

# Databricks Model Serving endpoint URL (copy it from the endpoint's page in the Databricks UI)
endpoint_url = "https://adb-131152523232571.21.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations"

# Your Databricks personal access token (PAT); never hard-code a real token in shared code
token = "<YOUR_DATABRICKS_PAT>"

# Input record(s) in the format the model expects
payload = {
    "inputs": [
        {
            "Date": "2024-07-03",
            "OPEN": 2000,
            "HIGH": 2079,
            "LOW": 1987,
            "CLOSE": 2075
        }
    ]
}

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.post(endpoint_url, headers=headers, json=payload)

print("Status Code:", response.status_code)
print("Response:", response.json())
💡 How This Code Works
Let’s break it down:
- endpoint_url → This is your Databricks Model Serving URL. You’ll find this in the Databricks UI under your deployed endpoint details.
- token → This is your Databricks Personal Access Token (PAT). It’s crucial for authenticating API calls securely. Never share your PAT publicly.
- payload → This JSON object represents your input data. It matches the format your model expects, e.g., columns for Date, OPEN, HIGH, LOW, and CLOSE prices.
- headers → Standard HTTP headers, including the Authorization Bearer token and Content-Type.
- requests.post() → This sends your data to the model’s endpoint and returns a prediction.
- response.json() → Parses the JSON body of the response, which contains the model's prediction result.
If everything is configured correctly, you’ll receive a JSON response containing your predicted value.
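The exact shape of that response depends on your model, but Databricks Model Serving typically wraps the output in a "predictions" field. Continuing from the script above, a small sketch of how you might guard the call and pull out the predicted value (adjust the key if your model's output signature differs):

# Fail fast on authentication or payload errors
response.raise_for_status()

result = response.json()

# Databricks serving endpoints usually return {"predictions": [...]};
# change this lookup if your model returns a different structure
predicted_high = result["predictions"][0]
print(f"Predicted high price: {predicted_high}")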
✅ Common Use Cases for Databricks Model Serving
Here are just a few real-world scenarios where Databricks Model Serving shines:
- Financial institutions predicting stock prices or risk scores in real time.
- Retail companies delivering personalized product recommendations to customers.
- Healthcare providers forecasting patient outcomes or prioritizing triage.
- Manufacturing industries performing predictive maintenance on equipment.
- Energy companies optimizing grids or predicting power demands.
Databricks Model Serving makes it easy to turn machine learning into real-time business value.
🔒 Best Practices for Secure Model Serving
When deploying and consuming model endpoints:
✅ Always protect your tokens. Store them securely and never hard-code them in publicly visible code (see the snippet after this list).
✅ Use versioning in Databricks MLflow Model Registry to manage updates and rollbacks safely.
✅ Monitor endpoint performance using Databricks’ built-in dashboards for latency, error rates, and cost management.
✅ Keep your input payloads clean and aligned with what your model expects to avoid errors.
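For the first point, a common pattern is to read the token from an environment variable or, inside a Databricks notebook, from a secret scope. A minimal sketch follows; the scope and key names are placeholders you would create yourself.

import os

# Option 1: outside Databricks, read the PAT from an environment variable
token = os.environ.get("DATABRICKS_TOKEN")

# Option 2: inside a Databricks notebook, read it from a secret scope
# (dbutils is only available in the notebook environment)
# token = dbutils.secrets.get(scope="ml-serving", key="pat-token")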
🌟 Wrapping Up
Databricks Model Serving is a game changer for getting machine learning models into production quickly and reliably. Instead of wrestling with complex infrastructure, you can deploy your models with just a few clicks and call them from anywhere using Python.
In this blog, we’ve explored:
✅ Why model serving is crucial in modern ML workflows
✅ How Databricks simplifies deployment as an API
✅ How to invoke your model endpoint using Python
Whether you’re building models for financial forecasting, customer personalization, or predictive maintenance, Databricks Model Serving lets you bring your machine learning innovations to life in production.
Thank You,
Vivek Janakiraman
Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.