Watch this as a video on our YouTube channel, JBSWiki.
In the ever-evolving world of data and AI, one of the biggest challenges is bridging the gap between building a machine learning model and putting it into production so it can generate real business value.
Imagine this: you’ve spent weeks training a model to predict stock prices with great accuracy. It’s sitting in your Databricks workspace, looking perfect. But how do you actually use it in real-world applications to serve predictions in real time?
This is where Databricks Model Serving steps in to save the day.
In this blog, I’ll show you how to:
✅ Deploy a custom machine learning model as a serving endpoint in Databricks
✅ Understand why model serving is crucial in production environments
✅ Call your deployed model using Python code for real-time predictions
Let’s dive in!
🎯 Why Model Serving Matters
Training a machine learning model is only half the battle. In production environments, you often need:
- Real-time predictions for dynamic applications like stock price forecasting, fraud detection, or recommendation systems.
- A scalable, secure way to expose your model to other systems or applications.
- Low-latency responses without needing to run entire notebooks or pipelines every time you want a prediction.
Databricks Model Serving solves these challenges by turning your trained model into a REST API. This means you can easily integrate machine learning into your applications, dashboards, and workflows without reinventing the wheel.
🧩 How Model Serving Fits into the Modern ML Workflow
Here’s how Databricks Model Serving fits into the bigger picture:
- Data Collection & Storage — Gather raw data into Azure Data Lake Storage or other data lakes.
- Data Engineering & Transformation — Clean and prepare the data using Databricks notebooks and Delta Lake.
- Model Training & Experimentation — Train models with MLflow and notebooks.
- Model Registration — Save your best model versions into the MLflow Model Registry (a short code sketch of this step appears below).
- Model Serving — Deploy the model as an endpoint using Databricks Model Serving.
- Prediction Consumption — Call the endpoint from Python, applications, dashboards, or other services.
In this blog, we’ll focus on steps 5 and 6: Model Serving and how to consume predictions.
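For context before we get there, step 4 (registration) usually comes down to a couple of lines of MLflow code. The snippet below is a minimal sketch, not the exact model behind this blog: it assumes a scikit-learn regressor, a tiny illustrative training set, and a hypothetical target column NEXT_DAY_HIGH. Adapt the flavor, features, and names to your own model.

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Tiny illustrative training set; in practice this comes from your Delta tables
train_df = pd.DataFrame({
    "OPEN":  [1990, 2000, 2010],
    "HIGH":  [2050, 2079, 2090],
    "LOW":   [1975, 1987, 1995],
    "CLOSE": [2040, 2075, 2080],
    "NEXT_DAY_HIGH": [2079, 2090, 2100],
})

X_train = train_df[["OPEN", "HIGH", "LOW", "CLOSE"]]
y_train = train_df["NEXT_DAY_HIGH"]

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=50).fit(X_train, y_train)

    # Passing registered_model_name logs the model artifact and registers
    # a new version in the MLflow Model Registry in a single call
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        registered_model_name="HDFC_High_price_prediction",
    )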
🚀 Deploying Your Custom Model in Databricks
Before we can call our model from Python, we need to deploy it as a serving endpoint.
If you haven’t done this yet, here’s a quick high-level overview of the steps:
- Register Your Model in the MLflow Model Registry.
- Navigate to Model Serving in the Databricks UI.
- Select the model version you want to deploy.
- Choose the compute size for serving (small, medium, large, etc.).
- Click Deploy.
Databricks will handle all the heavy lifting, spinning up the infrastructure required to serve your model as a REST API endpoint.
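If you prefer to script the deployment instead of clicking through the UI, the same configuration can be submitted to the Databricks Serving Endpoints REST API. The snippet below is a minimal sketch: the workspace URL, token, and model version are placeholders, the payload mirrors what the UI collects (model, version, compute size), and the exact request schema may vary by workspace or API version, so check the Databricks docs for your environment.

import requests

workspace_url = "https://<your-workspace>.azuredatabricks.net"
token = "<YOUR_DATABRICKS_PAT>"

# Equivalent of the UI choices: which registered model, which version,
# and how much serving compute to provision
create_payload = {
    "name": "HDFC_High_price_prediction",
    "config": {
        "served_entities": [
            {
                "entity_name": "HDFC_High_price_prediction",  # registered model name
                "entity_version": "1",                        # version to deploy
                "workload_size": "Small",                     # compute size
                "scale_to_zero_enabled": True,
            }
        ]
    },
}

response = requests.post(
    f"{workspace_url}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {token}"},
    json=create_payload,
)
print(response.status_code, response.json())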
For this example, let’s assume you’ve already deployed a model named HDFC_High_price_prediction.
🔗 Example Use Case: Stock Price Prediction
Let’s say we’ve built a model to predict high prices for HDFC Bank stock based on daily trading data.
We now want to:
- Send trading data (like open, high, low, close prices) to our deployed endpoint.
- Receive a prediction for the stock’s future high price.
This enables us to make real-time predictions and integrate them into trading dashboards, alerting systems, or further analytics.
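Before we look at the full script, here's a quick sketch of how such a payload might be assembled when your trading data already lives in a pandas DataFrame rather than a hand-written dictionary. It assumes the DataFrame uses the same column names the model was trained on.

import pandas as pd

# One row of daily trading data; in practice this comes from your market feed or a Delta table
latest_day = pd.DataFrame([{
    "Date": "2024-07-03",
    "OPEN": 2000,
    "HIGH": 2079,
    "LOW": 1987,
    "CLOSE": 2075,
}])

# Convert the DataFrame rows into the "inputs" list the endpoint expects
payload = {"inputs": latest_day.to_dict(orient="records")}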
🐍 Calling the Databricks Model Serving Endpoint Using Python
Now comes the fun part: calling your deployed model endpoint using Python!
Below is a Python script you can run from:
- A Databricks notebook
- A local Python environment
- An application server
Here’s how to do it:
import requests

# Databricks Model Serving endpoint URL (copy it from the endpoint's page in the Databricks UI)
endpoint_url = "https://adb-131152523232571.21.azuredatabricks.net/serving-endpoints/HDFC_High_price_prediction/invocations"

# Your Databricks personal access token (PAT); never hard-code a real token in shared code
token = "<YOUR_DATABRICKS_PAT>"

# Input record(s) in the format the model expects
payload = {
    "inputs": [
        {
            "Date": "2024-07-03",
            "OPEN": 2000,
            "HIGH": 2079,
            "LOW": 1987,
            "CLOSE": 2075
        }
    ]
}

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

response = requests.post(endpoint_url, headers=headers, json=payload)

print("Status Code:", response.status_code)
print("Response:", response.json())
💡 How This Code Works
Let’s break it down:
- endpoint_url → This is your Databricks Model Serving URL. You’ll find this in the Databricks UI under your deployed endpoint details.
- token → This is your Databricks Personal Access Token (PAT). It’s crucial for authenticating API calls securely. Never share your PAT publicly.
- payload → This JSON object represents your input data. It matches the format your model expects, e.g., columns for Date, OPEN, HIGH, LOW, and CLOSE prices.
- headers → Standard HTTP headers, including the Authorization Bearer token and Content-Type.
- requests.post() → This sends your data to the model’s endpoint and returns a prediction.
- response.json() → Parses the JSON body of the response, which contains the model's prediction result.
If everything is configured correctly, you’ll receive a JSON response containing your predicted value.
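The exact shape of that response depends on your model, but Databricks Model Serving typically wraps the output in a "predictions" field. Continuing from the script above, a small sketch of how you might guard the call and pull out the predicted value (adjust the key if your model's output signature differs):

# Fail fast on authentication or payload errors
response.raise_for_status()

result = response.json()

# Databricks serving endpoints usually return {"predictions": [...]};
# change this lookup if your model returns a different structure
predicted_high = result["predictions"][0]
print(f"Predicted high price: {predicted_high}")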
✅ Common Use Cases for Databricks Model Serving
Here are just a few real-world scenarios where Databricks Model Serving shines:
- Financial institutions predicting stock prices or risk scores in real time.
- Retail companies delivering personalized product recommendations to customers.
- Healthcare providers forecasting patient outcomes or prioritizing triage.
- Manufacturing industries performing predictive maintenance on equipment.
- Energy companies optimizing grids or predicting power demands.
Databricks Model Serving makes it easy to turn machine learning into real-time business value.
🔒 Best Practices for Secure Model Serving
When deploying and consuming model endpoints:
✅ Always protect your tokens. Store them securely and never hard-code them in publicly visible code (see the snippet after this list).
✅ Use versioning in Databricks MLflow Model Registry to manage updates and rollbacks safely.
✅ Monitor endpoint performance using Databricks’ built-in dashboards for latency, error rates, and cost management.
✅ Keep your input payloads clean and aligned with what your model expects to avoid errors.
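For the first point, a common pattern is to read the token from an environment variable or, inside a Databricks notebook, from a secret scope. A minimal sketch follows; the scope and key names are placeholders you would create yourself.

import os

# Option 1: outside Databricks, read the PAT from an environment variable
token = os.environ.get("DATABRICKS_TOKEN")

# Option 2: inside a Databricks notebook, read it from a secret scope
# (dbutils is only available in the notebook environment)
# token = dbutils.secrets.get(scope="ml-serving", key="pat-token")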
🌟 Wrapping Up
Databricks Model Serving is a game changer for getting machine learning models into production quickly and reliably. Instead of wrestling with complex infrastructure, you can deploy your models with just a few clicks and call them from anywhere using Python.
In this blog, we’ve explored:
✅ Why model serving is crucial in modern ML workflows
✅ How Databricks simplifies deployment as an API
✅ How to invoke your model endpoint using Python
Whether you’re building models for financial forecasting, customer personalization, or predictive maintenance, Databricks Model Serving lets you bring your machine learning innovations to life in production.
Thank You,
Vivek Janakiraman
Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties and confer no rights.