SQL Server 2022 introduces several new features, one of the most exciting being the APPROX_PERCENTILE_CONT function. This function allows for efficient and approximate calculation of percentiles in large datasets, which can be particularly useful for analytics and data-driven decision-making. In this blog, we will explore the APPROX_PERCENTILE_CONT function in detail, using the JBDB database for practical demonstrations. We’ll start with a business use case, dive into the function’s capabilities, and provide a range of T-SQL queries for you to try. Let’s get started! ๐
Business Use Case: Customer Transaction Analysis ๐ผ
Consider a retail company that wants to analyze customer spending behavior. The company has a vast amount of transaction data stored in the JBDB database. To optimize marketing strategies and tailor promotions, they want to identify spending patterns across different customer segments.
For example, the company might want to know the 90th percentile of spending amounts to target high-value customers with exclusive offers. Calculating this percentile accurately in a large dataset can be resource-intensive. The APPROX_PERCENTILE_CONT function offers a solution by providing an approximate, yet efficient, calculation of percentiles.
Understanding the APPROX_PERCENTILE_CONT Function ๐
The APPROX_PERCENTILE_CONT function is designed to compute approximate percentile values for a set of data. This function is particularly useful when dealing with large datasets, as it offers a performance advantage by using approximate algorithms.
Syntax:
APPROX_PERCENTILE_CONT ( percentile ) WITHIN GROUP ( ORDER BY numeric_expression )
percentile: A value between 0 and 1 that specifies the desired percentile.numeric_expression: The column or expression to calculate the percentile on.
Example 1: Basic Usage ๐
Let’s calculate the 90th percentile of customer transaction amounts.
Setup:
USE JBDB;
GO
CREATE TABLE CustomerTransactions (
TransactionID INT PRIMARY KEY,
CustomerID INT,
TransactionAmount DECIMAL(18, 2),
TransactionDate DATE
);
INSERT INTO CustomerTransactions (TransactionID, CustomerID, TransactionAmount, TransactionDate)
VALUES
(1, 101, 50.00, '2023-01-15'),
(2, 102, 150.00, '2023-01-16'),
(3, 103, 300.00, '2023-01-17'),
(4, 101, 75.00, '2023-01-18'),
(5, 104, 200.00, '2023-01-19'),
(6, 105, 125.00, '2023-01-20'),
(7, 106, 400.00, '2023-01-21'),
(8, 102, 175.00, '2023-01-22');
GO
Query to Calculate 90th Percentile:
SELECT APPROX_PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY TransactionAmount) AS Approx90thPercentile
FROM CustomerTransactions;

This result indicates that 90% of transactions are below $375. This insight can help the company focus on high-value customers who spend above this threshold.
Example 2: Analyzing Different Percentiles ๐
Let’s calculate different percentiles to understand the distribution of transaction amounts.
Query to Calculate Multiple Percentiles:
SELECT
APPROX_PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY TransactionAmount) AS Approx25thPercentile,
APPROX_PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY TransactionAmount) AS Approx50thPercentile,
APPROX_PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY TransactionAmount) AS Approx75thPercentile,
APPROX_PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY TransactionAmount) AS Approx90thPercentile
FROM CustomerTransactions;

These results provide a clear view of the transaction distribution, helping the company to tailor marketing strategies for different customer segments.
Comparing Percentile Results:
- Compare approximate and exact percentile calculations for the 90th percentile:
SELECT
APPROX_PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY TransactionAmount) AS Approx90thPercentile,
PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY TransactionAmount) OVER () AS Exact90thPercentile
FROM CustomerTransactions
group by TransactionAmount;
Segmenting Customers by Spending:
- Identify customers whose spending is in the top 10%:
SELECT CustomerID, TransactionAmount
FROM CustomerTransactions
WHERE TransactionAmount >= (SELECT APPROX_PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY TransactionAmount)
FROM CustomerTransactions);
Analyzing Spending Patterns Over Time:
- Calculate monthly spending percentiles to identify trends:
SELECT
DATEPART(MONTH, TransactionDate) AS Month,
APPROX_PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY TransactionAmount) AS MedianTransaction
FROM CustomerTransactions
GROUP BY DATEPART(MONTH, TransactionDate)
ORDER BY Month;
Combining Percentiles with Other Aggregations:
- Find the average transaction amount for each percentile group:
SELECT
PercentileGroup,
AVG(TransactionAmount) AS AvgTransactionAmount
FROM (
SELECT
TransactionAmount,
NTILE(4) OVER (ORDER BY TransactionAmount) AS PercentileGroup
FROM CustomerTransactions
) AS SubQuery
GROUP BY PercentileGroup;
Conclusion ๐
The APPROX_PERCENTILE_CONT function in SQL Server 2022 is a powerful tool for efficiently computing approximate percentiles in large datasets. By using this function, businesses can gain valuable insights into data distributions and make informed decisions based on these insights. Whether you’re analyzing customer spending, sales trends, or any other data, the APPROX_PERCENTILE_CONT function offers a quick and efficient way to understand your data.
Happy querying! ๐
For more tutorials and tips on SQL Server, including performance tuning and database management, be sure to check out ourย JBSWiki YouTube channel.
Thank You,
Vivek Janakiraman
Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided โAS ISโ with no warranties, and confers no rights.