SQL Server 2022: Unleashing the Power of the GENERATE_SERIES Function

In SQL Server 2022, the introduction of the GENERATE_SERIES function marks a significant enhancement, empowering developers and analysts with a flexible and efficient way to generate sequences of numbers. This feature, akin to similar functions in other database systems, simplifies tasks involving sequence generation, such as creating time series data, generating test data, and more.

In this blog, we’ll explore the GENERATE_SERIES function in detail, using the JBDB database to demonstrate its capabilities. We’ll start with a practical business use case, followed by a comprehensive guide on how to use the function. Let’s dive in! ๐ŸŒŸ

Business Use Case: Sales Forecasting ๐Ÿ“ˆ

Imagine you are working for a retail company, and your task is to generate a sales forecast for the next year. You have historical sales data and need to project future sales based on trends. A crucial step in this process is to create a series of dates representing each day of the next year, which will serve as the basis for the forecast.

The GENERATE_SERIES function can be a game-changer here, allowing you to quickly generate a range of dates without resorting to complex loops or recursive queries.

Introducing the GENERATE_SERIES Function ๐Ÿ› ๏ธ

The GENERATE_SERIES function generates a series of numbers or dates. Its syntax is straightforward:

GENERATE_SERIES(start, stop, step)
  • start: The starting value of the sequence.
  • stop: The ending value of the sequence.
  • step: The increment value between each number in the series.

Let’s see this in action with some practical examples!

Example 1: Basic Numeric Series ๐Ÿ”ข

To generate a series of numbers from 1 to 10:

SELECT value
FROM GENERATE_SERIES(1, 10, 1);

Example 2: Date Series for Forecasting ๐Ÿ“…

To generate a series of dates for each day of the next year, starting from January 1, 2023:

SELECT CAST(value AS DATE) AS ForecastDate
FROM GENERATE_SERIES('2023-01-01', '2023-12-31', 1);

Generating a Series of Dates Using a CTE ๐Ÿ“…

Since GENERATE_SERIES supports numeric sequences only, we use a recursive CTE to generate a series of dates. Hereโ€™s how to create a series of dates for the year 2023:

-- Create a recursive CTE to generate a series of dates
WITH DateSeries AS (
    -- Anchor member: start date
    SELECT CAST('2023-01-01' AS DATE) AS ForecastDate
    UNION ALL
    -- Recursive member: add one day to the previous date
    SELECT DATEADD(DAY, 1, ForecastDate)
    FROM DateSeries
    WHERE ForecastDate < '2023-12-31'
)
-- Query to select the generated dates
SELECT ForecastDate
FROM DateSeries
OPTION (MAXRECURSION 0); -- Remove recursion limit

Implementing the Use Case: Sales Forecasting ๐Ÿ“Š

Let’s apply the GENERATE_SERIES function to our sales forecasting scenario. Suppose we have a table Sales in the JBDB database with historical sales data. Our goal is to project future sales for each day of the next year.

Step 1: Creating the JBDB and Sales Table ๐Ÿ—๏ธ

First, we create the JBDB database and the Sales table:

CREATE DATABASE JBDB;
GO

USE JBDB;
GO

CREATE TABLE Sales (
    SaleDate DATE,
    Amount DECIMAL(10, 2)
);

Step 2: Inserting Historical Data ๐Ÿ“ฅ

Next, let’s insert some historical data into the Sales table:

INSERT INTO Sales (SaleDate, Amount)
VALUES
('2022-01-01', 100.00),
('2022-01-02', 150.00),
('2022-01-03', 200.00),
-- Additional data...
('2022-12-31', 250.00);

Step 3: Generating Future Dates and Forecasting ๐Ÿ“…๐Ÿ”ฎ

Now, we use GENERATE_SERIES to generate future dates and join it with our historical data to create a sales forecast:

-- Generate a series of future dates
WITH DateSeries AS (
    SELECT CAST('2023-01-01' AS DATE) AS ForecastDate
    UNION ALL
    SELECT DATEADD(DAY, 1, ForecastDate)
    FROM DateSeries
    WHERE ForecastDate < '2023-12-31'
),
-- Combine with historical sales data
SalesForecast AS (
    SELECT
        f.ForecastDate,
        ISNULL(s.Amount, 0) AS HistoricalAmount
    FROM
        DateSeries f
        LEFT JOIN Sales s ON f.ForecastDate = s.SaleDate
)
-- Project future sales
SELECT
    ForecastDate,
    HistoricalAmount,
    -- Simple projection logic (for demonstration)
    HistoricalAmount * 1.05 AS ProjectedAmount
FROM SalesForecast
OPTION (MAXRECURSION 0); -- Remove recursion limit

In this query:

  • We generate a series of dates for the year 2023 using GENERATE_SERIES.
  • We join these dates with the historical sales data to create a comprehensive sales forecast.
  • A simple projection logic is applied, assuming a 5% increase in sales.

Generate a Series of Numbers with Custom Step Size

Generate a sequence of numbers from 1 to 50 with a step size of 5:

-- Generate a sequence of numbers with a custom step size
SELECT value
FROM GENERATE_SERIES(1, 50, 5);

Generate a Series of Dates with Custom Step Size

Generate a series of dates from today to 30 days into the future with a step size of 5 days:

-- Generate a series of dates with a custom step size (5 days)
WITH DateSeries AS (
    SELECT DATEADD(DAY, value * 5, CAST(GETDATE() AS DATE)) AS ForecastDate
    FROM GENERATE_SERIES(0, 6, 1) -- 0 to 6 will generate 7 dates
)
SELECT ForecastDate
FROM DateSeries;

Generate a Series of Random Numbers

Generate a series of random numbers between 1 and 100:

-- Generate a series of random numbers between 1 and 100
SELECT ABS(CHECKSUM(NEWID())) % 100 + 1 AS RandomNumber
FROM GENERATE_SERIES(1, 10, 1); -- Generate 10 random numbers

Generate a Series of Time Intervals

Generate a series of time intervals (every 15 minutes) for one hour:

-- Generate a series of time intervals (15 minutes) for one hour
WITH TimeSeries AS (
    SELECT DATEADD(MINUTE, value * 15, CAST('2024-01-01 00:00:00' AS DATETIME)) AS TimeStamp
    FROM GENERATE_SERIES(0, 3, 1) -- 0 to 3 will generate 4 intervals
)
SELECT TimeStamp
FROM TimeSeries;

Generate a Series of Sequential IDs

Generate a series of sequential IDs from 1001 to 1010:

-- Generate a sequence of sequential IDs
SELECT value + 1000 AS SequentialID
FROM GENERATE_SERIES(1, 10, 1);

Generate a Series of Numeric Values with Non-Uniform Steps

Generate a series of numbers with varying steps (e.g., 1, 2, 4, 8, …):

-- Generate a series of numbers with varying steps (powers of 2)
WITH NumberSeries AS (
    SELECT 1 AS value
    UNION ALL
    SELECT value * 2
    FROM NumberSeries
    WHERE value < 64
)
SELECT value
FROM NumberSeries
OPTION (MAXRECURSION 0);

Generate a Series of Dates with Monthly Intervals

Generate a series of dates with a monthly interval for one year:

-- Generate a series of dates with monthly intervals for one year
WITH MonthSeries AS (
    SELECT DATEADD(MONTH, value, CAST('2024-01-01' AS DATE)) AS MonthStart
    FROM GENERATE_SERIES(0, 11, 1) -- 0 to 11 will generate 12 months
)
SELECT MonthStart
FROM MonthSeries;

Generate a Series of Numbers and Calculate Cumulative Sum

Generate a series of numbers and calculate their cumulative sum:

-- Generate a series of numbers and calculate the cumulative sum
WITH NumberSeries AS (
    SELECT value
    FROM GENERATE_SERIES(1, 10, 1)
),
CumulativeSum AS (
    SELECT
        value,
        SUM(value) OVER (ORDER BY value) AS CumulativeSum
    FROM NumberSeries
)
SELECT value, CumulativeSum
FROM CumulativeSum;

Generate a Series of Custom Random Dates

Generate a series of random dates within a specific range:

— Generate a series of random dates within a specific range
WITH RandomDates AS (
SELECT DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 365, CAST(‘2024-01-01’ AS DATE)) AS RandomDate
FROM GENERATE_SERIES(1, 10, 1) — Generate 10 random dates
)
SELECT RandomDate
FROM RandomDates;

Generate a Series of Numbers and Create Custom Labels

Generate a series of numbers and create custom labels:

— Generate a series of numbers and create custom labels
SELECT value AS Number, ‘Label_’ + CAST(value AS VARCHAR(10)) AS CustomLabel
FROM GENERATE_SERIES(1, 10, 1);

Conclusion ๐ŸŒŸ

The GENERATE_SERIES function in SQL Server 2022 is a versatile tool that can significantly simplify the generation of sequences, whether for numeric ranges or date series. Its applications range from creating time series data for analytics to generating test data for development and testing purposes.

By leveraging GENERATE_SERIES, businesses can streamline their data workflows, enhance forecasting accuracy, and improve decision-making processes. Whether you’re a database administrator, developer, or data analyst, this function is a valuable addition to your SQL toolkit.

Feel free to experiment with GENERATE_SERIES and explore its potential in your projects! ๐ŸŽ‰

For more tutorials and tips on SQL Server, including performance tuning and database management, be sure to check out our JBSWiki YouTube channel.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided โ€œAS ISโ€ with no warranties, and confers no rights.

Exploring SQL Server 2022 Data Virtualization with PolyBase

SQL Server 2022 introduces enhanced data virtualization capabilities with PolyBase, allowing you to query external data sources seamlessly. In this blog, weโ€™ll dive into the key features of PolyBase, including how to use it to query external data sources like Hadoop and Cosmos DB. Weโ€™ll provide implementation steps and examples to help you get started. Letโ€™s unlock the power of data virtualization! ๐Ÿ”“

What is PolyBase? ๐Ÿค”

PolyBase is a data virtualization feature in SQL Server that allows you to query data from external sources using T-SQL. This means you can access and integrate data from Hadoop, Cosmos DB, and other sources without moving the data. PolyBase simplifies data integration and minimizes the need for ETL processes.

Key Features of PolyBase in SQL Server 2022 ๐ŸŒŸ

  1. Support for S3-Compatible Object Storage: Query data stored in S3-compatible object storage using the S3 REST API.
  2. Enhanced File Format Support: Query data from CSV, Parquet, and Delta files.
  3. Improved Performance: Optimized for better performance and scalability.

Querying External Data Sources with PolyBase ๐ŸŒ

Letโ€™s explore how to use PolyBase to query data from Hadoop and Cosmos DB.

Querying Hadoop Data ๐Ÿž๏ธ

Step 1: Install PolyBase Services Ensure that PolyBase services are installed and running on your SQL Server instance.

Step 2: Create an External Data Source Create an external data source to connect to your Hadoop cluster.

CREATE EXTERNAL DATA SOURCE HadoopDataSource
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://your-hadoop-cluster:8020',
    CREDENTIAL = HadoopCredential
);
GO

Step 3: Create an External Table Create an external table to query data from Hadoop.

CREATE EXTERNAL TABLE HadoopTable (
    ID INT,
    Name NVARCHAR(50),
    Age INT
)
WITH (
    LOCATION = '/path/to/hadoop/data',
    DATA_SOURCE = HadoopDataSource,
    FILE_FORMAT = HadoopFileFormat
);
GO

Step 4: Query the External Table Query the external table as if it were a local table.

SELECT * FROM HadoopTable;
GO
Querying Cosmos DB Data ๐ŸŒŒ

Step 1: Install PolyBase Services Ensure that PolyBase services are installed and running on your SQL Server instance.

Step 2: Create an External Data Source Create an external data source to connect to your Cosmos DB.

CREATE EXTERNAL DATA SOURCE CosmosDBDataSource
WITH (
    TYPE = COSMOSDB,
    LOCATION = 'https://your-cosmosdb-account.documents.azure.com:443/',
    CREDENTIAL = CosmosDBCredential
);
GO

Step 3: Create an External Table Create an external table to query data from Cosmos DB.

CREATE EXTERNAL TABLE CosmosDBTable (
    ID NVARCHAR(50),
    Name NVARCHAR(50),
    Age INT
)
WITH (
    LOCATION = 'dbs/your-database/colls/your-collection',
    DATA_SOURCE = CosmosDBDataSource
);
GO

Step 4: Query the External Table Query the external table as if it were a local table.

SELECT * FROM CosmosDBTable;
GO

Conclusion ๐Ÿ“

SQL Server 2022 with PolyBase offers powerful data virtualization capabilities, enabling you to query external data sources like Hadoop and Cosmos DB seamlessly. By following the implementation steps and examples provided, you can integrate and query external data efficiently. Start leveraging PolyBase today to unlock the full potential of your data! ๐Ÿš€

Feel free to reach out if you have any questions or need further assistance. Happy querying! ๐Ÿ˜Š

For more tutorials and tips on SQL Server, including performance tuning and database management, be sure to check out our JBSWiki YouTube channel.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided โ€œAS ISโ€ with no warranties, and confers no rights.