Posted on February 17, 2024

Proactively Managing Transactional Replication Latency with SQL Server

Transactional replication is a critical component of many SQL Server environments, supporting availability, load balancing, and other essential benefits. However, managing replication latency (the delay between a change being committed on the publisher and that change being applied on the subscriber) is vital for ensuring system performance and data integrity. In this blog post, we’ll explore a proactive approach to monitoring and alerting on replication latency, helping database administrators (DBAs) maintain optimal system health.

The Issue:

Replication latency can go unnoticed until it affects system performance or data accuracy, leaving subscribers with stale data and potentially disrupting the business. Traditional monitoring techniques may not provide real-time alerts or may require significant manual intervention, making them less effective for identifying and resolving latency quickly.

The Script:

To address this challenge, we introduce a SQL script designed by Vivek Janakiraman from JBSWiki, specifically crafted to monitor transactional replication latency in SQL Server environments. This script efficiently posts tracer tokens to specified publications and measures the time taken for these tokens to move through the replication components, providing a clear picture of any latency present in the system.

/*
Author: Vivek Janakiraman
Company: JBSWiki
Description: This script is used to alert in case there is Transactional replication Log reader or distribution agent latency.
             It posts tracer tokens to specified publications and measures the latency to the distributor and subscriber.
*/

-- Switch to the publisher database to insert tracer tokens.
USE [Publisher_Database_Here]   -- Use your publisher database name here.
-- Insert tracer tokens into the specified publications.
EXEC sys.sp_posttracertoken @publication = 'Publication_Name' -- Change appropriate Publication that should be monitored.
EXEC sys.sp_posttracertoken @publication = 'Publication_Name1' -- Change appropriate Publication that should be monitored.
-- Wait for 5 minutes to allow the tokens to propagate.
WAITFOR DELAY '00:05:00'

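-- Switch to the distribution database to query latency information.
-- This assumes the distributor runs on the same instance as the publisher; with a remote distributor, run this part there.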
USE distribution
;WITH LatestEntries AS (
       -- Select the latest entries for each publication and agent.
       SELECT publication_id, agent_id, MAX(publisher_commit) AS MaxDate       
       FROM MStracer_tokens t
       JOIN MStracer_history h ON t.tracer_id = h.parent_tracer_id
       GROUP BY publication_id, agent_id
)
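-- Select latency information for the latest tokens.
-- A NULL commit time means the token had not arrived within the 5-minute wait, so it is reported as 299 seconds.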
SELECT c.name, t.publication_id, h.agent_id, t.publisher_commit,
       ISNULL(DATEDIFF(s,t.publisher_commit,t.distributor_commit), 299) as 'Time To Dist (sec)',
       ISNULL(DATEDIFF(s,t.distributor_commit,h.subscriber_commit), 299) as 'Time To Sub (sec)'
INTO #REPL_LATENCY
FROM MStracer_tokens t 
JOIN MStracer_history h ON t.tracer_id = h.parent_tracer_id
JOIN distribution.dbo.MSdistribution_agents c ON h.agent_id = c.id
JOIN LatestEntries le ON t.publication_id = le.publication_id AND h.agent_id = le.agent_id AND t.publisher_commit = le.MaxDate
ORDER BY t.publisher_commit DESC

-- Collect the rows whose latency is beyond acceptable limits (30 seconds).
-- The table is always created, even when it ends up empty, because later statements reference it.
SELECT name, publication_id, agent_id, publisher_commit, [Time To Dist (sec)], [Time To Sub (sec)]
INTO #REPL_LATENCY_Email 
FROM #REPL_LATENCY 
WHERE ([Time To Dist (sec)] > 30 OR [Time To Sub (sec)] > 30)

-- Prepare the HTML body content for the email alert.
DECLARE @body_content NVARCHAR(MAX);
SET @body_content = N'
<style>
table.GeneratedTable {
  width: 100%;
  background-color: #D3D3D3;
  border-collapse: collapse;
  border-width: 2px;
  border-color: #A9A9A9;
  border-style: solid;
  color: #000000;
}
table.GeneratedTable td, table.GeneratedTable th {
  border-width: 2px;
  border-color: #A9A9A9;
  border-style: solid;
  padding: 3px;
}
table.GeneratedTable thead {
  background-color: #A9A9A9;
}
</style>
<table class="GeneratedTable">
  <thead>
    <tr>
         <th>name</th>
         <th>publication_id</th>
         <th>agent_id</th>
         <th>publisher_commit</th>
         <th>Time To Dist (sec)</th>
         <th>Time To Sub (sec)</th>
    </tr>
  </thead>
  <tbody>' +
CAST(
          (SELECT td = name, '',
                               td = publication_id, '',
                               td = agent_id, '',
                               td = publisher_commit, '',
                               td = [Time To Dist (sec)], '',
                               td = [Time To Sub (sec)], ''
        FROM [dbo].#REPL_LATENCY_Email
        FOR XML PATH('tr'), TYPE   
        ) AS NVARCHAR(MAX)
    ) +
  N'</tbody>
</table>';

-- Send an email alert if there is any latency issue found.
IF EXISTS (SELECT TOP 1 * FROM [dbo].#REPL_LATENCY_Email) 
BEGIN
    EXEC msdb.dbo.sp_send_dbmail @profile_name = 'JBSWIKI',
                                 @body = @body_content,
                                 @body_format = 'HTML',
                                 @recipients = 'jvivek2k1@yahoo.com',
                                 @subject = 'ALERT: Transactional Replication Latency Alert'; 
END

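-- Cleanup temporary tables (guarded so the drop never fails on a missing table).
IF OBJECT_ID('tempdb..#REPL_LATENCY') IS NOT NULL DROP TABLE #REPL_LATENCY
IF OBJECT_ID('tempdb..#REPL_LATENCY_Email') IS NOT NULL DROP TABLE #REPL_LATENCY_Email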

The Solution:

The script works by first posting tracer tokens to the specified publications within the publisher database. It then waits for a predetermined amount of time (defaulted to 5 minutes in the script) to allow the tokens to propagate through the system. Following this, the script measures the latency to the distributor and subscriber, providing a detailed report of the time taken in each stage of the replication process.
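
For a quick manual check of a single publication, the token round trip can also be read back with the built-in tracer token procedures instead of querying the distribution tables directly. The following is a minimal sketch rather than part of the original script: it reuses the placeholder names [Publisher_Database_Here] and Publication_Name, and the 30-second wait is an arbitrary value chosen for illustration.

```sql
-- Run at the publisher, in the publication database (placeholder name).
USE [Publisher_Database_Here];

DECLARE @tokenID int;

-- Post a tracer token and capture its ID through the OUTPUT parameter.
EXEC sys.sp_posttracertoken
     @publication     = N'Publication_Name',
     @tracer_token_id = @tokenID OUTPUT;

-- Give the token time to travel; 30 seconds is an assumption for this sketch.
WAITFOR DELAY '00:00:30';

-- Report distributor and subscriber latency for that specific token.
EXEC sys.sp_helptracertokenhistory
     @publication = N'Publication_Name',
     @tracer_id   = @tokenID;
```

A NULL latency in the result means the token has not yet reached that stage, which is the same condition the main script reports as 299 seconds.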

This information is then used to generate an HTML-formatted email alert if the latency exceeds predefined thresholds (30 seconds in the provided script), allowing for immediate action to be taken. The use of HTML formatting in the email ensures that the information is presented in an easily digestible format, facilitating quick understanding and response by the DBA.

Conclusion:

Proactive monitoring and management of transactional replication latency are paramount for maintaining the health and performance of SQL Server environments. The script provided offers a straightforward and effective solution for DBAs to stay ahead of potential replication issues. By automating the process of latency detection and alerting, this approach not only saves valuable time but also helps in preventing the negative impact of replication latency on business operations.

Remember, while this script serves as a valuable tool in your monitoring arsenal, it’s also important to tailor the solution to your specific environment and requirements. Regularly reviewing and adjusting the latency thresholds and monitoring frequency will ensure you continue to get the most out of your replication setup.
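
One thing to keep in mind when increasing the monitoring frequency is that every run posts new tracer tokens, and their history accumulates. A possible housekeeping step is sketched below; the 7-day retention window is an assumption, and the database and publication names are the same placeholders used in the script.

```sql
-- Run at the publisher, in the publication database (placeholder name).
USE [Publisher_Database_Here];

-- Keep one week of tracer token history; the window is an assumption for this sketch.
DECLARE @cutoff datetime = DATEADD(DAY, -7, GETDATE());

-- Remove tracer token records older than the cutoff for this publication.
EXEC sys.sp_deletetracertokenhistory
     @publication = N'Publication_Name',
     @cutoff_date = @cutoff;
```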

For more tutorials and tips on SQL Server, including performance tuning and database management, be sure to check out our JBSWiki YouTube channel.

Thank You,
Vivek Janakiraman

Disclaimer:
The views expressed on this blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted by Vivek Janakiraman on May 6, 2023, under Core

Maximizing Efficiency: How SQL Server’s Accelerated Database Recovery Can Improve Your Workflow

Introduction

SQL Server Accelerated Database Recovery (ADR) is a feature introduced in SQL Server 2019 that redesigns the database recovery process. ADR addresses a long-standing problem in SQL Server: recovery time grows with the size of the longest-running transaction that was active when the instance stopped. ADR helps minimize downtime during database recovery, makes the rollback of long-running transactions nearly instantaneous, and allows the transaction log to be truncated more aggressively.

This article looks at how traditional recovery works, how ADR changes it, the benefits it brings, how to enable and monitor it, and the limitations to keep in mind.

Traditionally, when SQL Server experiences a crash or an unexpected shutdown, the database goes through a recovery process that can take a long time to complete. The recovery process involves three phases: analysis, redo, and undo. During the analysis phase, SQL Server scans the transaction log forward from the last successful checkpoint to determine the state of each transaction at the moment the server stopped. In the redo phase, SQL Server replays logged operations, starting from the oldest uncommitted transaction, to bring the database back to the state it was in at the time of the crash. Finally, in the undo phase, SQL Server walks the log backward and rolls back every transaction that was still active at that time.

The time it takes to complete the recovery process is roughly proportional to the size of the longest-running transaction that was active at the time of the crash, because both redo and undo have to work through all of its log records. In some cases, the recovery process can take hours or even days to complete, causing significant downtime for the application.
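
Because long-running open transactions are a major driver of lengthy recovery, it can be useful to see which transactions are open right now and how much log they have generated. The query below is a minimal sketch, assuming the login has VIEW SERVER STATE permission; the column aliases are my own.

```sql
-- List session-bound open transactions, oldest first, with the log they have used.
SELECT st.session_id,
       DB_NAME(dt.database_id)                AS database_name,
       atx.name                               AS transaction_name,
       atx.transaction_begin_time,
       DATEDIFF(SECOND, atx.transaction_begin_time, GETDATE()) AS open_seconds,
       dt.database_transaction_log_bytes_used AS log_bytes_used
FROM sys.dm_tran_active_transactions AS atx
JOIN sys.dm_tran_session_transactions AS st
  ON st.transaction_id = atx.transaction_id
JOIN sys.dm_tran_database_transactions AS dt
  ON dt.transaction_id = atx.transaction_id
ORDER BY atx.transaction_begin_time;
```

A transaction near the top of this list with a large log_bytes_used value is the kind that makes traditional recovery and rollback slow.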

Explanation of database recovery
Whenever a SQL Server instance restarts, each database that was online before the restart must go through recovery before it becomes available again. During the recovery process, SQL Server ensures that the changes made by committed transactions are present in the data files and that the changes made by uncommitted transactions are rolled back. This brings the database back to a transactionally consistent state.

Overview of SQL Server Accelerated Database Recovery
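SQL Server Accelerated Database Recovery is a feature introduced in SQL Server 2019 that improves database recovery times and availability. It accomplishes this by keeping row versions in a persistent version store (PVS) inside the database, so that uncommitted changes can be undone from those versions instead of from the log, and by processing the transaction log only from the last successful checkpoint during recovery.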

Benefits of SQL Server Accelerated Database Recovery
The primary benefits of SQL Server Accelerated Database Recovery are faster recovery times and improved availability. This feature significantly reduces the downtime associated with database recovery and allows organizations to restore their systems more quickly in the event of a failure.

Understanding Traditional Database Recovery

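Traditional database recovery, often called crash recovery, does not involve restoring the database from a backup. It works entirely from the transaction log and the data files already on disk, following the analysis, redo, and undo phases described above.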

How traditional database recovery works
In traditional database recovery, SQL Server reads the transaction log to identify the transactions that were active at the time of the restart. SQL Server then redoes logged changes that had not yet reached the data files and rolls back every transaction that never committed. This process can take a significant amount of time, depending on how much log has to be processed and on how long the oldest active transaction had been running at the time of the restart.

Limitations of traditional database recovery
Traditional database recovery has several limitations. It can take a long time to complete when long-running transactions were active at the time of the failure. Rolling back a large transaction uses the same undo mechanism, so cancelling one can take as long as the work it had already done. In addition, the transaction log cannot be truncated while a long-running transaction is open, so the log can grow very large. Finally, if a failure occurs during the recovery process, recovery has to start again on the next startup.

How SQL Server Accelerated Database Recovery Works

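Persistent version store (PVS) and logical revert
SQL Server Accelerated Database Recovery works by versioning database modifications, so that undo no longer has to walk back through the transaction log. According to the Microsoft documentation referenced at the end of this article, ADR is built from four components: the persistent version store (PVS), logical revert, the sLog, and the cleaner.

When a database is in Accelerated Database Recovery mode, SQL Server maintains a persistent version store, which holds earlier versions of modified rows inside the user database itself rather than in tempdb. Logical revert uses these row versions to undo the changes of an aborted transaction at the row level, which makes rollback nearly instantaneous and releases the transaction's locks right away.

Operations that cannot be versioned, such as lock acquisitions and metadata cache invalidation, are recorded in the sLog, a small secondary in-memory log stream. During the recovery process, SQL Server redoes the sLog from the oldest uncommitted transaction up to the last checkpoint, and then redoes the main transaction log only from the last checkpoint onward, which can significantly reduce the amount of time required for recovery. The cleaner is a background process that periodically removes row versions that are no longer needed.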

Checkpoint process
Another key aspect of SQL Server Accelerated Database Recovery is the checkpoint process. Checkpoints are a mechanism used by SQL Server to write dirty data pages (i.e., data pages that have been modified but not yet written to disk) to disk. This helps to reduce the amount of work required during recovery, as it limits how much of the transaction log has to be redone when the recovery process begins.

With Accelerated Database Recovery, checkpoints matter even more. The sLog is persisted to disk as part of the checkpoint process, and during recovery the main transaction log is processed only from the last successful checkpoint (or the oldest dirty page LSN) rather than from the start of the oldest active transaction.
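
To get a feel for how much log recovery would have to process at a given moment, the log statistics function can be queried for the current database. This is a minimal sketch, assuming SQL Server 2016 SP2 or later (where sys.dm_db_log_stats is available) and that it is run in the database of interest.

```sql
-- Show how much log has accumulated since the last checkpoint and how much
-- log would be needed for recovery if the database stopped right now.
SELECT DB_NAME(database_id)         AS database_name,
       log_since_last_checkpoint_mb,
       log_recovery_size_mb,
       log_truncation_holdup_reason
FROM sys.dm_db_log_stats(DB_ID());
```

A log_recovery_size_mb value that is much larger than log_since_last_checkpoint_mb usually points to an old open transaction holding the recovery starting point back.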

Recovery with Accelerated Database Recovery enabled
When a database is in Accelerated Database Recovery mode, recovery is performed in a different way than it is with traditional database recovery. Instead of scanning the transaction log back to the start of the oldest active transaction, SQL Server redoes the main log only from the last successful checkpoint, replays the small sLog for non-versioned operations, and undoes uncommitted changes from the row versions in the persistent version store.

This can result in significantly faster recovery times, particularly for databases with long-running transactions or high transaction rates. In addition, because undo is based on row versions rather than on a backward scan of the log, the database becomes available sooner and locks held by aborted transactions are released quickly.

Benefits of SQL Server Accelerated Database Recovery

Faster recovery times
One of the primary benefits of SQL Server Accelerated Database Recovery is faster recovery times. By starting redo at the last checkpoint and undoing uncommitted changes from row versions instead of from the log, SQL Server can significantly reduce the amount of time required to recover a database.

This is particularly beneficial for databases with long-running transactions or high transaction rates, as traditional database recovery can take a significant amount of time in these scenarios. With Accelerated Database Recovery, recovery time no longer depends on the longest active transaction, so recoveries that used to take hours can complete far more quickly.

Improved availability
Another benefit of SQL Server Accelerated Database Recovery is improved availability. Because recovery times are significantly reduced, databases can be back up and running more quickly after a failure.

This can help to minimize downtime and ensure that critical business processes are not impacted by database failures. In addition, rolling back a large transaction is nearly instantaneous, so a cancelled batch job no longer blocks other sessions for the duration of a long rollback, which further improves availability.

Minimal impact on workload
With traditional database recovery, there is a significant impact on workload when a large transaction has to be undone. The rollback walks back through every log record the transaction generated, holds its locks until it finishes, and prevents the log from being truncated in the meantime.

With SQL Server Accelerated Database Recovery, undo is performed from row versions, so rollback completes almost immediately and locks are released right away. The transaction log can also be truncated more aggressively, even while long-running transactions are active. This means that other business processes are far less likely to be blocked by a long rollback, which is particularly important for mission-critical applications.

Implementation of SQL Server Accelerated Database Recovery

Implementing SQL Server Accelerated Database Recovery is a straightforward process, but there are a few requirements to keep in mind.

Compatibility requirements
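First, your SQL Server instance must be running SQL Server 2019 (15.x) or later. ADR is not limited to Enterprise Edition; it is also available in Standard Edition. In Azure SQL Database and Azure SQL Managed Instance, ADR is always enabled and cannot be turned off. ADR is a database-level option, and the Microsoft documentation does not list a specific compatibility level as a prerequisite.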

Enabling Accelerated Database Recovery
To enable Accelerated Database Recovery for a specific database, use the following T-SQL command:

ALTER DATABASE [DatabaseName] SET ACCELERATED_DATABASE_RECOVERY = ON;

Once enabled, Accelerated Database Recovery is applied to all operations performed against the database. This includes all transactional operations, such as inserts, updates, and deletes, as well as DDL operations, such as table creation or index rebuilding.
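
After running the command, it is worth confirming that the setting took effect. The sketch below reuses the DatabaseName placeholder from the command above; the filegroup name in the commented variant is hypothetical. As far as I understand the documentation, changing the setting needs an exclusive lock on the database, so the ALTER DATABASE statement may wait until other sessions have disconnected.

```sql
-- Confirm whether ADR is enabled; N'DatabaseName' is the same placeholder as above.
SELECT name,
       is_accelerated_database_recovery_on
FROM sys.databases
WHERE name = N'DatabaseName';

-- Optional variant: place the persistent version store on a dedicated filegroup.
-- [VersionStoreFG] is a hypothetical filegroup that must already exist and contain a file.
-- ALTER DATABASE [DatabaseName] SET ACCELERATED_DATABASE_RECOVERY = ON
--     (PERSISTENT_VERSION_STORE_FILEGROUP = [VersionStoreFG]);
```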

Monitoring Accelerated Database Recovery
SQL Server provides several mechanisms for monitoring the performance of Accelerated Database Recovery.

One useful tool is the sys.dm_tran_persistent_version_store_stats dynamic management view. This view provides detailed statistics on the size and utilization of the version store for a specific database, as well as information about any background cleanup processes that may be running.

Additionally, the is_accelerated_database_recovery_on column in sys.databases shows whether ADR is currently enabled for each database, which is a quick way to confirm the setting across an instance.
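
As a starting point for monitoring, the version store statistics for the current database can be read with a query like the one below. It is a minimal sketch, assuming it is run in a database where ADR is enabled; the pvs_size_mb alias is my own.

```sql
-- Size of the persistent version store and the state of aborted-transaction cleanup
-- for the database this query is run in.
SELECT DB_NAME(database_id)                    AS database_name,
       persistent_version_store_size_kb / 1024.0 AS pvs_size_mb,
       current_aborted_transaction_count,
       oldest_active_transaction_id,
       oldest_aborted_transaction_id,
       aborted_version_cleaner_start_time,
       aborted_version_cleaner_end_time
FROM sys.dm_tran_persistent_version_store_stats
WHERE database_id = DB_ID();
```

A version store that keeps growing while oldest_active_transaction_id stays the same usually means a long-running transaction is preventing cleanup.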

Limitations of SQL Server Accelerated Database Recovery

While Accelerated Database Recovery provides many benefits, there are also a few limitations to keep in mind.

Unsupported database features
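Not every configuration can be combined with Accelerated Database Recovery. The main documented restriction is database mirroring: a database that participates in mirroring cannot have ADR enabled. Features such as table partitioning can be used alongside ADR. Because restrictions have changed between releases, check the Microsoft documentation referenced at the end of this article before enabling ADR on a database that relies on less common features.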

Increased disk space usage
Accelerated Database Recovery can result in increased disk space usage due to the persistent version store, which keeps earlier versions of modified rows inside the user database. This increased disk space usage may require additional planning and monitoring for large databases with high transactional volumes, especially when long-running transactions delay version cleanup.
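
If the version store has grown and the background cleaner has not caught up, cleanup can also be requested manually. This is a minimal sketch using the DatabaseName placeholder from earlier in the article; note that versions still needed by an open transaction cannot be removed, so long-running transactions should be dealt with first.

```sql
-- Ask SQL Server to run persistent version store cleanup for one database.
-- N'DatabaseName' is a placeholder; replace it with the database to clean.
EXEC sys.sp_persistent_version_cleanup N'DatabaseName';
```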

Potential performance impact
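Accelerated Database Recovery can add overhead to write-heavy workloads, because modified rows have to be versioned in the persistent version store and those versions later have to be cleaned up. For many workloads this impact is small and is outweighed by the benefits of faster recovery times and improved availability, but it is worth testing with a representative workload before enabling ADR in production.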

Conclusion

SQL Server Accelerated Database Recovery provides a powerful feature for improving database recovery times and reducing downtime. By combining a persistent version store, logical revert, and the sLog, Accelerated Database Recovery enables faster and more predictable database recovery, near-instant rollback, and more aggressive log truncation.

While there are a few limitations to keep in mind, such as incompatible configurations and increased disk space usage, the benefits of Accelerated Database Recovery often outweigh the potential drawbacks. If you’re running SQL Server 2019 or later, consider enabling Accelerated Database Recovery to improve your database’s availability; in Azure SQL Database it is already enabled for you.

Reference : https://learn.microsoft.com/en-us/sql/relational-databases/accelerated-database-recovery-concepts?view=sql-server-ver16